Many performance problems come from poor memory access patterns. An algorithm that looks "simpler" or "slower" on paper often performs better in a real application, because it allows more predictable memory access and causes fewer cache misses.
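To make that concrete, here is a minimal sketch, assuming a matrix stored row-major in one contiguous buffer. Both functions (the names are mine, purely illustrative) do exactly the same arithmetic; only the order in which they touch memory differs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major in one contiguous buffer.
   The inner loop walks adjacent addresses, so each cache line that is
   fetched gets fully used before the loop moves on. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but the inner loop jumps `cols` elements on every step.
   On a large matrix each access tends to land on a different cache line. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

On a matrix much larger than the cache, the second version is typically noticeably slower, though how much depends on the hardware and on what the compiler does with the loops, so measure before drawing conclusions.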
Sometimes these sorts of things are localized and easy to fix. Sometimes the problems get "baked in" to an application's data structures and are hard to fix later.
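One common way a problem gets "baked in" is the choice between an array of structs and a struct of arrays. The sketch below uses a hypothetical particle type (not from any real codebase) to show why the layout matters when a hot loop only needs one field.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs: all fields of one particle sit together in memory. */
struct particle {
    float x, y, z;
    float mass;
    int   flags;
};

/* Only `mass` is needed, but every cache line fetched also carries
   x, y, z and flags, so most of the fetched bytes go unused. */
float total_mass_aos(const struct particle *p, size_t n)
{
    assert(p != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Struct-of-arrays: each field lives in its own contiguous array. */
struct particles {
    float *x, *y, *z;
    float *mass;
    int   *flags;
    size_t count;
};

/* The loop reads one dense array, so every fetched byte is useful. */
float total_mass_soa(const struct particles *ps)
{
    assert(ps != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < ps->count; i++)
        total += ps->mass[i];
    return total;
}
```

Once the rest of the code passes `struct particle *` around everywhere, switching to the second layout means touching every call site, which is why this kind of decision is much cheaper to make up front.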
High on the list of things to think about at the design stage is "how will X affect my cache usage?"