Performance notes

From NuclearCat's homepage

Jump to: navigation, search

Cache usage recommendation for C++ developers:

  • Sequential memory access is almost 10 times faster than random access (No surprise for DB developers, I hope). In your programs, this means you should use arrays instead of linked lists. In your code, this means you should avoid indirect calls. C++ developers - avoid virtual functions and use templates instead.
  • Choose data structures as small as possible. There is a huge speed penalty for data elements that are larger than a cache line. If they are larger than a memory page, it gets much worse.
  • Pre-fetching data (reading it into the cache before it is actually use) can help hide much of the cost of main memory access. C developers can force pre-fetch by accessing the variable before it is actually used, or by calling an appropriate assembly instruction.
  • Use structures to keep together variables that are used together. This will help you get them on the same cache line.
  • L3 cache is used in high end servers. Large L3 cache, say with 32M, can give you tremendous speed ups, much better than what a faster CPU will give you. Which is why servers with large L3 cache are so expensive. Ulrich gave an example of having Oracle fit entirely into the L3 cache, which will make it 10 times faster and will justify the cost. I burst out laughing, because to me fitting the DB into 32M of memory is ridiculous - we need 12G for the SGA! But the rest of the audience took this seriously, so maybe I missed something.
  • Multi-threading. Generally, a very bad idea - because if a bit of memory is in L1 cache of CPU #1, CPU #2 can’t use it until #1 releases the bit, and then #2 reads it from main memory, while invalidating the cache for #1, because the bit was changed by #2. So, if you use multi-threading, your expansive L1 cache is wasted. This part reminded me of the way cache fusion works in RAC.

http://people.redhat.com/drepper/cpumemory.pdf

Personal tools