Examining the VM splay tree effectiveness
dillon at apollo.backplane.com
Fri Oct 1 05:04:14 UTC 2010
I don't remember the reference but I read a comprehensive comparison
between various indexing methods about a year ago and the splay tree
did considerably better than a RB-tree. The RB-tree actually did
Any binary tree-like structure makes fairly poor use of cpu caches.
Splay trees work somewhat better as long as there is some locality
of reference in the lookups, since the node being looked up is moved
to the root. It isn't a bad trade-off.
On the negative side all binary-tree-like structures tend to be
difficult to MP-lock in a fine-grained manner (not that very many
structures are locked at that fine a grain anyway, but it's a data
point). Splay-trees are impossible to lock at a fine-grain due to
the massive modifications made on any search and the movement
of the root, and there are MP access issues too.
What turned out to be the best indexing mechanism was a chained
hash table whos hoppers were small linear arrays instead of single
elements. So instead of pointer-chaining each element you have a small
for() loop for 4-8 elements before you chain. The structure being
indexed would NOT be integrated into the index directly, the index
would point at the final structure from the hopper.
For our purposes such linear arrays would contain a pointer and
an indexing value in as small an element as possible (8-16 bytes),
the idea being that you make the absolute best use of your cache line
and L1 cache / memory burst. One random access (initial hash index),
then linear accesses using a small indexing element, then one final
random access to get to the result structure and validate that
it's the one desired (at that point we'd be 99.9% sure that we have
the right structure because we have already compared the index value
stored in the hopper). As a plus the initial hash index also makes
MP locking the base of the chains easier.
I don't use arrayized chained hash tables in DFly either, but only
because it's stayed fairly low on the priority list. cpu isn't really
a major issue on systems these days. I/O is the bigger problem.
RB-Trees are simply extremely convenient from the programming side,
which is why we still use them.
I was surprised that splay trees did so well vs RB-trees, I never liked
the memory writes splay trees do let alone the impossibility of
fine-grained locking. Originally I thought the writes would make
performance worse, but they actually don't. Still, if I were to change
any topologies now I would definitely go with the chained-hash /
small-linear-array / chain / small-linear-array / chain mechanic. It
seems to be the clear winner.
<dillon at backplane.com>
More information about the freebsd-current