hashinit versus phashinit

David Schultz das at FreeBSD.ORG
Tue May 6 07:02:19 UTC 2008


On Mon, May 05, 2008, Roman Divacky wrote:
> hi
> 
> when we want to use a hash table in kernel we call "hashinit" which
> initializes a hash table with power-of-2 size. There's also "phashinit"
> that creates hash table of size that is a prime number. This was
> added in 1995 by davidg@ but it is not used anywhere in the kernel.
> 
> phk@ commited rev. 1.30 of vfs_cache.c replacing phashinit with hashinit
> stating that it's better because it replaces a division with logical and.
> is this reason still valid today? (the commit was done almost 11 years ago)
> 
> is there still any reason why not use the phashinit instead of hashinit?
> I believe using prime-sized hash table might have positive performance
> impact...

There's a tradeoff.

The argument for using powers of 2 is that division takes many
times longer than a logical AND.

The argument for using primes is that if your hash function isn't
as good as you thought it was, or if the data has some regularity
you weren't expecting, you can get screwed a lot more easily with
a power of 2 hash table.  With a prime-sized hash table, you only
get screwed if lots of your data is congruent modulo the prime,
which is very rare.

Most general-purpose hash implementations I've used (e.g., GNU
libstdc++, Sun JDK, Microsoft .NET) use prime table sizes,
probably to make it less likely that programmers will shoot
themselves in the foot with pathological data or bad hash functions.


More information about the freebsd-hackers mailing list