test hash functions for fsid

Thu May 9 00:41:33 UTC 2019

Conrad Meyer wrote:
>On Wed, May 8, 2019 at 7:39 AM Rick Macklem <rmacklem at uoguelph.ca> wrote:
>> If you have a FreeBSD system with lots of local file systems (1000s), maybe you could
>> run the attached program on it and email me the stats.
>> I'm just trying to see how well these hash functions work on fsids.
>
>Below I'll get to some comments on hash functions and hash tables, but
>first: have you considered a non-hash data structure, such as PCTRIE?
>PCTRIE_DEFINE() creates a datastructure with similar properties to a
>hash table for this use, but does not require a hash function step at
>all because all fsids are already unique, fixed-size, 64-bit values.
>It may be especially space-efficient for non-ZFS FSIDs, when the
>64-bit key is composed with '(fsid.val[1] << 32 | fsid.val[0])'.
>(Elaborated on more below.(A))
I'll admit I had never heard of PCTRIE, but <sys/pctrie.h> seems to be
#ifdef _KERNEL, so it can't be used in userland?

>Usually it makes sense to size hash tables for the size of the data;
>for direct insert tables you want to keep the load factor below 0.8 or
>something like that.  So the hardcoded 256 doesn't make a ton of
>sense, IMO.
Yes. I just chose 256 for this test program. Unfortunately, the way mountd.c is
currently coded, the hash table must be sized before the getmntinfo() call,
so it must be sized before it knows how many file systems there are.
Since this is userland and table size isn't a memory issue, I'm just tempted to
make it large enough for a large server with something like 25,000 file systems.
(I'd guess somewhere in the 256->1024 range would work. I'm not sure what
 you mean by load factor below 0.8?)

>As far as hash functions, both (hash32_buf == djb2, FNV32) are really
>weak, but importantly for hash tables, fast.
>https://github.com/Cyan4973/smhasher#summary provides some broader
>context, but keep in mind most of those algorithms are targeting much
>longer keys than 8 bytes.  I would guess that h1/h3 will be noticably
>worse than h2/h4 without performing any better, but I don't have data
>to back that up.
>
>(If you were to test calculate_crc32c() or XXH32/XXH64, included in
>sys/contrib/zstd, you might observe fewer collisions and better
>distribution.  But that isn't necessarily the most important factor
>for hash table performance in this use.)
>
>(A) FSIDs themselves have poor initial distribution and *very* few
>unique bits (i.e., for filesystems that use vfs_getnewfsid, the int32
>fsid val[1] will be identical across all filesystems of the same type,
>such as NFS mounts).  The remaining val[0] is unique due to an
>incrementing global counter.  You could pretty easily extract the
>vfs_getnewfsid() code out to userspace to simulate creating a bunch of
>fsid_t's and run some tests on that list.
Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a collision
and I don't really care about other file system types.
(To clarify for bde@ and others, this is for file systems exported by mountd,
 which uses a single linked list for all the exported file systems, keyed on the
 f_fsid returned by statfs/getmntinfo. Since almost all exported file systems are
 UFS of ZFS and I suspect only ZFS will have 1000s of file systems in an NFS server
 box anytime soon, I don't really care how well others work. After all, it just does
 a linear search down the list of N file systems right and just about anything
 should be an improvement.)

>It isn't a real world
>distribution but it would probably be pretty close, outside of ZFS.
>ZFS FSIDs have 8 bits of shared vfs type, but the remaining 56 bits
>appears to be generated by arc4rand(9), so ZFS FSIDs will have good
>distribution even without hashing.
I added a simple (take the low order bits of val[0]) case to the test. I actually
suspect any of the hash functions will work well enough, since, as you note, most of the values (val[0] and 24bits of val[1]) are from a random # generator which should
be pretty uniform in distribution for ZFS.
UFS now uses a value from the superblock. It appears that newfs sets val[0] to the
creation time of the file system and val[1] to a random value. If it had been the
reverse, I would be tempted to only use val[0], but I'll see how well the test goes
for Peter. (I didn't know, but it has to run as "root". When run as non-root, the
f_fsid field is returned as 0 for all file systems by getmntinfo().)

Thanks, rick