test hash functions for fsid

Rick Macklem rmacklem at uoguelph.ca
Thu May 9 21:44:00 UTC 2019

Conrad Meyer wrote:
>On Wed, May 8, 2019 at 5:41 PM Rick Macklem <rmacklem at uoguelph.ca> wrote:
>[liberal snipping throughout :-)]
>> I'll admit I had never heard of PCTRIE, but <sys/pctrie.h> seems to be
>> #ifdef _KERNEL, so it can't be used in userland?
>Ah, you're completely correct.  I had considered using PCTRIE for a
>userspace application earlier, and had started converting it to be
>buildable in userspace, but I never finished.  (I determined it
>wouldn't be a good fit for that application, unfortunately.)  I do
>think it might be a good datastructure to expose to base userspace
>programs eventually, but you probably don't really want to tackle that
>extra work :-).  A hash table is totally fine.
>> Yes. I just chose 256 for this test program. Unfortunately, the way mountd.c is
>> currently coded, the hash table must be sized before the getmntinfo() call,
>> so it must be sized before it knows how many file systems there are.
>> Since this is userland and table size isn't a memory issue, I'm just tempted to
>> make it large enough for a large server with something like 25,000 file systems.
>No objection, so long as people can still run tiny NFS servers on
>their Raspberry Pis. :-)
I do now think I can dynamically size it based on how many file systems
(which is how many structures) will be linked off the table, so this shouldn't be
a problem.

>> (I'd guess somewhere in the 256->1024 range would work. I'm not sure what
>>  you mean by load factor below 0.8?)
>Load factor is just number of valid entries divided by space in the
>table.  So if your table has 256 spaces, load factor 0.8 would be
>having about 205 valid entries.
Hmm. That would suggest a large table size of about 100,000 for Peter's
72,000 file system case. It would result in a list length averaging about 1 entry,
but I think a linear search through 10-20 entries won't take long enough to be
a problem for this case. (This happens whenever mountd receives a SIGHUP and each time a client does a mount.)

As noted in the last post, I was thinking along the lines of 10->20 as a target
linked list length. (Or "table_size = num / L", where L is the target average list length. 
L = 10->20 would be roughly a load average of 10->20.)

Does that sound reasonable? (Interested in hearing from anyone on this.)

>> Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a collision
>> and I don't really care about other file system types.
>Ah, sorry — I didn't read carefully enough when looking for fsid initialization.
>> After all, it just does
>>  a linear search down the list of N file systems right and just about anything
>>  should be an improvement.)
>Yes :-).
>> I added a simple (take the low order bits of val[0]) case to the test. I actually
>> suspect any of the hash functions will work well enough, since, as you note, most of the values (val[0] and 24bits of val[1]) are from a random # generator which should
>> be pretty uniform in distribution for ZFS.
>> UFS now uses a value from the superblock. It appears that newfs sets val[0] to the
>> creation time of the file system and val[1] to a random value.
>Hm, it looks like makefs sets val[1] (fs_id[1]) to a non-random value
>generated by the predictable PRNG random(3), for reproducible build
>reasons.  makefs seeds srandom(3) with either the current time in
>seconds (low entropy) or some known timestamp (either from a file,
>also in seconds, or an integer) (no entropy).  I guess random(3) may
>provide better distribution for the table than a plain global counter,
Yes. It would be the distribution of values and not their randomness that would
matter for this case. Hopefully someone with a large # of UFS file systems will
run the test program and post the results. To be honest, I doubt if anyone will
create a server with enough UFS file systems for it to be important to hash their
fsid well. It works fine for the small number of UFS file systems I have.)

>> If it had been the
>> reverse, I would be tempted to only use val[0], but I'll see how well the test goes
>> for Peter.
>Seems like you were right — any old function is good enough :-).
>From Peter's test, the first three did fine and almost the same. The last two weren't
as good, but were still tolerable.

Thanks for the comments, rick

More information about the freebsd-fs mailing list