test hash functions for fsid

Thu May 9 21:44:00 UTC 2019

Conrad Meyer wrote:
>On Wed, May 8, 2019 at 5:41 PM Rick Macklem <rmacklem at uoguelph.ca> wrote:
>[liberal snipping throughout :-)]
>>
>> I'll admit I had never heard of PCTRIE, but <sys/pctrie.h> seems to be
>> #ifdef _KERNEL, so it can't be used in userland?
>
>Ah, you're completely correct.  I had considered using PCTRIE for a
>userspace application earlier, and had started converting it to be
>buildable in userspace, but I never finished.  (I determined it
>wouldn't be a good fit for that application, unfortunately.)  I do
>think it might be a good datastructure to expose to base userspace
>programs eventually, but you probably don't really want to tackle that
>extra work :-).  A hash table is totally fine.
>
>> Yes. I just chose 256 for this test program. Unfortunately, the way mountd.c is
>> currently coded, the hash table must be sized before the getmntinfo() call,
>> so it must be sized before it knows how many file systems there are.
>> Since this is userland and table size isn't a memory issue, I'm just tempted to
>> make it large enough for a large server with something like 25,000 file systems.
>
>No objection, so long as people can still run tiny NFS servers on
>their Raspberry Pis. :-)
I do now think I can dynamically size it based on how many file systems
(which is how many structures) will be linked off the table, so this shouldn't be
a problem.

>> (I'd guess somewhere in the 256->1024 range would work. I'm not sure what
>>  you mean by load factor below 0.8?)
>
>Load factor is just number of valid entries divided by space in the
>table.  So if your table has 256 spaces, load factor 0.8 would be
>having about 205 valid entries.
Hmm. That would suggest a large table size of about 100,000 for Peter's
72,000 file system case. It would result in a list length averaging about 1 entry,
but I think a linear search through 10-20 entries won't take long enough to be
a problem for this case. (This happens whenever mountd receives a SIGHUP and each time a client does a mount.)

As noted in the last post, I was thinking along the lines of 10->20 as a target
linked list length. (Or "table_size = num / L", where L is the target average list length. 
L = 10->20 would be roughly a load average of 10->20.)

Does that sound reasonable? (Interested in hearing from anyone on this.)

>> Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a collision
>> and I don't really care about other file system types.
>
>Ah, sorry — I didn't read carefully enough when looking for fsid initialization.
>
>> After all, it just does
>>  a linear search down the list of N file systems right and just about anything
>>  should be an improvement.)
>
>Yes :-).
>
>> I added a simple (take the low order bits of val[0]) case to the test. I actually
>> suspect any of the hash functions will work well enough, since, as you note, most of the values (val[0] and 24bits of val[1]) are from a random # generator which should
>> be pretty uniform in distribution for ZFS.
>> UFS now uses a value from the superblock. It appears that newfs sets val[0] to the
>> creation time of the file system and val[1] to a random value.
>
>Hm, it looks like makefs sets val[1] (fs_id[1]) to a non-random value
>generated by the predictable PRNG random(3), for reproducible build
>reasons.  makefs seeds srandom(3) with either the current time in
>seconds (low entropy) or some known timestamp (either from a file,
>also in seconds, or an integer) (no entropy).  I guess random(3) may
>provide better distribution for the table than a plain global counter,
>though.
Yes. It would be the distribution of values and not their randomness that would
matter for this case. Hopefully someone with a large # of UFS file systems will
run the test program and post the results. To be honest, I doubt if anyone will
create a server with enough UFS file systems for it to be important to hash their
fsid well. It works fine for the small number of UFS file systems I have.)

>> If it had been the
>> reverse, I would be tempted to only use val[0], but I'll see how well the test goes
>> for Peter.
>
>Seems like you were right — any old function is good enough :-).
>From Peter's test, the first three did fine and almost the same. The last two weren't
as good, but were still tolerable.

Thanks for the comments, rick