test hash functions for fsid
Conrad Meyer
cem at freebsd.org
Thu May 9 22:01:26 UTC 2019
Hi Rick,
On Thu, May 9, 2019 at 2:44 PM Rick Macklem <rmacklem at uoguelph.ca> wrote:
> I do now think I can dynamically size it based on how many file systems
> (which is how many structures) will be linked off the table, so this shouldn't be
> a problem.
Great!
> >Load factor is just number of valid entries divided by space in the
> >table. So if your table has 256 spaces, load factor 0.8 would be
> >having about 205 valid entries.
>
> Hmm. That would suggest a large table size of about 100,000 for Peter's
> 72,000 file system case. It would result in a list length averaging about 1 entry,
Yeah, that's usually how hash tables are sized — aiming for a list
length of 1 (for chaining hash tables) or relatively short probe
sequence (for open addressing / embedded entry tables). It's what
provides the various "O(1)" properties hash tables ostensibly have.
> but I think a linear search through 10-20 entries won't take long enough to be
> a problem for this case. (This happens whenever mountd receives a SIGHUP and each time a client does a mount.)
Yes, it might not be noticeable in this application either way.
> As noted in the last post, I was thinking along the lines of 10->20 as a target
> linked list length. (Or "table_size = num / L", where L is the target average list length.
> L = 10->20 would be roughly a load average of 10->20.)
>
> Does that sound reasonable? (Interested in hearing from anyone on this.)
Wikipedia discusses it a little bit:
https://en.wikipedia.org/wiki/Hash_table#Key_statistics
Facebook recently open sourced their general-purpose C++ hashtable
implementations and talk at length about various tradeoffs, including
load: https://code.fb.com/developer-tools/f14/?r=1 It might be
interesting to read.
Anyway, usually it's a number less than 1. 10-20x is unconventional.
> Yes. It would be the distribution of values and not their randomness that would
> matter for this case. Hopefully someone with a large # of UFS file systems will
> run the test program and post the results. To be honest, I doubt if anyone will
> create a server with enough UFS file systems for it to be important to hash their
> fsid well. It works fine for the small number of UFS file systems I have.)
Maybe pho@ will take that as a challenge. You could create a lot of
UFS filesystems with mdconfig and enough RAM.
> Thanks for the comments, rick
Cheers,
Conrad
More information about the freebsd-fs
mailing list