test hash functions for fsid

Bruce Evans brde at optusnet.com.au
Wed May 8 17:33:25 UTC 2019


On Wed, 8 May 2019, Conrad Meyer wrote:

> On Wed, May 8, 2019 at 7:39 AM Rick Macklem <rmacklem at uoguelph.ca> wrote:
>> If you have a FreeBSD system with lots of local file systems (1000s), maybe you could
>> run the attached program on it and email me the stats.
>> I'm just trying to see how well these hash functions work on fsids.
>
> Below I'll get to some comments on hash functions and hash tables, but
> first: have you considered a non-hash data structure, such as PCTRIE?
> PCTRIE_DEFINE() creates a datastructure with similar properties to a
> hash table for this use, but does not require a hash function step at
> all because all fsids are already unique, fixed-size, 64-bit values.
> It may be especially space-efficient for non-ZFS FSIDs, when the
> 64-bit key is composed with '(fsid.val[1] << 32 | fsid.val[0])'.
> (Elaborated on more below.(A))

fsids must be unique in 32 bits, especially for nfs uses, since they
are usually blindly truncated to 32 bits in nfs clients that have
32-bit dev_t to form st_dev, but st_dev must be unique.

The FreeBSD-5 client blindly truncates:

 	vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];

The FreeBSD-11 client does the same for nfsv3, but for nfsv4 it does
hashing stuff to convert from a 128-bit (?) na_filesid to 32 bits.

va_fsid isn't really an fsid.  It is a dev_t in all versions of FreeBSD,
so it is too small to hold the 64-bit fsids created by vfs_getnewfsid()
in all versions of FreeBSD except recent versions where dev_t is also
64 bits -- it works accidentally for them.

> (A) FSIDs themselves have poor initial distribution and *very* few
> unique bits (i.e., for filesystems that use vfs_getnewfsid, the int32
> fsid val[1] will be identical across all filesystems of the same type,
> such as NFS mounts).  The remaining val[0] is unique due to an
> incrementing global counter.

val[1] is useless unless there are more than 256 fs types, since it is
already in the top 8 bits of the 24-bit minor number in fsid.val[0],
except on systems with 8-bit minor numbers and in buggy versions of
nfs that truncate the minor number to 8 bits (very old versions of oldnfs
and not so old versions of newnfs).

val[1] should never be used since it gets truncated away in many cases.

val[0] has 16 bits of uniqueness from the counter, 8 mostly wasted for
distinguishing the fs type, and another 8 bits even more mostly wasted
for distinguishing the device type (255 for all synthetic devices was
to stay away from major numbers 0-254 for physical devices, but devfs
made all major numbers for physical devices 0).

In some versions of FreeBSD, changing makedev() broke the uniqueness of
val[0] by shifting the major bits into obvlivion (so synthetic devices
were not distinguishable from physical devices, and since physical devices
uses a dense set of minor numbers, collisions were likely).

The difference between servers and clients' use of fsid is confusing, and
I probably got some of the above wrong by conflating them.  Clients should
generate fsids unique in 32 bits.  Using vfs_getnewfsid() and ignoring
val[1] almost does this.  This is needed to support applications using
compat syscalls with 32-bit dev_t in st_dev and other places.  Other file
systems should also avoid actually using 64-bit dev_t for the same reason.
Using devfs almost does this.  It might be possible to generated more than
2**24 ptys so that the minor numbers start using the top 32 bits, but this
is foot-shooting and hopefully disk devices are already allocated with
low minor numbers.  Devices which can back file systems should be in a
different namespace (e.g., major == 1) to avoid this problem.

I think device numbers from servers are only visible to clients for
special files.  Uniqueness then is not so important, but I found that
changing makedev() broke it by noticing that server device numbers
were truncated on clients.  If the server fsid_t or dev_t is larger
than the client fsid_t or dev_t, then there is no way that the client
can represent all server numbers.  Hashing can work for sparse numbers,
but is complicated.  Servers should also not actually use 64-bit dev_t.

Bruce


More information about the freebsd-fs mailing list