Suggestion for hardware for ZFS fileserver

Rick Macklem rmacklem at uoguelph.ca
Fri Dec 28 00:20:11 UTC 2018


I wrote:
>Peter Eriksson wrote:
>[good stuff snipped]
>>This has caused some interesting problems…
>>
>>First thing we noticed was that booting would take forever… Mounting the 20-100k
>>filesystems _and_ enabling them to be shared via NFS is not done efficiently at all
>>(for each filesystem it re-reads /etc/zfs/exports (a couple of times) before
>>appending one line to the end). Repeat 20,000-100,000 times… Not to mention the big
>>kernel lock for NFS “hold all NFS activity while we flush and reinstall all sharing
>>information per filesystem” being done by mountd…
>Yes, /etc/exports and mountd were implemented in the 1980s, when a dozen
>file systems would have been a large server. Scaling to 10,000 or more file
>systems wasn't even conceivable back then.
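
To put a rough number on the re-read cost Peter describes, here is a small
back-of-the-envelope sketch. It assumes the file starts empty, each file system
adds exactly one line, and the file is re-read once per share (Peter says "a
couple of times", so the real figure would be a small multiple of this). The
function name and its parameters are made up for illustration.

```python
def lines_reread(n_filesystems: int, rereads_per_share: int = 1) -> int:
    """Total lines read if every share re-reads the file before appending one line."""
    # The k-th share (k = 0 .. n-1) finds k lines already in the file,
    # so the total is 0 + 1 + ... + (n-1) = n * (n-1) / 2 per re-read pass.
    return rereads_per_share * sum(range(n_filesystems))


if __name__ == "__main__":
    for n in (20_000, 100_000):
        print(n, lines_reread(n))
```

So the work grows with the square of the number of file systems: on the order
of 2 x 10^8 line reads for 20,000 file systems and 5 x 10^9 for 100,000.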

>Wish list item #1: A BerkeleyDB-based ‘sharetab’ that replaces the horribly
>slow /etc/zfs/exports text file.
>Wish list item #2: A reimplementation of mountd and the kernel interface to allow
>a “diff” between the contents of the DB-based sharetab above to be input into the
>kernel instead of the brute-force way it’s done now.
>The parser in mountd for /etc/exports is already an ugly beast and I think
>implementing a "diff" version will be difficult, especially figuring out what needs
>to be deleted.
>
>I do have a couple of questions related to this:
>1 - Would your case work if there was an "add these lines to /etc/exports"?
>     (Basically adding entries for file systems, but not trying to delete anything
>      previously exported. I am not a ZFS guy, but I think ZFS just generates another
>      exports file and then gets mountd to export everything again.)
>2 - Are all (or maybe most) of these ZFS file systems exported with the same
>      arguments?
>      - Here I am thinking that a "default-for-all-ZFS-filesystems" line could be
>         put in /etc/exports that would apply to all ZFS file systems not exported
>         by explicit lines in the exports file(s).
>      This would be fairly easy to implement and would avoid trying to handle
>      1000s of entries.
>
>In particular, #2 above could be easily implemented on top of what is already
>there, using a new type of line in /etc/exports and handling that as a special
>case by the NFS server code, when no specific export for the file system to the
>client is found.
Unfortunately, it doesn't sound like #2 above would be useful for Peter. Although it is
easy to implement a single default export for all ZFS file systems not already exported,
it would not be easy to say "export all file systems below /foo/bar this way", since
the kernel code basically doesn't know the directory structure. It has vnodes for
file objects and mount points to work with. (The kernel exports hang off of the
mount points.)
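
To make that concrete, here is a rough sketch of the lookup with a default
added. It is Python purely for illustration, not kernel code, and the names
(Mount, lookup_export, zfs_default) are made up; nothing like this exists in
the tree. It only assumes what is said above: exports hang off the mount point,
and the lookup has no path to work with.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Mount:
    """Hypothetical stand-in for a mount point and the exports hanging off it."""
    fstype: str                                            # e.g. "zfs" or "ufs"
    exports: Dict[str, str] = field(default_factory=dict)  # client -> export options


def lookup_export(mount: Mount, client: str,
                  zfs_default: Optional[str] = None) -> Optional[str]:
    """Return the export options for this client, or None if not exported."""
    # An explicit export for this mount point and client always wins.
    opts = mount.exports.get(client)
    if opts is not None:
        return opts
    # Otherwise fall back to the single default, but only for ZFS mounts.
    # Nothing here knows where the file system is mounted, so a rule like
    # "everything below /foo/bar" has no path to match against.
    if mount.fstype == "zfs":
        return zfs_default
    return None
```

The fallback is one extra test on the file system type, which is why a single
default is easy. A per-subtree default would need the mount's path in the
directory tree, and that is exactly what this lookup does not have.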
>>(I’ve written some code that implements item #1 above and it helps quite a bit.
>>Nothing near production quality yet though. I have looked at item #2 a bit too but
>>not done anything about it.)
Btw, this "item #2" (Peter's wish list item) is not the #2 I am referring to above
(the default export line for ZFS file systems).
[more good stuff snipped]

rick



More information about the freebsd-fs mailing list