Suggestion for hardware for ZFS fileserver

Rick Macklem rmacklem at uoguelph.ca
Fri Dec 21 23:50:03 UTC 2018


Peter Eriksson wrote:
[good stuff snipped]
>This has caused some interesting problems…
>
>First thing we noticed was that booting would take forever… Mounting the
>20-100k filesystems _and_ enabling them to be shared via NFS is not done
>efficiently at all (for each filesystem it re-reads /etc/zfs/exports (a
>couple of times) before appending one line to the end). Repeat 20-100,000
>times… Not to mention the big kernel lock for NFS, “hold all NFS activity
>while we flush and reinstall all sharing information per filesystem”,
>being done by mountd…
Yes, /etc/exports and mountd were implemented in the 1980s, when a dozen
file systems would have been a large server. Scaling to 10,000 or more file
systems wasn't even conceivable back then.
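
As a rough illustration of why the per-filesystem approach described above
scales badly, here is a minimal Python sketch of the cost model. It is not
taken from mountd or ZFS; the function name and the choice of two re-reads
per share (standing in for "a couple of times") are assumptions made for
the example.

```python
def lines_read_with_reread(n_filesystems, rereads_per_share=2):
    """Count lines read if every share re-reads the whole exports file
    (rereads_per_share times, an assumed value) before appending one line."""
    total_lines_read = 0
    lines_in_file = 0
    for _ in range(n_filesystems):
        # The file is re-read in full before the new line is appended.
        total_lines_read += rereads_per_share * lines_in_file
        lines_in_file += 1
    return total_lines_read


# Example: compare lines_read_with_reread(1_000) with
# lines_read_with_reread(100_000); the count grows with the square of
# the number of file systems, while one batched write would grow linearly.
```

Under these assumptions the work is quadratic in the number of file
systems, which is consistent with boot times becoming very long at the
20-100k scale.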

>Wish list item #1: A BerkeleyDB-based ’sharetab’ that replaces the horribly
>slow /etc/zfs/exports text file.
>Wish list item #2: A reimplementation of mountd and the kernel interface to
>allow a “diff” between the contents of the DB-based sharetab above to be
>input into the kernel instead of the brute-force way it’s done now.
The parser in mountd for /etc/exports is already an ugly beast and I think
implementing a "diff" version will be difficult, especially figuring out what needs
to be deleted.
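
To make the "diff" idea concrete, here is a minimal Python sketch under a
strong simplifying assumption: each file system has exactly one opaque
option string. The function name and the dictionary layout are hypothetical
and do not reflect mountd's data structures. Real exports lines carry
per-host and per-network options, which is where working out the deletions
gets hard.

```python
def diff_exports(old, new):
    """Compare two {mount_point: options} tables (an assumed, simplified model).

    Returns (added, removed, changed):
      added   - entries present only in `new`
      removed - mount points present only in `old`
      changed - entries whose option string differs
    """
    added = {path: opts for path, opts in new.items() if path not in old}
    removed = sorted(path for path in old if path not in new)
    changed = {path: opts for path, opts in new.items()
               if path in old and old[path] != opts}
    return added, removed, changed


# Usage: only the three result sets would need to be pushed to the kernel,
# instead of flushing and reinstalling every export.
old_table = {"/tank/a": "-maproot=root", "/tank/b": "-ro"}
new_table = {"/tank/b": "-ro -maproot=root", "/tank/c": "-ro"}
added, removed, changed = diff_exports(old_table, new_table)
```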

I do have a couple of questions related to this:
1 - Would your case work if there was an "add these lines to /etc/exports"?
     (Basically adding entries for file systems, but not trying to delete anything
      previously exported. I am not a ZFS guy, but I think ZFS just generates another
      exports file and then gets mountd to export everything again.)
2 - Are all (or maybe most) of these ZFS file systems exported with the same
      arguments?
      - Here I am thinking that a "default-for-all-ZFS-filesystems" line could be
         put in /etc/exports that would apply to all ZFS file systems not exported
         by explicit lines in the exports file(s).
      This would be fairly easy to implement and would avoid trying to handle
      1000s of entries.

In particular, #2 above could be easily implemented on top of what is already
there, using a new type of line in /etc/exports and handling that as a special
case by the NFS server code, when no specific export for the file system to the
client is found.
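
A minimal Python sketch of the lookup order that #2 implies, assuming a
single default entry for all ZFS file systems. The function name, the
argument names, and the table layout are hypothetical, and per-client
matching is left out to keep the example short.

```python
def lookup_export(mount_point, fs_type, explicit_exports, zfs_default=None):
    """Return the export options for a file system, or None if not exported.

    explicit_exports: {mount_point: options} from ordinary exports lines.
    zfs_default: options from the assumed "default for all ZFS" line, if any.
    """
    # An explicit line for this file system always wins.
    if mount_point in explicit_exports:
        return explicit_exports[mount_point]
    # Special case: fall back to the default only for ZFS file systems.
    if fs_type == "zfs" and zfs_default is not None:
        return zfs_default
    return None


# Usage: one default line covers thousands of ZFS file systems, while
# a few explicit lines can still override it.
explicit = {"/tank/special": "-ro"}
default_opts = "-maproot=root"
```

With this shape, the exports file stays small no matter how many ZFS file
systems exist, since only the exceptions need their own lines.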

>(I’ve written some code that implements item #1 above and it helps quite a
>bit. Nothing near production quality yet though. I have looked at item #2 a
>bit too but not done anything about it.)
[more good stuff snipped]
Btw, although I put the questions here, I think a separate thread discussing
how to scale to 10000+ file systems might be useful. (On freebsd-fs@ or
freebsd-current@. The latter sometimes gets the attention of more developers.)

rick


