ZFS/NFS hiccups and some tools to monitor stuff...

Thu Mar 26 00:27:40 UTC 2020

Peter Eriksson wrote:
>The last couple of weeks I’ve been fighting with a severe case of NFS users
>complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6 file
>servers. Now even though our SMB (Windows) users (thankfully since they are like
>500 per server vs 50 NFS users) didn’t see the same slowdown (or at least didn’t
>complain about it) the root cause is probably ZFS-related.
>
>We’ve identified a number of cases where some ZFS operation can cause severe
>slowdown of NFS operations, and I’ve been trying to figure out what is the cause and
>ways to mitigate the problem…
>
>Some operations that have caused issues:
>
>1. Resilver (basically made NFS service useless during the week it took…) with
>response time for NFS operations regularly up to 10 seconds or more (vs the normal
>1-10ms)
>
>2. Snapshot recursive deferred destruction (“zfs destroy -dr DATA@snapnam”).
>Especially bad together with filesystems at or near quota.
>
>3. Rsync cloning of data into the servers. Response times up to 15 minutes were seen…
>Yes, 15 minutes to do a mkdir(“test-dir”). Possibly in conjunction with #1 above….
>
>Previously #1 and #2 haven’t caused that many problems, and #3 definitely hasn’t.
>Something has changed the last half year or so but so far I haven’t been able to
>figure it out.
>
[stuff snipped]
>It would be interesting to see if others too are seeing ZFS and/or NFS slowdowns
>during heavy writing operations (resilver, snapshot-destroy, rsync)…
>
>
>Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB SSDs
>for ZIL + 400GB SSDs for L2ARC. 256GB RAM, configured with ARC-MAX set to 64GB
>(used to be 128GB but we ran into out-of-memory with the 500+ Samba smbd
>daemons that would compete for the RAM…)

Since no one else has commented, I'll mention a few things.
First the disclaimer...I never use ZFS and know nothing about SSDs, so a lot of
what I'll be saying comes from discussions I've seen by others.

Now, I see you use a mirrored pair of SSDs for ZIL logging devices.
You don't mention what NFS client(s) are mounting the server, so I'm going
to assume they are Linux systems.
- I don't know how the client decides, but I have seen NFS Linux packet traces
  where the client does a lot of 4K writes with FILE_SYNC. FILE_SYNC means
  that the data and metadata related to the write must be on stable storage
  before the RPC replies NFS_OK.
  --> This means the data and metadata changes must be written to the ZIL.
As such, really slow response when a ZIL log device is being resilvered isn't
surprising to me.
For the other cases, there is a heavy write load, which "might" also be hitting
the ZIL log hard.
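
One rough way to see how expensive small synchronous writes are on a given
filesystem is to time them directly. The following is only a sketch, not a
tool I have used on these servers: the function name time_synced_writes and
its defaults are made up for illustration. It does a series of 4K writes, each
followed by fsync(), into a scratch file in a directory you choose. Run locally
on the server it only approximates what a stable NFS write costs, since no RPCs
are involved.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=50, size=4096):
    """Return per-write latencies in seconds for `count` writes of `size`
    bytes, each followed by fsync(), to a scratch file in `directory`."""
    payload = b"\0" * size
    latencies = []
    # The scratch file lives in the target directory and is removed afterwards.
    fd, path = tempfile.mkstemp(prefix="syncprobe-", dir=directory)
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # do not return until the write is on stable storage
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


# Example use (the path is hypothetical):
#   lat = time_synced_writes("/DATA/some/filesystem")
#   print("max %.1f ms" % (max(lat) * 1e3))
```

Comparing the numbers while the pool is idle against the numbers during a
resilver or a snapshot destroy should show whether synchronous writes are
what slows down.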

What can you do about this?
- You can live dangerously and set "sync=disabled" for ZFS. This means that
   the writes will reply NFS_OK without needing to write to the ZIL log first.
   (I don't know enough about ZFS to know whether or not this makes the ZIL
    log no longer get used?)
  - Why do I say "live dangerously"? Because data writes could get lost when
    the NFS server reboots and the NFS client would think the data was written
    just fine.
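
For reference, the "sync" property is set per dataset with "zfs set" and can
be read back with "zfs get". A small sketch of a wrapper follows. The function
names and the dataset name "DATA/export" are made up for illustration, and by
default it only prints the commands instead of running them.

```python
import subprocess

VALID_SYNC = ("standard", "always", "disabled")


def zfs_sync_commands(dataset, value="disabled"):
    """Build the argv lists that show and then change the sync property."""
    if value not in VALID_SYNC:
        raise ValueError("sync must be one of: " + ", ".join(VALID_SYNC))
    show = ["zfs", "get", "-H", "-o", "value", "sync", dataset]
    change = ["zfs", "set", "sync=" + value, dataset]
    return show, change


def set_sync(dataset, value="disabled", dry_run=True):
    """Print the commands (dry run) or run them and return the old value."""
    show, change = zfs_sync_commands(dataset, value)
    if dry_run:
        # Nothing is changed; only show what would be executed.
        for argv in (show, change):
            print(" ".join(argv))
        return None
    previous = subprocess.run(
        show, check=True, capture_output=True, text=True
    ).stdout.strip()
    subprocess.run(change, check=True)
    return previous


# Hypothetical dataset name; prints the two zfs commands without running them.
set_sync("DATA/export")
```

Setting the property back to "standard" restores the default behaviour.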

I'm the last guy to discuss SSDs, but they definitely have weird performance
for writing and can get very slow for writing, especially when they get nearly
full.
--> I have heard others recommend limiting the size of your ZIL to at most
      1/2 of the SSD's capacity, assuming the SSD is dedicated to the ZIL
      and nothing else. (I have no idea if you already do this?)

Hopefully others will have further comments, rick

We’ve tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM. Not much difference. Tried tuning various sysctls but no difference seen so far. Annoying problem this…

- Peter

_______________________________________________
freebsd-fs at freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs