How does NFS respond when a VFS operation gives ERESTART?

Thu Jul 9 20:12:15 UTC 2015

Garrett Wollman wrote:
> When networked filesystems are not involved, the special error code
> [ERESTART] can be returned by the implementation of any system call,
> with the effect of causing the system call to be restarted when
> execution hits the kernel-user boundary, rather than returning to
> userland.  This is used to allow certain system calls to be restarted
> after being interrupted by a signal.  However, this normally only
> applies to system calls which might potentially sleep for a long time
> -- such as write() to a socket or a tty -- and not to disk I/O, which
> is normally uninterruptible.
> 
> In investigating an issue reported by our users, it appears to me from
> an inspection of the code that ZFS can sometimes give an [ERESTART]
> condition, specifically when writing to a dataset that has reached its
> quota, AND there are pending block free operations that would reduce
> usage below the quota.  But I don't see any code in the NFS (or kernel
> RPC) implementation that would actually handle this case, and of
> course the NFS server doesn't normally hit the user-kernel boundary at
> all.  So does anyone have a theory about what actually happens in this
> case, and what *should* happen?  It doesn't seem useful to just spin
> on the one operation over and over again until the blocks are freed
> (which I think might take a full ZFS transaction sync interval).
> 
Well, I'll admit I'm not sure I really understand the situation, but...
My best guess would be to have the NFS server reply NFSERR_DELAY to the client.
(NFSERR_DELAY doesn't exist for NFSv2, but I suspect you don't care about NFSv2?)

NFSERR_DELAY - Tells the client to wait a while (the RFCs don't define how long)
               and then try the RPC again.
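For illustration only, here is a minimal C sketch of what a client might do with that reply. All names in it (nfs_call_with_delay_retry, nfs_rpc_fn, sleep_fn, and the fake_server/record_sleep demo fixtures) are hypothetical. The exponential backoff policy is my own assumption, since the RFCs leave the wait unspecified. The only real value is 10008, which NFSv3 calls NFS3ERR_JUKEBOX and NFSv4 calls NFS4ERR_DELAY.

```c
#include <assert.h>

/* 10008 is NFS3ERR_JUKEBOX in NFSv3 and NFS4ERR_DELAY in NFSv4. */
#define NFS_OK        0
#define NFSERR_DELAY  10008

/* Hypothetical callbacks: issue one RPC, and sleep for some milliseconds. */
typedef int (*nfs_rpc_fn)(void *arg);
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Hypothetical helper: retry an RPC while the server answers NFSERR_DELAY.
 * The RFCs do not say how long to wait; this sketch doubles the delay after
 * each DELAY reply, caps it at max_ms, and gives up after max_tries attempts.
 * Returns the last status received.
 */
static int
nfs_call_with_delay_retry(nfs_rpc_fn rpc, void *arg, sleep_fn do_sleep,
    unsigned int initial_ms, unsigned int max_ms, int max_tries)
{
	unsigned int delay = initial_ms < max_ms ? initial_ms : max_ms;

	assert(max_tries > 0);
	for (int attempt = 1; ; attempt++) {
		int status = rpc(arg);

		if (status != NFSERR_DELAY || attempt == max_tries)
			return status;
		do_sleep(delay);
		/* Double the delay, clamping at max_ms without overflow. */
		delay = (delay >= max_ms / 2) ? max_ms : delay * 2;
	}
}

/* Demo fixture: a fake server that replies DELAY a fixed number of times. */
struct fake_server {
	int delays_left;
};

static int
fake_server_rpc(void *arg)
{
	struct fake_server *srv = arg;

	if (srv->delays_left > 0) {
		srv->delays_left--;
		return NFSERR_DELAY;
	}
	return NFS_OK;
}

/* Demo fixture: record the requested sleep time instead of sleeping. */
static unsigned int total_slept_ms;

static void
record_sleep(unsigned int ms)
{
	total_slept_ms += ms;
}
```

With a fake server that replies NFSERR_DELAY twice, a call with initial_ms = 100 sleeps 100 ms and then 200 ms before getting NFS_OK on the third attempt.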
Does this sound like it would work?

If it sounds reasonable, I think patching the server to do this shouldn't be
too hard.
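As a rough illustration of the server-side change, here is a C sketch. Everything in it is hypothetical. sketch_map_vfs_error is not an existing nfsd function. SKETCH_ERESTART stands in for the kernel-only ERESTART, which FreeBSD defines as -1 under _KERNEL. Mapping ERESTART to NFSERR_IO for NFSv2 is only an assumed fallback, since NFSv2 has no DELAY status.

```c
#include <assert.h>

/*
 * Stand-ins for values the real server would get from kernel/NFS headers.
 * FreeBSD's ERESTART is (-1) and only visible under _KERNEL; NFSv2's
 * NFSERR_IO is 5; NFSv3 JUKEBOX and NFSv4 DELAY are both 10008.
 */
#define SKETCH_ERESTART  (-1)
#define NFSERR_IO        5
#define NFSERR_DELAY     10008

/*
 * Hypothetical helper: turn a VFS error into the status nfsd puts on the
 * wire.  ERESTART must never leak to the client, because nfsd never hits
 * the user-kernel boundary where the call could be restarted.
 */
static int
sketch_map_vfs_error(int error, int nfsvers)
{
	assert(nfsvers >= 2 && nfsvers <= 4);
	if (error != SKETCH_ERESTART)
		return error;		/* other errors pass through in this sketch */
	if (nfsvers >= 3)
		return NFSERR_DELAY;	/* client waits, then retries the RPC */
	return NFSERR_IO;		/* assumed NFSv2 fallback */
}
```

The point of the design is that the wait moves to the client. A service thread is not left spinning on the same operation until the pending frees are committed.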

rick

> The actual symptom which I'm investigating is that sometimes --
> despite my fixes to the throttling code -- the server is still getting
> throttled, with thousands of requests enqueued for the same file.
> (The FHA code does a nice job of directing them all to the appropriate
> set of service threads, but that doesn't help the other clients get
> anything done because of the global throttle.)  These seem not to make
> any progress for a long time, but the condition ultimately clears by
> itself -- what I'm trying to figure out is why so many requests get
> queued and don't make progress, and so far this seems to be related to
> hitting the quota on the filesystem.  So [ERESTART] may be a total red
> herring, but it was something that stuck out at me when I was
> reviewing the code paths that could set [EDQUOT].
> 
> -GAWollman
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs