How does NFS respond when a VFS operation gives ERESTART?
Rick Macklem
rmacklem at uoguelph.ca
Thu Jul 9 20:12:15 UTC 2015
Garrett Wollman wrote:
> When networked filesystems are not involved, the special error code
> [ERESTART] can be returned by the implementation of any system call,
> with the effect of causing the system call to be restarted when
> execution hits the kernel-user boundary, rather than returning to
> userland. This is used to allow certain system calls to be restarted
> after being interrupted by a signal. However, this normally only
> applies to system calls which might potentially sleep for a long time
> -- such as write() to a socket or a tty -- and not to disk I/O, which
> is normally uninterruptible.
>
> In investigating an issue reported by our users, it appears to me from
> an inspection of the code that ZFS can sometimes give an [ERESTART]
> condition, specifically when writing to a dataset that has reached its
> quota, AND there are pending block free operations that would reduce
> usage below the quota. But I don't see any code in the NFS (or kernel
> RPC) implementation that would actually handle this case, and of
> course the NFS server doesn't normally hit the user-kernel boundary at
> all. So does anyone have a theory about what actually happens in this
> case, and what *should* happen? It doesn't seem useful to just spin
> on the one operation over and over again until the blocks are freed
> (which I think might take a full ZFS transaction sync interval).
>
Well, I'll admit I'm not sure I really understand the situation, but...
My best guess would be to have the NFS server reply NFSERR_DELAY to the client.
(NFSERR_DELAY doesn't exist for NFSv2, but I suspect you don't care about NFSv2?)
NFSERR_DELAY tells the client to wait a while (the RFCs don't define how long)
and then try the RPC again.
Does this sound like it would work?
If it sounds reasonable, I think patching the server to do this shouldn't be
too hard.
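
A minimal sketch of that idea follows. It is an illustration under stated
assumptions, not the actual FreeBSD NFS server code. Every SKETCH_ name and the
helper function are hypothetical. The value -1 for ERESTART matches the FreeBSD
kernel definition, 10008 is the status called NFS3ERR_JUKEBOX in NFSv3 and
NFS4ERR_DELAY in NFSv4, and 69 is the NFS quota-exceeded status. Falling back
to an I/O error for NFSv2 is only an assumption, since NFSv2 has no delay
status.

```c
#include <errno.h>

/* Assumed stand-in: FreeBSD defines ERESTART as -1, but only for kernel code. */
#define SKETCH_ERESTART      (-1)

/* NFS reply status values used by this sketch. */
#define SKETCH_NFSERR_IO     5      /* generic I/O error */
#define SKETCH_NFSERR_DQUOT  69     /* quota exceeded */
#define SKETCH_NFSERR_DELAY  10008  /* NFS3ERR_JUKEBOX / NFS4ERR_DELAY */

/*
 * Hypothetical helper: translate the error returned by a VFS operation
 * into the status the server puts in its RPC reply.
 */
static int
sketch_vfs_error_to_nfs_status(int error, int nfs_version)
{
	if (error == SKETCH_ERESTART) {
		/* Ask the client to retry later instead of spinning in the server. */
		if (nfs_version >= 3)
			return (SKETCH_NFSERR_DELAY);
		/* Assumption: NFSv2 has no delay status, so report an I/O error. */
		return (SKETCH_NFSERR_IO);
	}
	if (error == EDQUOT)
		return (SKETCH_NFSERR_DQUOT);
	/* Simplification: pass every other value through unchanged. */
	return (error);
}
```

With a mapping like this, a write that fails with ERESTART would release its
service thread right away and the client would be responsible for retrying.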
rick
> The actual symptom which I'm investigating is that sometimes --
> despite my fixes to the throttling code -- the server is still getting
> throttled, with thousands of requests enqueued for the same file.
> (The FHA code does a nice job of directing them all to the appropriate
> set of service threads, but that doesn't help the other clients get
> anything done because of the global throttle.) These seem not to make
> any progress for a long time, but the condition ultimately clears by
> itself -- what I'm trying to figure out is why so many requests get
> queued and don't make progress, and so far this seems to be related to
> hitting the quota on the filesystem. So [ERESTART] may be a total red
> herring, but it was something that stuck out at me when I was
> reviewing the code paths that could set [EDQUOT].
>
> -GAWollman