NFS: kernel modules (loading/unloading) and scheduling

Rick Macklem rmacklem at uoguelph.ca
Thu Feb 26 03:55:38 UTC 2015


Garrett Wollman wrote:
> In article
> <388835013.10159778.1424820357923.JavaMail.root at uoguelph.ca>,
> rmacklem at uoguelph.ca writes:
> 
> >I tend to think that a bias towards doing Getattr/Lookup over
> >Read/Write may help performance (the old "shortest job first"
> >principle), but I'm not sure you'll have a big enough queue of
> >outstanding RPCs under normal load for this to make a real
> >difference.
> 
> I don't think this is a particularly relevant condition here.  There
> are lots of ways RPCs can pile up where you really need to do better
> work-sharing than the current implementation does.  One example is a
> client that issues lots of concurrent reads (e.g., a compute node
> running dozens of parallel jobs).  Two such systems on gigabit NICs
> can easily issue large reads fast enough to cause 64 nfsd service
> threads to block while waiting for the socket send buffer to drain.
> Meanwhile, the file server is completely idle, but unable to respond
> to incoming requests, and the other users get angry.  Rather than
> assigning new threads to requests from the slow clients, it would be
> better to let the requests sit until the send buffer drains, and
> process other clients' requests instead of letting the resources get
> monopolized by a single user.
> 
> Lest you think this is purely hypothetical: we actually experienced
> this problem today, and I verified with "procstat -kk" that all of
> the nfsd threads were in fact blocked waiting for send buffer space
> to open up.  I was able to restore service immediately by increasing
> the number of nfsd threads, but I'm unsure to what extent I can do
> this without breaking other things or hitting other bottlenecks.[1]
> So I have a user asking me why I haven't enabled fair-share
> scheduling for NFS, and I'm going to have to tell him the answer is
> "no such thing".
> 
> -GAWollman
> 
> [1] What would the right number actually be?  We could potentially
> have many thousands of threads in a compute cluster all operating
> simultaneously on the same filesystem, well within the I/O capacity
> of the server, and we'd really like to degrade gracefully rather
> than falling over when a single slow client soaks up all of the
> nfsd worker threads.
Well, each of these threads has two structures allocated to it:
1 - the kthread info (sched_sizeof_thread(), i.e. struct thread plus the
    scheduler-specific data), and
2 - a structure used by the krpc for each thread.
Since two moderate-sized structures per thread isn't a lot of kernel
memory, I would think a server like yours would be fine with several
thousand nfsd threads.
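
To put a rough number on it (back-of-envelope only; the per-thread sizes
below, including the kernel stack each kthread needs, are round figures
I'm assuming rather than something I've measured):

/*
 * Back-of-envelope estimate of kernel memory for N nfsd threads.
 * The 16K kernel stack and 2K of thread + scheduler + krpc state
 * per thread are assumed round numbers, not measured values.
 */
#include <stdio.h>

int
main(void)
{
        const long stack_bytes = 16 * 1024;     /* assumed kernel stack */
        const long struct_bytes = 2 * 1024;     /* assumed per-thread structs */
        const long nthreads = 4096;

        printf("~%ld Mbytes for %ld nfsd threads\n",
            (stack_bytes + struct_bytes) * nthreads / (1024 * 1024),
            nthreads);
        return (0);
}

which comes out to well under 100Mbytes for 4096 threads, so memory
shouldn't be the limiting factor.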

What would be interesting would be the receive queue lengths on the
sockets of the NFS clients' TCP connections when the server is running
normally.  (These would give an indication of how many outstanding RPC
requests any scheduling effort would have to select between.)
I'll admit that, given basic queueing theory, I would expect these
receive queues to be small unless the server is overloaded.
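
To make that queueing intuition concrete (a crude sketch; modelling the
server as a single M/M/1 queue is my simplification, not a claim about
how nfsd actually behaves), the mean queue length only blows up as
utilization approaches 1:

/*
 * Mean M/M/1 queue length Lq = rho^2 / (1 - rho).  Crude model,
 * used only to show why receive queues should stay short until
 * the server is close to saturation.
 */
#include <stdio.h>

int
main(void)
{
        const double rho[] = { 0.5, 0.8, 0.9, 0.95, 0.99 };

        for (unsigned i = 0; i < sizeof(rho) / sizeof(rho[0]); i++)
                printf("utilization %.2f -> mean queue length %5.1f\n",
                    rho[i], rho[i] * rho[i] / (1.0 - rho[i]));
        return (0);
}

On the observation side, something like "netstat -an" on the server
should show the Recv-Q/Send-Q byte counts for the TCP connections to
port 2049, which would give a feel for the real numbers.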

Oh, and I now realize my response to your first idea, "Admission", was
way off and didn't make much sense.  Somehow I was thinking of the
receive queue when you were talking about the send queue.
(Basically, just ignore that response.)
However, given the widely varying sizes of RPC replies, it might be
hard to come up with a reasonable high-water mark for the send queue.
Also, the networking code would have to do some sort of upcall to the
krpc when the send queue shrinks.
(So, still not trivial to implement, I think?)
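
To sketch what I mean by an upcall (purely hypothetical; the function
svc_vc_sndupcall() and the half-of-sb_hiwat threshold below are things
I'm making up for illustration, although soupcall_set() and
xprt_active() do exist), it might look roughly like:

/*
 * Hypothetical send-side socket upcall: when the send buffer has
 * drained below an (arbitrary) low-water mark, mark the transport
 * active again so the svcpool will assign it a worker thread.
 */
static int
svc_vc_sndupcall(struct socket *so, void *arg, int waitflag)
{
        SVCXPRT *xprt = arg;

        if (sbspace(&so->so_snd) > so->so_snd.sb_hiwat / 2)
                xprt_active(xprt);
        return (SU_OK);
}

/* ...registered at transport setup time: */
SOCKBUF_LOCK(&so->so_snd);
soupcall_set(so, SO_SND, svc_vc_sndupcall, xprt);
SOCKBUF_UNLOCK(&so->so_snd);

Even with something like that in place, picking the threshold is the
hard part, given how much reply sizes vary.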

I do agree with Alfred: I think you are experiencing nfsd thread
starvation, and increasing the number of nfsd threads substantially is
the simple way to resolve it.

rick
