bad NFS/UDP performance

Fri Oct 3 09:17:46 UTC 2008

> 
> On Fri, 3 Oct 2008, Danny Braniss wrote:
> 
> >> OK, so it looks like this was almost certainly the rwlock change.  What 
> >> happens if you pretty much universally substitute the following in 
> >> udp_usrreq.c:
> >>
> >> Currently		Change to
> >> ---------		---------
> >> INP_RLOCK		INP_WLOCK
> >> INP_RUNLOCK		INP_WUNLOCK
> >> INP_RLOCK_ASSERT	INP_WLOCK_ASSERT
> >
> > I guess you were almost certainly correct :-) I did the global subst. on the 
> > udp_usrreq.c from 19/08, __FBSDID("$FreeBSD: src/sys/netinet/udp_usrreq.c,v 
> > 1.218.2.3 2008/08/18 23:00:41 bz Exp $"); and now udp is fine again!
> 
> OK.  This is a change I'd rather not back out since it significantly improves 
> performance for many other UDP workloads, so we need to figure out why it's 
> hurting us so much here so that we know if there are reasonable alternatives.
> 
> Would it be possible for you to do a run of the workload with both kernels 
> using LOCK_PROFILING around the benchmark, and then we can compare lock 
> contention in the two cases?  What we often find is that relieving contention 
> at one point causes new contention at another point, and if the primitive used 
> at that point handles contention less well for whatever reason, performance 
> can be reduced rather than improved.  So maybe we're looking at an issue in 
> the dispatched UDP code from so_upcall?  Another less satisfying (and 
> fundamentally more difficult) answer might be "something to do with the 
> scheduler", but a bit more analysis may shed some light.

gladly, but I have no idea how to do LOCK_PROFILING, so some pointers would be
helpful.

as a side note, many years ago I checked out NFS/TCP and it was really bad;
I even remember NetApp telling us to drop TCP. Now things look rather
better. I wonder what caused the improvement.
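
for anyone wanting to repeat the udp_usrreq.c experiment quoted above, the
substitution from the table could be scripted roughly as below. This is only
a sketch: the function name rlock_to_wlock is made up for illustration, and
it is not necessarily how the substitution was actually done. It assumes
only the three macros listed in the table should change, so it matches whole
identifiers and leaves names such as INP_INFO_RLOCK alone.

```python
import re

# Matches exactly the three read-lock macros from the table. The \b anchors
# keep the pattern from firing inside longer identifiers, and the captured
# suffix is reused so each name maps to its write-lock counterpart.
_RLOCK_MACRO = re.compile(r"\bINP_R(LOCK_ASSERT|UNLOCK|LOCK)\b")


def rlock_to_wlock(source: str) -> str:
    """Return source text with INP_R* macros replaced by INP_W* macros."""
    return _RLOCK_MACRO.sub(r"INP_W\1", source)


if __name__ == "__main__":
    sample = (
        "INP_RLOCK(inp);\n"
        "INP_RLOCK_ASSERT(inp);\n"
        "INP_RUNLOCK(inp);\n"
    )
    print(rlock_to_wlock(sample), end="")
```

run it on a copy of the file and diff the result before building a kernel
from it.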

danny