jumbograms (& em) & nfs a no go

Sat Nov 1 12:34:40 PST 2003

Michal Mertl wrote:
> On Fri, 31 Oct 2003, Terry Lambert wrote:
> > Michal Mertl wrote:
> > > I then left one computer at 4.9 and upgraded the other to 5.0. When I
> > > mount a partition from 5.0 machine I found out, that copying reliably
> > > works only from 5.0 to 4.9. The other way around I see messages 'em0:
> > > discard oversize frame (ether type 800 flags 3 len 67582 > max 6014)' on
> > > 5.0 and the copying stalls. On 4.9 machine I later see 'nfs server
> > > 10.0.0.2:/usr: not responding'. The interface is stuck for some time - can
> > > be revived by changing mtu back to 1500 and down/up sequence.
> >
> > Implies the sending host is not honoring the MTU restriction when
> > deciding whether or not to frag packets.
> 
> Can you suggest what to do to find out what's really happening? I thought
> nfsd network part was mostly userland thus the same as ftpd (or better
> netperf) and should work.

No.  Traditionally (except in Linux), nfsd is a userland process
that makes a system call and never returns to user space.  It
exists to provide a process context for blocking calls in the
kernel, specifically for tsleep(), wakeup(), and so on.  In more
modern UNIX systems, it's a kernel thread with no user space
existence at all.  On systems that permit NFS to be turned off,
and that can't or won't tie the kernel on/off state to the
existence or nonexistence of active exports, it's a stub that
tells the kernel to run the kernel thread(s).

The easiest way to find out what's happening is to grep the BSD
sources to find where the message comes from, and then work back
through the code paths that allowed something 67582 bytes in
size to get there in the first place.
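
As a starting point for reading the code, here is a minimal
sketch of the kind of length check that could produce that
message.  It assumes the 6014 limit in the log is an MTU of 6000
plus the 14-byte Ethernet header; the MTU value and the function
names are my assumptions for illustration, not the actual driver
or kernel code.

```python
# Sketch of a receive-side frame size check (hypothetical names,
# not taken from the FreeBSD sources).
ETHER_HDR_LEN = 14          # dst MAC + src MAC + EtherType
ETHER_CRC_LEN = 4           # trailing FCS, if still attached
IPV4_MAX_TOTAL_LEN = 65535  # 16-bit total length field in IPv4


def max_frame_len(mtu, has_fcs=False):
    """Largest frame accepted for a given interface MTU."""
    return mtu + ETHER_HDR_LEN + (ETHER_CRC_LEN if has_fcs else 0)


def is_oversize(frame_len, mtu, has_fcs=False):
    """True if a frame of frame_len bytes would be discarded."""
    return frame_len > max_frame_len(mtu, has_fcs)


if __name__ == "__main__":
    mtu = 6000  # assumption: 6014 (from the log) minus 14
    print(max_frame_len(mtu))
    print(is_oversize(67582, mtu))
    print(67582 > IPV4_MAX_TOTAL_LEN)
```

The reported length of 67582 is also larger than 65535, the most
that the total length field of a single IPv4 datagram can
express, which suggests that whatever arrived was not one
well-formed IP packet.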

I'm not looking at the absolutely newest sources, so this is
from memory: my original comment was based on "it had to come
from the driver that way".  This may or may not be a valid
assumption, but it's at least a starting hypothesis with a high
probability.

BTW: It should be impossible to send jumbograms > 9K in size,
since the jumbogram specification requires that to be the upper
limit.  However, it also requires that Tigon2 and Intel
Gigabit ethernet adapters be able to negotiate >1500 MTUs,
up to the jumbogram size, between them, and the Intel hardware
never cooperated when I was trying to get it to autonegotiate,
so I always ended up falling back to a 1500 MTU, unless I
forced the issue with ifconfig.

I think at this point, you are going to have to look at the
sources; IMO, it's a problem in some code that calls the
ether_output() function directly with too large a packet, and
since NFS doesn't implement TCP itself, NFS over TCP is not the
culprit.

Hmmm.  Is this maybe UDP?  If so, the easiest fix is "don't
use UDP"; FreeBSD's UDP fragment reassembly code sucks anyway,
and gives an excellent means of mounting a DoS attack on
the target system's available mbufs.

If it's UDP, and you insist on it working, you might want to
make sure that the packet goes through the UDP fragmentation
and NFS rsize/wsize limitation code.
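
To get a feel for what the fragmentation code should be doing,
here is a rough sketch of the arithmetic.  It assumes IPv4 with
no options, and treats the argument as the whole UDP payload (so
it ignores RPC/NFS header overhead); the 8192 and 32768 byte
sizes are only example values, not ones taken from this setup.

```python
# Sketch: how many IPv4 fragments a UDP datagram needs at a
# given MTU.  Names and sample sizes are illustrative only.
IP_HDR_LEN = 20   # IPv4 header without options
UDP_HDR_LEN = 8


def fragments_needed(udp_payload_len, mtu):
    """Number of IP packets needed to carry one UDP datagram."""
    ip_payload = udp_payload_len + UDP_HDR_LEN
    if ip_payload + IP_HDR_LEN <= mtu:
        return 1  # fits in a single unfragmented packet
    # Every fragment except the last must carry a multiple of
    # 8 bytes, because fragment offsets are in 8-byte units.
    per_fragment = (mtu - IP_HDR_LEN) // 8 * 8
    return -(-ip_payload // per_fragment)  # ceiling division


if __name__ == "__main__":
    for size in (8192, 32768):
        for mtu in (1500, 6000):
            print(size, mtu, fragments_needed(size, mtu))
```

If the sender is behaving, no single frame on the wire should be
larger than the MTU plus the link-layer header, whatever the
rsize/wsize happens to be.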

-- Terry