re0 device txcsum issue
sean at mcneil.com
Thu Sep 23 16:32:42 PDT 2004
On Thu, 2004-09-23 at 16:21, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Thu, Sep 23, 2004 at 16:08 -0700:
> > On Thu, 2004-09-23 at 15:52, John-Mark Gurney wrote:
> > > Sean McNeil wrote this message on Thu, Sep 23, 2004 at 15:20 -0700:
> > > > Is anyone willing to work with me to help trace down this problem? It
> > > > has been outstanding for a long time and I would dearly like it fixed.
> > > > I'm willing to help in any capacity to trace down the culprit.
> > > >
> > > > To recap: on the re0 device (possibly others too), running -current on an
> > > > amd64 processor, UDP packets sometimes go out with incorrect checksums
> > > > when txcsum is set on the interface. This deadlocks NFS: the client
> > > > continuously re-requests the packet, and each copy is rejected
> > > > because its checksum is bad.
> > >
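For reference, the checksum that txcsum offloads to the NIC is the standard UDP checksum from RFC 768. It is the 16-bit one's complement of the one's complement sum over an IPv4 pseudo-header, the UDP header and the payload. Below is a minimal Python 3 sketch of that calculation. It assumes IPv4, with addresses passed as 4-byte strings such as those returned by socket.inet_aton. The helper names are hypothetical and do not come from the re driver or from udpcheck.py. Checking a captured packet offline with udp_checksum_ok() would show whether the checksum on the wire matches the bytes that actually arrived.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum big-endian 16-bit words with end-around carry (RFC 1071 style)."""
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length data with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17 (UDP), UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)


def udp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field (bytes 6-7) is zero."""
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    csum = ~total & 0xFFFF
    return csum or 0xFFFF  # 0 means "no checksum" in UDP, so 0xFFFF is sent instead


def build_udp(src_port: int, dst_port: int, payload: bytes,
              src_ip: bytes, dst_ip: bytes) -> bytes:
    """Build a UDP header plus payload with a software-computed checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = udp_checksum(src_ip, dst_ip, header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """A received segment is valid if its sum, checksum included, folds to 0xFFFF."""
    if segment[6:8] == b"\x00\x00":
        return True  # sender did not compute a checksum
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

For example, a segment from build_udp() passes udp_checksum_ok(). Flipping any single bit of its payload makes the check fail. This is the same rejection the NFS peer applies to a packet whose hardware-computed checksum is wrong.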
> > > I have recently been working with the re driver, so I'm somewhat familiar
> > > with it. I haven't seen any issues, but I also don't have an
> > > AMD64 system to test with.
> > >
> > > Have you tried to find out whether it is related to packet size? Are you
> > > trying to use jumbo frames? rwatson committed netsend to the src/tools
> > > tree, which could help with this, and I have attached udpcheck.py, which is
> > > a client/server script to test and verify packets of different
> > > sizes.
> > My method of testing has been to just do an "ls -lR" on a large
> > directory tree. With txcsum set, it consistently locks up. This
> > happens with clients ranging from x86 Linux and BSD to a SPARC Solaris 2
> > box and an hppa HP-UX machine. If I turn off txcsum (i.e. ifconfig re0
> > -txcsum), I have never had any problems.
> > I tried your program with txcsum and it just hangs. Without txcsum,
> You can provide a -v to get more detailed information on what is going
> Yes, the naming (in the Python script) is a bit confusing since it is
> from the server's point of view. I'm assuming you run the server side
> (-s 1234) on another box and the client (server 1234) on the box w/ the
> re driver? So the errors below mean that the client sent a UDP packet
> and it didn't match...
> > I'm not sure what the output here means, but this is what I got:
> Without txcsum you got the following errors? This is very worrisome, as
> it means that you're getting UDP packet corruption even w/o checksumming.
> Ok, what you can do is add the line:
> open('r.%d' % i, 'w').write(rdata)
> just before the line:
> print 'packet send mismatch at:', i, 'got:', rlen
> and then run the client as:
> python udpcheck.py server 1234 -s 1792 -e 1795
> You will then have a set of r.179[2-5] files. Each one holds the contents
> of the packet that was received by the server. If you could email
> them to me in private mail, it might shed light on the problem.
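To locate exactly where a received packet differs, the r.N dumps can be compared byte for byte against the payload the client meant to send. Below is a minimal Python 3 sketch. It assumes the expected payload for each size can be regenerated on the receiving side. How udpcheck.py builds its payload is not shown here, so the caller has to supply `expected`. mismatch_runs() and report() are hypothetical helpers.

```python
def mismatch_runs(expected: bytes, received: bytes) -> list:
    """Return half-open (start, end) byte ranges where the two buffers differ,
    comparing only the prefix both buffers share."""
    runs = []
    start = None
    n = min(len(expected), len(received))
    for i in range(n):
        if expected[i] != received[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i))  # the current run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, n))  # a run continues to the end of the overlap
    return runs


def report(size: int, expected: bytes) -> None:
    """Print length and content differences for the dump file r.<size>."""
    with open("r.%d" % size, "rb") as f:
        received = f.read()
    if len(received) != len(expected):
        print("r.%d: length %d, expected %d" % (size, len(received), len(expected)))
    for start, end in mismatch_runs(expected, received):
        print("r.%d: bytes %d-%d differ" % (size, start, end - 1))
```

Calling report(size, expected) for each of r.1792 through r.1795 would show whether the damage sits at a fixed offset, such as the start of the payload or the last few bytes. That would help tell a driver problem apart from corruption elsewhere.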
I initially ran the server (-s 1234) on the re0 side. I've now added
the extra line and run it in both directions. I'm sending you the output in
a private email.