Odd network issues on ZFS based NFS server

Rick Macklem rmacklem at uoguelph.ca
Thu Jun 10 23:32:41 UTC 2010



On Thu, 10 Jun 2010, Jeremy Chadwick wrote:

>
> The interrupt rate for bge1 (irq26) is very high during the problem,
> while otherwise it is only ~6/sec.  Shot in the dark, but this is probably
> the cause of the packet loss you see.  Oddly, your uhci2 interface (used
> for USB) is also firing at a very high rate.  I don't know if this is
> the sign of a NIC problem, driver problem, or interrupt (think APIC?)
> routing problem.
>
> Debugging this is beyond my capability, but folks like John Baldwin may
> have some ideas on where to go from here.
>
> Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
> "tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
> happening?  The reason I ask is to determine if there's any chance this
> box starts seeing problems due to DoS attacks or excessive LAN traffic
> which is unexpected.  Basically, be sure that all the network I/O going
> on across bge1 is expected.
>
Yes, I think Jeremy is on the right track. I'd second the recommendation
to look at traffic when it is happening. I might choose:
 	tcpdump -s 0 -w <file> -i bge1
and then load "<file>" into Wireshark, since Wireshark is much better at
making sense of NFS traffic. (Since nfsd is at the top of the process
list, there may well be heavy NFS traffic arriving on bge1.)
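It also helps to measure the bge1 interrupt rate over the same window as the capture. The "rate" column printed by FreeBSD's `vmstat -i` is averaged since boot, so a burst shows up much more clearly as the difference between two samples. This is a minimal sketch, assuming the `vmstat -i` layout of an `irqNN:` label, one or more device names, then the total and rate columns. The script name `irqrate.sh`, the `irq_total` helper and the 5-second default are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: estimate the current interrupt rate of one device on FreeBSD.
# Assumption: `vmstat -i` lines look like "irq26: bge1 <total> <rate>",
# and its rate column is an average since boot, so we diff two totals.

# irq_total DEV (hypothetical helper): read `vmstat -i` output on stdin
# and print the "total" counter of the first line that names DEV.
irq_total() {
    awk -v dev="$1" '
        # Device names sit between the "irqNN:" label and the last two
        # numeric columns (total, rate).
        { for (i = 2; i < NF - 1; i++) if ($i == dev) { print $(NF - 1); exit } }'
}

# Only sample when a device name is given, e.g.: sh irqrate.sh bge1 10
if [ "$#" -ge 1 ]; then
    dev=$1
    interval=${2:-5}    # whole seconds between samples, must be > 0
    before=$(vmstat -i | irq_total "$dev")
    sleep "$interval"
    after=$(vmstat -i | irq_total "$dev")
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "no interrupt line found for $dev" >&2
        exit 1
    fi
    echo "$dev: $(( (after - before) / interval )) interrupts/sec over ${interval}s"
fi
```

Running it once while the problem is happening and once when the box is quiet (for example `sh irqrate.sh bge1 10`, and the same for uhci2) gives two directly comparable numbers.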

If you do this tcpdump for a short period of time and then email "<file>"
to me as an attachment, I can take a look at it. (If the traffic isn't
NFS, then there's not much point in doing this.) We might have a case
where a client is retrying the same RPC (or RPC sequence) over and over
and over again, my friend (sorry I couldn't resist:-).

Given that you stated FreeBSD 8.1-PRERELEASE, I think you should already
have the patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c
is at least r206406.

Let me know how it goes, rick