Odd network issues on ZFS based NFS server

Jeremy Chadwick freebsd at jdc.parodius.com
Fri Jun 11 03:18:13 UTC 2010


On Thu, Jun 10, 2010 at 07:48:49PM -0400, Rick Macklem wrote:
> On Thu, 10 Jun 2010, Jeremy Chadwick wrote:
> >The interrupt rate for bge1 (irq26) is very high during the problem,
> >while otherwise it is only ~6/sec.  Shot in the dark, but this is probably
> >the cause of the packet loss you see.  Oddly, your uhci2 interface (used
> >for USB) is also firing at a very high rate.  I don't know if this is
> >the sign of a NIC problem, driver problem, or interrupt (think APIC?)
> >routing problem.
> >
> >Debugging this is beyond my capability, but folks like John Baldwin may
> >have some ideas on where to go from here.
> >
> >Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
> >"tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
> >happening?  The reason I ask is to determine if there's any chance this
> >box starts seeing problems due to DoS attacks or excessive LAN traffic
> >which is unexpected.  Basically, be sure that all the network I/O going
> >on across bge1 is expected.
> >
> Yes, I think Jeremy is on the right track. I'd second the recommendation
> to look at traffic when it is happening. I might choose:
> 	tcpdump -s 0 -w <file> -i bge1
> and then load "<file>" into wireshark, since wireshark is much better at
> making sense of NFS traffic. (Since the nfsd is at the top of the process
> list, it hints that there may be heavy nfs traffic being received by
> bge1.)
> 
> If you do this tcpdump for a short period of time and then email "<file>"
> to me as an attachment, I can take a look at it. (If the traffic isn't
> NFS, then there's not much point in doing this.) We might have a case
> where a client is retrying the same RPC (or RPC sequence) over and over
> and over again, my friend (sorry I couldn't resist:-).
> 
> Given that you stated FreeBSD 8.1-PRERELEASE, I think you should have the
> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is
> at least r206406.
> 
> Let me know how it goes, rick
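
To put a number on the bge1 (irq26) interrupt rate while the problem is
happening, something like the following might help.  It is only a rough
sketch: it assumes the usual "vmstat -i" layout (name, device, total,
rate), and the function names and the SAMPLE_LIVE/IRQ/SECS variables are
made up for illustration.

```sh
#!/bin/sh
# Sketch only: measure the interrupt rate of one IRQ over a short window.
# Assumes "vmstat -i" lines look like:  irq26: bge1   123456   6
# Helper names and SAMPLE_LIVE/IRQ/SECS are hypothetical.

# rate_per_sec OLD NEW SECONDS -> whole interrupts per second
rate_per_sec() {
    echo $(( ($2 - $1) / $3 ))
}

# irq_total IRQ -> total count for that IRQ, parsed from stdin
# (the total is the second-to-last field on the matching line)
irq_total() {
    awk -v irq="$1:" '$1 == irq { print $(NF-1) }'
}

# Live sampling runs only when explicitly requested.
if [ "${SAMPLE_LIVE:-0}" = "1" ]; then
    irq="${IRQ:-irq26}"
    secs="${SECS:-10}"
    before=$(vmstat -i | irq_total "$irq")
    sleep "$secs"
    after=$(vmstat -i | irq_total "$irq")
    echo "$irq: $(rate_per_sec "$before" "$after" "$secs") interrupts/sec"
fi
```

Running it with SAMPLE_LIVE=1 once while things are quiet and once during
the problem should make the difference easy to compare.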

Also for Anders --

With regard to possible bge(4) issues, Yong-Hyeon works on this driver
fairly often.  If it turns out to be a driver issue of some sort, he can
probably help.  Relevant commits are here (to give you some idea of
activity):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c

One commit caught my eye (rev 1.226.2.15), but that seems to be more
focused on mbuf issues (your system doesn't appear to be having any,
given your netstat -m output).

CC'ing Yong-Hyeon, as he might know of some edge case where bge(4)
could go crazy with interrupts.  :-)  Yong-Hyeon, the entire thread is
here:

http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


