Odd network issues on ZFS based NFS server

Jeremy Chadwick freebsd at jdc.parodius.com
Thu Jun 10 13:39:01 UTC 2010


On Thu, Jun 10, 2010 at 03:03:07PM +0200, Anders Nordby wrote:
> On Thu, Jun 10, 2010 at 04:48:32AM -0700, Jeremy Chadwick wrote:
> > Can you also provide "vmstat -i" output, both when the issue is
> > happening and after the machine has been rebooted (but been up for 5-10
> > minutes)?  Thanks.
> 
> While having issues:
> 
> root at unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                    78164874        953
> irq19: uhci1                      643047          7
> irq26: bge1                     73830825        900
> irq51: ciss0                      642774          7
> cpu0: timer                    163861455       1998
> cpu1: timer                    163853438       1998
> cpu3: timer                    163906515       1999
> cpu2: timer                    163906515       1999
> Total      
> 
> 5 minutes after a reboot:
> 
> root at unixfile:~# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           6          0
> irq14: ata0                            1          0
> irq18: uhci2                        5813         19
> irq19: uhci1                        2503          8
> irq26: bge1                         1997          6
> irq51: ciss0                        2503          8
> cpu0: timer                       592619       1995
> cpu1: timer                       584601       1968
> cpu2: timer                       584605       1968
> cpu3: timer                       584606       1968
> Total                            2359254       7943

The interrupt rate for bge1 (irq26) is very high during the problem
(~900/sec), while otherwise it is only ~6/sec.  This is a shot in the
dark, but it is probably the cause of the packet loss you're seeing.
Oddly, uhci2 (one of your USB host controllers) is also firing at a very
high rate (~953/sec, versus ~19/sec after the reboot).  I don't know if
this is the sign of a NIC problem, a driver problem, or an interrupt
routing problem (think APIC?).
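As a rough illustration of the comparison above, here is a minimal Python sketch (not part of the original thread; the function names and the 10x / 100-per-second thresholds are arbitrary assumptions) that parses two "vmstat -i" snapshots and flags interrupt sources whose rate jumped.  Keep in mind that the rate column is the total count divided by uptime, i.e. an average since boot, so the sketch also includes a helper that computes the rate over a short interval from two snapshots taken a known number of seconds apart.

```python
import re

# One `vmstat -i` data line: "<source>   <total>   <rate>".
_LINE = re.compile(r"(.+?)\s+(\d+)\s+(\d+)$")


def parse_vmstat_i(text):
    """Map each interrupt source to (total, rate) from `vmstat -i` output."""
    result = {}
    for raw in text.splitlines():
        # Tolerate mail-style "> " quoting and surrounding whitespace.
        line = raw.strip().lstrip(">").strip()
        m = _LINE.match(line)
        if m is None or m.group(1) == "Total":
            continue  # header, summary, or truncated line
        result[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return result


def flag_rate_changes(baseline, current, ratio=10.0, min_rate=100):
    """List sources whose averaged rate grew by `ratio` and is >= `min_rate`.

    Both thresholds are arbitrary defaults chosen for illustration.
    """
    flagged = []
    for name, (_, cur_rate) in current.items():
        if name not in baseline:
            continue
        base_rate = baseline[name][1]
        if cur_rate >= min_rate and cur_rate >= ratio * base_rate:
            flagged.append((name, base_rate, cur_rate))
    # Highest current rate first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


def interval_rates(before, after, seconds):
    """Interrupts/sec between two snapshots taken `seconds` apart.

    Unlike the rate column (an average since boot), this reflects only
    the interval between the two snapshots.
    """
    return {
        name: (after[name][0] - before[name][0]) / seconds
        for name in after
        if name in before
    }
```

Fed the post-reboot snapshot as the baseline and the "while having issues" snapshot as the current one, flag_rate_changes() reports irq18: uhci2 (19 -> 953/sec) and irq26: bge1 (6 -> 900/sec); the truncated "Total" line is simply skipped.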

Debugging this is beyond my capability, but folks like John Baldwin may
have some ideas on where to go from here.

Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
"tcpdump -l -n -s 0 -i bge1" (to watch network traffic live) while this
is happening?  I ask because I want to rule out the possibility that the
box starts having problems due to a DoS attack or unexpected, excessive
LAN traffic.  Basically, make sure that all the network I/O going on
across bge1 is expected.
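A minimal Python sketch of the byte-counter check (hypothetical helpers, not from the thread).  It assumes FreeBSD's "netstat -ibn -I <iface>" prints a header line followed by a <Link#N> row, and that the counter columns are the ones named after "Address" in that header; column sets vary between releases, so treat the parsing as an assumption.  Sampling twice a known interval apart turns the cumulative Ibytes/Obytes counters into bytes per second.

```python
import subprocess


def parse_netstat_ibn(text):
    """Return the link-level counters from `netstat -ibn -I <iface>` output.

    Assumes the first line is the column header and that the counter
    columns (Ipkts, Ibytes, Obytes, ...) are everything after "Address".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    names = header[header.index("Address") + 1:]
    rows = [line.split() for line in lines[1:]]
    # Prefer the <Link#N> row, which carries the interface-wide totals.
    row = next((r for r in rows if any(f.startswith("<Link") for f in r)), rows[0])
    # Network/Address can be blank, so read counters from the right-hand end.
    values = row[-len(names):]
    # Non-numeric entries (e.g. "-") become None.
    return {n: (int(v) if v.isdigit() else None) for n, v in zip(names, values)}


def sample(iface):
    """Run netstat once for `iface` (FreeBSD) and parse the counters."""
    out = subprocess.run(
        ["netstat", "-ibn", "-I", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_netstat_ibn(out)


def byte_rates(before, after, seconds):
    """Input/output bytes per second between two samples."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("Ibytes", "Obytes")
    }
```

For example, call sample("bge1"), wait ten seconds with time.sleep(10), sample again, and pass both results to byte_rates(before, after, 10).  A sustained figure far above what the NFS clients should be generating would point at the kind of unexpected traffic described above, and tcpdump can then show where it comes from.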

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |