CARP and em0 timeout watchdog

Sven Willenberger sven at dmv.com
Fri Apr 27 13:22:35 UTC 2007


On Fri, 2007-04-20 at 14:44 -0400, Sven Willenberger wrote:
> On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
> > On 4/20/07, Sven Willenberger <sven at dmv.com> wrote:
> > > On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> > > > On 4/20/07, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> > > > > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > > > > Having done more diagnostics, I have found that it is not CARP-related at
> > > > > > all. It turns out that the same timeouts will happen when ftp'ing to the
> > > > > > physical address IPs as well. There is also an odd situation here
> > > > > > depending on which protocol I use. The two boxes are connected to a Dell
> > > > > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > > > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
> > > > > > hiccup (I used dd to create test files of various sizes from 32M to 1G
> > > > > > and just ran scp testfile* to the other box). On the other hand, if I
> > > > > > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > > > > > is being run through inetd, the interface resets (watchdog) within
> > > > > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> > > > > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> > > > > > such behavioral differences between scp and ftp?
> > > > >
> > > > > You'll get a much higher throughput rate with FTP than you will with
> > > > > SSH, simply because encryption overhead is quite high (even with the
> > > > > Blowfish cipher).  With a very fast processor and on a gigE network
> > > > > you'll probably see 8-9 MByte/sec via SSH versus 60-70 MByte/sec via FTP.
> > > > > That's the only difference I can think of.
> > > > >
> > > > > The watchdog resets I can't explain; Jack Vogel should be able to assist
> > > > > with that.  But it sounds like the resets only happen under very high
> > > > > throughput conditions (which is why you'd see it with FTP but not SSH).
> > > >
> > > > What kind of hardware is this interface? Watchdogs mean TX cleanup
> > > > isn't happening in a reasonable time; without further data it's hard to
> > > > know what might be going on.
> > > >
> > > > Jack
> > >
> > > from pciconf:
> > >
> > > em0@pci13:0:0:  class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
> > >     vendor   = 'Intel Corporation'
> > >     device   = 'PRO/1000 PM'
> > >     class    = network
> > >     subclass = ethernet
> > > em1@pci14:0:0:  class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
> > >     vendor   = 'Intel Corporation'
> > >     class    = network
> > >     subclass = ethernet
> > >
> > > em0 is the interface in question.
> > >
> > > from dmesg:
> > >
> > > em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
> > > 0x4000-0x401f mem 0xe0300000-0xe031ffff irq 16 at device 0.0 on pci13
> > >
> > > em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
> > > 0x5000-0x501f mem 0xe0400000-0xe041ffff irq 17 at device 0.0 on pci14
> > 
> > OH, this is an 82573. I've posted a firmware patcher a couple of
> > different times; there is a bit in the MANC register that is incorrectly
> > programmed in some vendors' systems. Can you search email for
> > that patcher? It needs to run from DOS. If you are unable to find
> > it, let me know and I'll resend you a copy.
> > 
> > Jack
> 
> If you are referring to the dcgdis.ThisIsZip attachment, I found it in
> earlier threads, thanks. I will work on patching the NICs and will keep
> the list updated.
> 
> Thanks again.
> 
> Sven
> 
I am happy to report that the firmware patch seems to have fixed the
issue and I can transfer data across the gigE network without the
watchdog timeouts and lockups. Thanks again!!

Sven



More information about the freebsd-stable mailing list