CARP and em0 timeout watchdog

Fri Apr 20 18:27:59 UTC 2007

On 4/20/07, Sven Willenberger <sven at dmv.com> wrote:
> On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> > On 4/20/07, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> > > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > > Having done more diagnostics I have found out it is not CARP related at
> > > > all. It turns out that the same timeouts will happen when ftp'ing to the
> > > > physical address IPs as well. There is also an odd situation here
> > > > depending on which protocol I use. The two boxes are connected to a Dell
> > > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
> > > > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> > > > size and just scp testfile* to the other box). On the other hand, if I
> > > > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > > > is being run through inetd, the interface resets (watchdog) within
> > > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> > > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> > > > such behavioral differences between scp and ftp?
> > >
> > > You'll get a much higher throughput rate with FTP than you will with
> > > SSH, simply because encryption overhead is quite high (even with the
> > > Blowfish cipher).  With a very fast processor and on a gigE network
> > > you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> > > That's the only difference I can think of.
> > >
> > > The watchdog resets I can't explain; Jack Vogel should be able to assist
> > > with that.  But it sounds like the resets only happen under very high
> > > throughput conditions (which is why you'd see it with FTP but not SSH).
> >
> > What kind of hardware is this interface? Watchdogs mean TX cleanup
> > isn't happening in a reasonable time; without further data it's hard to
> > know what might be going on.
> >
> > Jack
>
> from pciconf:
>
> em0 at pci13:0:0:  class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PM'
>     class    = network
>     subclass = ethernet
> em1 at pci14:0:0:  class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>     vendor   = 'Intel Corporation'
>     class    = network
>     subclass = ethernet
>
> em0 is the interface in question.
>
> from dmesg:
>
> em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x4000-0x401f mem 0xe0300000-0xe031ffff irq 16 at device 0.0 on pci13
>
> em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x5000-0x501f mem 0xe0400000-0xe041ffff irq 17 at device 0.0 on pci14

OH, this is an 82573, and I've posted a firmware patcher a couple of
different times. There is a bit in the MANC register that is incorrectly
programmed in some vendors' systems. Can you search email for
that patcher? It needs to run from DOS. If you are unable to find
it, let me know and I'll resend you a copy.

Jack