CARP and em0 timeout watchdog

Fri Apr 20 15:50:04 UTC 2007

On Wed, 2007-04-18 at 11:50 -0400, Sven Willenberger wrote:
> I currently have a FreeBSD 6.2-RELEASE-p3 SMP with dual intel PRO/1000PM
> nics configured as follows:
> 
> em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         options=b<RXCSUM,TXCSUM,VLAN_MTU>
>         inet 192.168.0.18 netmask 0xffffff00 broadcast 192.168.0.255
>         ether 00:30:48:8d:5c:0a
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 4096
>         options=b<RXCSUM,TXCSUM,VLAN_MTU>
>         inet 10.10.0.18 netmask 0xfffffff8 broadcast 10.10.0.23
>         ether 00:30:48:8d:5c:0b
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
> 
> the em0 interface connects to the LAN while the em1 interface is
> connected to an identical box via CAT6 crossover cable (for
> ggate/gmirror).
> 
> Now, I have also configured a carp interface:
> 
> carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
>         inet 192.168.0.20 netmask 0xffffffff
>         carp: MASTER vhid 1 advbase 1 advskew 0
> 
> There are twin boxes here and I am running Samba. The problem is that
> with transfers across the carp IP (192.168.0.20) I end up with em0
> resetting after a watchdog timeout error. This occurs whether I transfer
> files from a windows box using a share (samba) or via ftp. This problem
> does *not* occur if I ftp to the 192.168.0.19 interface (non-virtual). I
> suspected cabling at first so had all the cabling in question replaced
> with fresh CAT6 to no avail. Several gigs of data can be transferred to
> the real interface (em0) without any issue at all; a max of maybe 1 - 2
> Gig can be transferred connected to the carp'ed IP before the em0 reset.
> Any ideas here?
> 
> Sven
> 

Having done more diagnostics, I have found that this is not CARP-related
at all: the same timeouts happen when ftp'ing to the physical interface
addresses as well. There is also an odd dependence on which protocol I
use. The two boxes are connected to a Dell PowerConnect 2616 gigabit
switch with CAT6. If I scp files from 192.168.0.18 to 192.168.0.19, I
can transfer gigabytes without a hiccup (I used dd to create test files
ranging from 32M to 1G in size and just ran scp testfile* to the other
box). On the other hand, if I connect to 192.168.0.19 using ftp (either
active or passive), with the ftp server run through inetd, the interface
resets (watchdog timeout) within seconds, after only a few MB of
traffic. Enabling polling does nothing, nor does changing
net.inet.tcp.{recv,send}space. Any ideas why I would be seeing such
different behavior between scp and ftp?
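
For anyone wanting to reproduce the scp test, the test files can be
generated along the lines below. This is only a rough sketch: the helper
name make_testfiles, the /dev/zero source, the file naming, and the
example sizes and paths are assumptions for illustration, not the exact
commands that were run.

```shell
#!/bin/sh
# Sketch only: make_testfiles is a hypothetical helper, not taken from
# the original test run.
# Usage: make_testfiles DIR SIZE_MB [SIZE_MB ...]
make_testfiles() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for mb in "$@"; do
        # bs is given in bytes so the same line works with BSD and GNU dd.
        # /dev/zero as the data source is an assumption.
        dd if=/dev/zero of="$dir/testfile_${mb}M" bs=1048576 count="$mb" \
            2>/dev/null || return 1
    done
}

# Example (left commented out; sizes and paths are assumptions):
# make_testfiles /tmp/xfer 32 64 128 256 512 1024
# scp /tmp/xfer/testfile* 192.168.0.19:/tmp/
```

Pushing the same set of files over ftp and over scp keeps the payload
identical, so that only the protocol differs between the two runs.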

Sven