re(4) problem

Thu Mar 6 12:11:40 PST 2008

Hello people,

  I would like to report a problem with the re(4) device.

  I am running the following system:
  FreeBSD 7.0-STABLE #2: Sat Mar  1 18:55:23 CET 2008 amd64

  The system was built with the patch available at:
  http://people.freebsd.org/~yongari/re/re.HEAD.patch

  The problem has already occurred 3 times (within about a week),
  always suddenly after some time. I don't know how to reproduce
  it :-(

  The machine in question has two NICs, one em(4) based and one
  re(4) based. When the problem occurs, I am able to connect to the
  machine only through em(4), which works without problems.

  The symptoms are as follows:

  - The machine does not reply to ICMP echo requests sent to the re(4)
    device.

  - When I try to ping a remote host over the re(4) based card, I get:

  ping: sendto: No buffer space available

  - When I run tcpdump -vv -i re0, I see only ARP requests (ha-web1
    is the machine in question) and no other meaningful traffic:

20:30:20.945662 arp who-has 85.10.197.188 tell 85.10.197.161
20:30:20.947624 arp who-has 85.10.197.189 tell 85.10.197.161
20:30:20.949021 arp who-has 85.10.197.190 tell 85.10.197.161
20:30:21.136417 arp who-has ha-web1 tell 85.10.199.1
20:30:22.153493 arp who-has 85.10.197.169 tell 85.10.197.161
20:30:23.286400 arp who-has ha-web1 tell 85.10.199.1
20:30:23.299547 arp who-has 85.10.199.12 tell 85.10.199.1

  - The output of netstat -m:

root@[ha-web1 /home/danger]# netstat -m
1047/648/1695 mbufs in use (current/cache/total)
879/335/1214/25600 mbuf clusters in use (current/cache/total/max)
879/267 mbuf+clusters out of packet secondary zone in use (current/cache)
16/265/281/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2092K/1892K/3984K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
37742 requests for I/O initiated by sendfile
0 calls to protocol drain routines

  - ifconfig re0 output:

danger@[ha-web1 ~]> ifconfig
re0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1d:92:34:12:7a
        inet 85.10.199.6 netmask 0xffffffe0 broadcast 85.10.199.31
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

  - When I run ifconfig re0 down, the device doesn't go down until I
    also type ifconfig re0 up. In the meantime, ifconfig still reports
    the device as active, and /var/log/messages doesn't mention that it
    has gone down.
    Once I type ifconfig re0 up, the device goes down and immediately
    comes back up, but the network still doesn't work. However, I no
    longer get the ENOBUFS error when I try to ping a remote host.
    After this procedure I am unable to ssh to this box over em(4)
    either (ping still works).
    When I then run /etc/rc.d/netif restart, I can connect to the
    machine over em(4) again. When I ping a remote host over re(4), I get
    ping: sendto: No route to host. After I run /etc/rc.d/routing
    restart, ping no longer reports anything, but tcpdump shows ARP
    requests again.

  - No interrupt storms are reported in /var/log/messages, and neither
    it nor dmesg shows anything unusual.
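
  One detail in the ifconfig output above may be worth pointing out: the
  flags word 8c43 includes OACTIVE, which the driver sets while it
  considers its transmit path busy. On a healthy interface that bit can
  appear briefly under load, so it is only suspicious if it stays set.
  Below is a minimal sketch that decodes the flags word, assuming the
  flag bit values from FreeBSD's <net/if.h>. The helper names
  decode_flags and tx_stalled are made up for illustration, and the
  "stalled" check is only a heuristic, not a diagnosis.

```python
# Interface flag bits as defined in FreeBSD's <net/if.h> (subset).
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0004: "DEBUG",
    0x0008: "LOOPBACK",
    0x0010: "POINTOPOINT",
    0x0040: "RUNNING",
    0x0080: "NOARP",
    0x0100: "PROMISC",
    0x0200: "ALLMULTI",
    0x0400: "OACTIVE",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def tx_stalled(value):
    """Hypothetical heuristic: UP and RUNNING, yet OACTIVE is also set.

    This only means something if the flag stays set across repeated checks.
    """
    return all(value & bit for bit in (0x0001, 0x0040, 0x0400))


flags = int("8c43", 16)  # taken from "re0: flags=8c43<...>"
print(decode_flags(flags))
print(tx_stalled(flags))
```

  Running the same check against ifconfig output a few seconds apart
  would show whether OACTIVE ever clears while the interface is idle.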

  I suppose it's a bug in re(4); otherwise I assume that the network
  wouldn't work over em(4) either.
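
  The netstat -m counters seem to point the same way: no requests were
  denied and mbuf cluster usage is far below the limit, so the ENOBUFS
  error is probably not caused by the shared mbuf pools running out.
  The sketch below shows how I read those numbers. It is only an
  illustration: the sample text is a subset of the output above, and
  the helper names parse_counters and mbuf_exhausted are hypothetical.

```python
import re

# A subset of the netstat -m output from the report.
SAMPLE = """\
1047/648/1695 mbufs in use (current/cache/total)
879/335/1214/25600 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
"""


def parse_counters(text):
    """Map each line's description to its slash-separated integer counters."""
    counters = {}
    for line in text.splitlines():
        # Leading "a/b/c" numbers, a description, an optional "(legend)".
        m = re.match(r"(\d+(?:/\d+)*) (.+?)(?: \(.*\))?$", line)
        if m:
            counters[m.group(2)] = [int(n) for n in m.group(1).split("/")]
    return counters


def mbuf_exhausted(counters):
    """True if any mbuf request was denied or clusters reached their limit."""
    current, _cache, _total, limit = counters["mbuf clusters in use"]
    denied = sum(counters["requests for mbufs denied"])
    return denied > 0 or current >= limit


counters = parse_counters(SAMPLE)
print(counters["mbuf clusters in use"])
print(mbuf_exhausted(counters))
```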

  If there is any information I can provide to help debug this problem,
  please let me know. I will leave the machine in this state if the
  customer permits me to do so.

-- 
Best Regards,
   Daniel Gerzo				mailto:danger at FreeBSD.org