re(4) problem
Daniel Gerzo
danger at FreeBSD.org
Thu Mar 6 12:11:40 PST 2008
Hello people,
I would like to report a problem with the re(4) device.
I am running the following system:
FreeBSD 7.0-STABLE #2: Sat Mar 1 18:55:23 CET 2008 amd64
The system was built with the patch available at:
http://people.freebsd.org/~yongari/re/re.HEAD.patch
The problem has already occurred three times (over a period of roughly a
week), always suddenly after some time. I don't know how to reproduce
it :-(
The machine in question has two NICs, one em(4)-based and one
re(4)-based. When the problem occurs, I can still connect to the
machine through em(4) with no problems.
The symptoms are as follows:
- the machine does not reply to ICMP echo requests on the re(4)
device
- when I try to ping some remote host over the re(4)-based card I get:
ping: sendto: No buffer space available
- when I run tcpdump -vv -i re0, I see only ARP requests (ha-web1
is the machine in question) and no other meaningful traffic:
20:30:20.945662 arp who-has 85.10.197.188 tell 85.10.197.161
20:30:20.947624 arp who-has 85.10.197.189 tell 85.10.197.161
20:30:20.949021 arp who-has 85.10.197.190 tell 85.10.197.161
20:30:21.136417 arp who-has ha-web1 tell 85.10.199.1
20:30:22.153493 arp who-has 85.10.197.169 tell 85.10.197.161
20:30:23.286400 arp who-has ha-web1 tell 85.10.199.1
20:30:23.299547 arp who-has 85.10.199.12 tell 85.10.199.1
- the output of netstat -m:
root@[ha-web1 /home/danger]# netstat -m
1047/648/1695 mbufs in use (current/cache/total)
879/335/1214/25600 mbuf clusters in use (current/cache/total/max)
879/267 mbuf+clusters out of packet secondary zone in use (current/cache)
16/265/281/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2092K/1892K/3984K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
37742 requests for I/O initiated by sendfile
0 calls to protocol drain routines
- ifconfig re0 output:
danger@[ha-web1 ~]> ifconfig
re0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
ether 00:1d:92:34:12:7a
inet 85.10.199.6 netmask 0xffffffe0 broadcast 85.10.199.31
media: Ethernet autoselect (100baseTX <full-duplex>)
status: active
- when I run ifconfig re0 down, the device doesn't go down unless I
also type ifconfig re0 up. In the meantime ifconfig still reports the
device as active, and /var/log/messages doesn't mention that it has gone
down.
When I then type ifconfig re0 up, the device goes down and
immediately comes back up, but the network still doesn't work; however,
I no longer get the ENOBUFS error when I try to ping a remote host.
After this procedure I am unable to ssh to the box over em(4) as
well (ping still works).
Now, when I run /etc/rc.d/netif restart, I can connect to the
machine over em(4) again. When I ping a remote host over re(4), I get
ping: sendto: No route to host. When I run /etc/rc.d/routing
restart, ping reports nothing, but I can again see ARP requests in
tcpdump.
- no interrupt storms are reported in /var/log/messages, and neither
it nor dmesg contains anything strange.
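As a rough sanity check of the netstat -m figures above, a small sketch like the following (a hypothetical helper, not anything run on the box) can sum the denied-request counters; the sample lines in the heredoc are copied verbatim from the output in this report:

```shell
#!/bin/sh
# Hypothetical helper: decide from `netstat -m` output whether mbuf or
# cluster exhaustion could explain an ENOBUFS error.  On a live system
# you would pipe `netstat -m` in; here the sample from this report is
# inlined so the script is self-contained.

check_denied() {
    # Sum every counter on the "requests ... denied" lines; a non-zero
    # total would point at real mbuf/cluster exhaustion.
    awk '/requests for (mbufs|jumbo clusters) denied/ {
             split($1, n, "/")
             for (i in n) total += n[i]
         }
         END { print total + 0 }'
}

total=$(check_denied <<'EOF'
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
EOF
)
echo "denied requests: $total"
```

Since every denied counter in the output above is zero, the ENOBUFS more likely comes from the interface's send queue filling up while the driver is wedged (note the stuck OACTIVE flag in the ifconfig output) than from genuine mbuf starvation.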
I suppose it's a bug in re(4); otherwise the network wouldn't
work over em(4) either.
If you need any further information to help debug this problem,
please let me know; I will leave the machine in this state if the
customer permits me to do so.
--
Best Regards,
Daniel Gerzo mailto:danger at FreeBSD.org
More information about the freebsd-current mailing list