em network issues

Albert Shih shih at math.jussieu.fr
Wed Oct 18 23:03:50 UTC 2006


On 18/10/2006 10:46:30-0700, Jack Vogel wrote:
> I think there may be a few different problems going on with the em driver
> on 6.2 that are being lumped under the general description of network
> hangs. In order to solve these I need a reproducible failure, either on a
> system here at Intel, or someone who is willing to be a remote guinea
> pig :)
> 
> I need detailed reports, meaning EXACT system data, if it's an OEM
> box, what model, what addons, a pciconf list, description of the
> network, and anything special that is connected with the problem
> occurrence.  OH, and if you have a 'before and after' situation, then
> please give driver deltas that worked, and which failed.

Well....

Box: HP ProLiant ML350 G4
All add-ons are HP parts.

Here is the network configuration (ifconfig output):

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
fxp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

The output of pciconf -l:

[root@nfs3 ~]# pciconf -l
hostb0@pci0:0:0:   class=0x060000 card=0x32000e11 chip=0x35908086 rev=0x0c hdr=0x00
pcib1@pci0:2:0:    class=0x060400 card=0x00000050 chip=0x35958086 rev=0x0c hdr=0x01
pcib4@pci0:4:0:    class=0x060400 card=0x00000050 chip=0x35978086 rev=0x0c hdr=0x01
pcib5@pci0:6:0:    class=0x060400 card=0x00000050 chip=0x35998086 rev=0x0c hdr=0x01
pcib6@pci0:28:0:   class=0x060400 card=0x00000050 chip=0x25ae8086 rev=0x02 hdr=0x01
none0@pci0:29:0:   class=0x0c0300 card=0x32010e11 chip=0x25a98086 rev=0x02 hdr=0x00
none1@pci0:29:1:   class=0x0c0300 card=0x32010e11 chip=0x25aa8086 rev=0x02 hdr=0x00
none2@pci0:29:4:   class=0x088000 card=0x32010e11 chip=0x25ab8086 rev=0x02 hdr=0x00
none3@pci0:29:5:   class=0x080020 card=0x32010e11 chip=0x25ac8086 rev=0x02 hdr=0x00
none4@pci0:29:7:   class=0x0c0320 card=0x32010e11 chip=0x25ad8086 rev=0x02 hdr=0x00
pcib8@pci0:30:0:   class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x0a hdr=0x01
isab0@pci0:31:0:   class=0x060100 card=0x00000000 chip=0x25a18086 rev=0x02 hdr=0x00
atapci0@pci0:31:1: class=0x01018a card=0x32010e11 chip=0x25a28086 rev=0x02 hdr=0x00
pcib2@pci5:0:0:    class=0x060400 card=0x00000044 chip=0x03298086 rev=0x09 hdr=0x01
pcib3@pci5:0:2:    class=0x060400 card=0x00000044 chip=0x032a8086 rev=0x09 hdr=0x01
isp0@pci6:1:0:     class=0x0c0400 card=0x01000e11 chip=0x23121077 rev=0x02 hdr=0x00
em0@pci9:1:0:      class=0x020000 card=0x00db0e11 chip=0x10108086 rev=0x01 hdr=0x00
em1@pci9:1:1:      class=0x020000 card=0x00db0e11 chip=0x10108086 rev=0x01 hdr=0x00
ciss0@pci9:2:0:    class=0x010400 card=0x409a0e11 chip=0x00460e11 rev=0x01 hdr=0x00
pcib7@pci2:2:0:    class=0x060400 card=0x000000dc chip=0xb1548086 rev=0x00 hdr=0x01
mpt0@pci2:3:0:     class=0x010000 card=0x00da0e11 chip=0x00301000 rev=0x08 hdr=0x00
mpt1@pci2:3:1:     class=0x010000 card=0x00da0e11 chip=0x00301000 rev=0x08 hdr=0x00
fxp0@pci3:4:0:     class=0x020000 card=0xb1630e11 chip=0x12298086 rev=0x08 hdr=0x00
fxp1@pci3:5:0:     class=0x020000 card=0xb1630e11 chip=0x12298086 rev=0x08 hdr=0x00
bge0@pci1:2:0:     class=0x020000 card=0x00e30e11 chip=0x165414e4 rev=0x03 hdr=0x00
none5@pci1:3:0:    class=0x030000 card=0x001e0e11 chip=0x47521002 rev=0x27 hdr=0x00
none6@pci1:4:0:    class=0x088000 card=0x00d70e11 chip=0x00d70e11 rev=0x01 hdr=0x00
[root@nfs3 ~]#
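
To pick the NICs out of a listing like this one, something along these lines may help. It is only a sketch: list_nics is a made-up helper name, and it relies on PCI base class 0x02 meaning "network controller" and on the chip= field holding the device ID in its high 16 bits and the vendor ID in its low 16 bits.

```shell
#!/bin/sh
# list_nics (hypothetical helper): read `pciconf -l` output on stdin and
# print one line per network controller with its vendor and device IDs.
list_nics() {
    awk '
        /class=0x02/ {                    # PCI base class 0x02 = network
            name = $1
            sub(/@.*/, "", name)          # drop "@pciB:S:F:" if present
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^chip=0x/) {
                    id = substr($i, 8)    # 8 hex digits: DDDDVVVV
                    printf "%s vendor=0x%s device=0x%s\n", name, substr(id, 5, 4), substr(id, 1, 4)
                }
            }
        }
    '
}
```

Usage would be `pciconf -l | list_nics`. For the em0 line above it prints `em0 vendor=0x8086 device=0x1010`, which is the pair of IDs to look up when checking which Intel controller a box really has.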

This server has only one purpose: nfsd.

An MSA1000 disk array is attached over Fibre Channel through a QLogic FC card.

History of the problem:

The server was bought in January 2006 to replace an older HP machine. Only
the server is new; the MSA1000 is the old one.

I installed FreeBSD 6.x on this server.

Because I had no problems at all with the old server, I put the new one
straight into production (I know, bad idea...).

After a few weeks the problem began: the em interface was lost (watchdog).
Sometimes it was fxp, but very rarely.

When this happens nothing fixes it short of a reboot.

I ran cvsup many times to move to 6-STABLE, but nothing changed. After a few
weeks the server either hangs or an em interface goes into the watchdog state.

Around March 2006 someone on this mailing list told me I could put the
interfaces in polling mode, with SMP disabled.

This worked very well until September 2006. Then, without any change on my
server (no cvsup, no buildkernel, no reboot) and no change on the NFS clients
(Linux/FreeBSD), the problem came back.

The first crash was a hang of the server.

I then ran cvsup/buildworld/buildkernel.

After a few days the server hung again.

In polling mode I do not get the «em* watchdog» message, but
the server simply hangs (even on the console).

Without polling I get the em* watchdog message.
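
To put numbers on how often each interface hits the watchdog, a rough sketch like the following could be run over the system log. The name count_em_watchdog is invented for this example, and the pattern assumes each log line contains the text "emN: watchdog"; the exact wording of the driver message is not quoted in this report, so adjust the pattern to what the log actually shows.

```shell
#!/bin/sh
# count_em_watchdog (hypothetical helper): read log lines on stdin and
# print "<interface> <count>" for every emN that logged a watchdog line.
# ASSUMPTION: matching lines contain "emN: watchdog".
count_em_watchdog() {
    awk '
        match($0, /em[0-9]+: watchdog/) {
            ifname = substr($0, RSTART, RLENGTH)
            sub(/:.*/, "", ifname)        # keep only the interface name
            count[ifname]++
        }
        END { for (n in count) printf "%s %d\n", n, count[n] }
    ' | sort
}
```

A typical call would be `count_em_watchdog < /var/log/messages`, assuming kernel messages are logged there.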

Since yesterday I have been running in this mode:

	no SMP
	no polling
	the patch http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029224.html
	a kernel built without USB (because I was getting many interrupts from USB).

Of course it is too soon to tell whether the problem is solved.
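
One way to confirm that an interface really is out of polling mode is to look at its options= line. This is only a sketch: polling_enabled is a made-up name, and it assumes that ifconfig on 6.x lists POLLING among the options when polling is active on the interface.

```shell
#!/bin/sh
# polling_enabled (hypothetical helper): read `ifconfig <if>` output on
# stdin; succeed if the options= line lists POLLING, fail otherwise.
# ASSUMPTION: an interface in polling mode shows POLLING in options=<...>.
polling_enabled() {
    grep -q 'options=[0-9a-f]*<.*POLLING'
}
```

For example: `ifconfig em0 | polling_enabled && echo "em0: polling is on"`. With the ifconfig output shown earlier in this message (options=b<RXCSUM,TXCSUM,VLAN_MTU>) the check fails, meaning no polling.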

> hardware. There is a fix for this, you tell the portmapper to
> not use ports below 665, in particular:
> 
>          sysctl net.inet.ip.portrange.lowlast 665 (default is 600)
> 
> So, if you have IPMI or AMT hardware, you should try this
> change and see if it fixes hangs.

I don't know whether I have AMT, but I have put this in my sysctl.conf.
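
For reference, entries in sysctl.conf take the name=value form. A small sketch of how the setting could be recorded without leaving duplicate lines behind follows; set_sysctl_conf is a hypothetical helper, not a system tool, and the file path defaults to /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl_conf (hypothetical helper): store NAME=VALUE in a
# sysctl.conf-style file, replacing any earlier assignment of NAME.
# Usage: set_sysctl_conf NAME VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
set_sysctl_conf() {
    name=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    tmp=$(mktemp /tmp/sysctl.conf.XXXXXX) || return 1
    # Keep every line that does not already assign this name.
    if [ -f "$conf" ]; then
        awk -v n="$name=" 'index($0, n) != 1' "$conf" > "$tmp"
    fi
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Called as `set_sysctl_conf net.inet.ip.portrange.lowlast 665`, this leaves a single line `net.inet.ip.portrange.lowlast=665` in the file. The file is only read at boot, so to change the running system as well, `sysctl net.inet.ip.portrange.lowlast=665` sets the value immediately.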

> 
> There is also a hardware eeprom issue on systems with an 82573
> type NIC on SOME systems. There is a utility to fix that, if you

And on HP systems?

I'm sorry for:

	1/ My bad English
	2/ The server is in production... I can't make many changes or tests

but if I can help....

Thanks to all the FreeBSD developers.

Regards.



--
Albert SHIH
Universite de Paris 7 (Denis DIDEROT)
U.F.R. de Mathematiques.
7 ième étage, plateau D, bureau 10
Heure local/Local time:
Thu Oct 19 00:43:39 CEST 2006


More information about the freebsd-stable mailing list