Under heavy load internet gets killed, only a reboot can bring it back up

PYUN Yong-Hyeon pyunyh at gmail.com
Wed Oct 15 05:37:40 PDT 2008


On Wed, Oct 15, 2008 at 04:31:01AM -0700, Jeremy Chadwick wrote:
 > On Wed, Oct 15, 2008 at 01:17:58PM +0200, Aniruddha wrote:
 > > On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
 > > > On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
 > > > > Each time my internet connection is under heavy load it gets killed
 > > > > after a minute or 10. I tried the following commands to get the internet
 > > > > back up, but nothing helped:
 > > > > 
 > > > > /etc/rc.d/netif restart
 > > > > ifconfig mynic down
 > > > > ifconfig mynic up
 > > > > 
 > > > > Even worse the last time I issued a '/etc/rc.d/netif restart' my whole
 > > > > system hardlocked (wasn't responding to capslock presses). So far the
 > > > > only solution has been to reboot the computer. Is there any way I can
 > > > > prevent my internet connection from getting killed? How do I get it back
 > > > > up after it has been killed? Thanks in advance!
 > > > 
 > > > What network card are you using?  Can you provide output from the
 > > > following commands?
 > > > 
 > > > dmesg
 > > > vmstat -i
 > > > netstat -in
 > > > 
 > > I have a Marvell Yukon onboard nic.
 > > 
 > > 
 > > Here's the output:
 > > 
 > > netstat -in
 > > 
 > > Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
 > > msk0   1500 <Link#1>                              29     0       25     0     0
 > > msk0   1500 :                                      0     -        5     -     -
 > > msk0   1500 192.168.2.0/2 192.168.2.111           16     -       14     -     -
 > > fwe0*  1500 <Link#2>                               0     0        0     0     0
 > > fwip0  1500 <Link#3>                               0     0        0     0     0
 > > lo0   16384 <Link#4>                               0     0        0     0     0
 > > lo0   16384 ::1/128       ::1                      0     -        0     -     -
 > > lo0   16384 ::1/64                                 0     -        0     -     -
 > > lo0   16384 127.0.0.0/8   127.0.0.1                0     -        0     -     -
 > 
 > This looks okay.  I see no interface errors, which is good.
 > 
 > > vmstat -i
 > > interrupt                          total       rate
 > > irq17: atapci0+                       13          0
 > > irq18: atapci1+                     1045          5
 > > irq20: uhci0 ehci0                 13462         69
 > > irq21: fwohci0                         3          0
 > > irq23: atapci3                    102718        529
 > > cpu0: timer                       386229       1990
 > > irq256: mskc0                         46          0
 > > cpu1: timer                       376453       1940
 > > Total                             879969       4535
 > 
 > msk(4) appears to be using MSI/MSI-X here.
 > 
 > One thing worth trying would be to disable MSI/MSI-X.  You can disable
 > these by adding the following to your /boot/loader.conf :
 > 
 > hw.pci.enable_msix="0"
 > hw.pci.enable_msi="0"

The tunables above disable MSI/MSI-X for every device in the box.
If the intention is to disable MSI only for the Marvell network
controller, add hw.msk.msi_disable="1" to /boot/loader.conf instead.
But I don't think you need to disable MSI unless you have buggy
PCI bridges. Without MSI, msk(4) would normally share an interrupt
with other devices (e.g. USB).
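
If you do experiment with the MSI tunables, a quick way to see which
interrupt mode msk(4) ended up with after a reboot is to look at its
line in "vmstat -i". The following is only a rough sketch: the function
name msk_irq_mode is made up for illustration, and it assumes that IRQ
numbers of 256 and above are MSI/MSI-X vectors, as irq256 is in the
output quoted above.

```sh
# msk_irq_mode (hypothetical helper): read `vmstat -i` output on stdin
# and report, for every interrupt line that mentions msk, whether it
# looks like an MSI vector or a legacy interrupt.
# Assumption: IRQ numbers >= 256 are MSI/MSI-X vectors.
msk_irq_mode() {
    awk '/^irq[0-9]+:/ && /msk/ {
        irq = $1
        gsub(/[^0-9]/, "", irq)      # "irq256:" -> "256"
        if (irq + 0 >= 256)
            print "irq" irq ": MSI/MSI-X"
        else
            print "irq" irq ": legacy (possibly shared) interrupt"
    }'
}
```

Run it as "vmstat -i | msk_irq_mode". A legacy result together with
another device name on the same vmstat line would mean msk(4) is
sharing its interrupt.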

 > 
 > > Copyright (c) 1992-2008 The FreeBSD Project.
 > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 > > 	The Regents of the University of California. All rights reserved.
 > > FreeBSD is a registered trademark of The FreeBSD Foundation.
 > > FreeBSD 7.1-BETA #0: Sun Sep  7 13:49:18 UTC 2008
 > >     root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
 > > Timecounter "i8254" frequency 1193182 Hz quality 0
 > > CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (3001.18-MHz 686-class CPU)

[...]

 > > mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xb800-0xb8ff mem 0xff8fc000-0xff8fffff irq 19 at device 0.0 on pci3
 > > msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
 > > msk0: Ethernet address: 00:1e:8c:5a:62:da
 > > miibus0: <MII bus> on msk0
 > > e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
 > > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
 > > mskc0: [FILTER]

This controller is known to be a buggy one. See below.

[...]

 > > Trying to mount root from ufs:/dev/ad16s3a
 > > WARNING: / was not properly dismounted
 > > GEOM_LABEL: Label ext2fs/home removed.
 > > GEOM_LABEL: Label ext2fs/data removed.
 > > mskc0: Uncorrectable PCI Express error
 > > mskc0: Uncorrectable PCI Express error
 > 
 > Those errors at the end of your dmesg don't look good; could be the sign
 > of a NIC or motherboard that's going bad, or possibly a very strange
 > driver problem.

I guess the messages above can be safely ignored.

 > 
 > Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
 > msk(4) driver.  Yong-Hyeon, do you know of any conditions where heavy
 > network I/O could cause msk(4) to lock up or stop transmitting traffic,
 > or possibly hard-lock on ifconfig down/up?
 > 

I think a workaround for the controller bug was committed to HEAD
(SVN r183346). To the original poster: would you try the latest
if_msk.c from HEAD? (Just copy if_msk.c and if_mskreg.h from HEAD to
your box.)
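
In case it helps, here is a rough sketch of the copy step. It assumes
you have already downloaded the two HEAD files into some directory and
that your source tree lives in /usr/src, so the driver sources are in
/usr/src/sys/dev/msk. The function name install_msk_sources and the
.orig backup suffix are only illustrative.

```sh
# install_msk_sources SRC_DIR [DEST_DIR]  (hypothetical helper)
# Back up the current msk(4) sources, then copy the HEAD versions over
# them. DEST_DIR defaults to /usr/src/sys/dev/msk (assumed tree layout).
install_msk_sources() {
    src_dir=$1
    dest_dir=${2:-/usr/src/sys/dev/msk}

    # Refuse to touch anything unless both new files are present.
    for f in if_msk.c if_mskreg.h; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing $src_dir/$f" >&2
            return 1
        fi
    done

    for f in if_msk.c if_mskreg.h; do
        # Keep the original so the change is easy to revert.
        if [ -f "$dest_dir/$f" ]; then
            cp -p "$dest_dir/$f" "$dest_dir/$f.orig" || return 1
        fi
        cp "$src_dir/$f" "$dest_dir/$f" || return 1
    done
}
```

For example, "install_msk_sources /tmp/msk-head" as root. The new code
only takes effect after the driver is rebuilt. Assuming msk(4) is
compiled into your GENERIC kernel, that would mean something like
"cd /usr/src && make buildkernel KERNCONF=GENERIC && make installkernel
KERNCONF=GENERIC" followed by a reboot.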

-- 
Regards,
Pyun YongHyeon


More information about the freebsd-questions mailing list