Under heavy load internet gets killed, only a reboot can bring it back up

Jeremy Chadwick koitsu at FreeBSD.org
Wed Oct 15 04:31:04 PDT 2008


On Wed, Oct 15, 2008 at 01:17:58PM +0200, Aniruddha wrote:
> On Wed, 2008-10-15 at 00:26 -0700, Jeremy Chadwick wrote:
> > On Wed, Oct 15, 2008 at 09:13:00AM +0200, Aniruddha wrote:
> > > Each time my internet connection is under heavy load it gets killed
> > > after about 10 minutes. I tried the following commands to get the internet
> > > back up, but nothing helped:
> > > 
> > > /etc/rc.d/netif restart
> > > ifconfig mynic down
> > > ifconfig mynic up
> > > 
> > > Even worse, the last time I issued a '/etc/rc.d/netif restart' my whole
> > > system hard-locked (it wasn't responding to Caps Lock presses). So far the
> > > only solution has been to reboot the computer. Is there any way I can
> > > prevent my internet connection from getting killed? How do I get it back
> > > up after it has been killed? Thanks in advance!
> > 
> > What network card are you using?  Can you provide output from the
> > following commands?
> > 
> > dmesg
> > vmstat -i
> > netstat -in
> > 
> I have a Marvell Yukon onboard NIC.
> 
> 
> Here's the output:
> 
> netstat -in
> 
> Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
> msk0   1500 <Link#1>             29     0       25     0     0
> msk0   1500 :        0     -        5     -     -
> msk0   1500 192.168.2.0/2 192.168.2.111          16     -       14     -     -
> fwe0*  1500 <Link#2>              0     0        0     0     0
> fwip0  1500 <Link#3>              0     0        0     0     0
> lo0   16384 <Link#4>                               0     0        0     0     0
> lo0   16384 ::1/128       ::1                      0     -        0     -     -
> lo0   16384 ::1/64                 0     -        0     -     -
> lo0   16384 127.0.0.0/8   127.0.0.1                0     -        0     -     -

This looks okay.  I see no interface errors, which is good.
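
For anyone repeating this check on a longer listing, here is a minimal
Python sketch that pulls Ierrs and Oerrs out of the link-level rows of
`netstat -in`. It assumes the column layout quoted above (the last five
fields of a `<Link#N>` row are Ipkts, Ierrs, Opkts, Oerrs, Coll) and that
any mail-wrapped lines have been rejoined first. The names `link_errors`
and `SAMPLE` are made up for this illustration.

```python
# Sketch only: assumes the FreeBSD `netstat -in` column layout quoted above.
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
msk0   1500 <Link#1>             29     0       25     0     0
msk0   1500 192.168.2.0/2 192.168.2.111          16     -       14     -     -
fwe0*  1500 <Link#2>              0     0        0     0     0
"""


def link_errors(netstat_output):
    """Return {interface: (ierrs, oerrs)} for link-level rows."""
    errors = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only "<Link#N>" rows carry error counters; address rows show "-".
        if len(fields) < 8 or not fields[2].startswith("<Link#"):
            continue
        # The last five fields are Ipkts, Ierrs, Opkts, Oerrs, Coll.
        _ipkts, ierrs, _opkts, oerrs, _coll = fields[-5:]
        # A trailing "*" on the name marks an interface that is down.
        errors[fields[0].rstrip("*")] = (int(ierrs), int(oerrs))
    return errors


if __name__ == "__main__":
    for name, (ierrs, oerrs) in link_errors(SAMPLE).items():
        print(name, "input errors:", ierrs, "output errors:", oerrs)
```

Non-zero values in either column would point at cabling, duplex, or
hardware trouble; here both are zero.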

> vmstat -i
> interrupt                          total       rate
> irq17: atapci0+                       13          0
> irq18: atapci1+                     1045          5
> irq20: uhci0 ehci0                 13462         69
> irq21: fwohci0                         3          0
> irq23: atapci3                    102718        529
> cpu0: timer                       386229       1990
> irq256: mskc0                         46          0
> cpu1: timer                       376453       1940
> Total                             879969       4535

msk(4) appears to be using MSI/MSI-X here.

One thing worth trying would be to disable MSI/MSI-X.  You can do that
by adding the following to your /boot/loader.conf and rebooting:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

> Copyright (c) 1992-2008 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.1-BETA #0: Sun Sep  7 13:49:18 UTC 2008
>     root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (3001.18-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x8e3fd<SSE3,RSVD2,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,<b19>>
>   AMD Features=0x20000000<LM>
>   AMD Features2=0x1<LAHF>
>   Cores per package: 2
> real memory  = 3220701184 (3071 MB)
> avail memory = 3146145792 (3000 MB)
> ACPI APIC Table: <A_M_I_ OEMAPIC >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
> ioapic0 <Version 2.0> irqs 0-23 on motherboard
> kbd1 at kbdmux0
> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> acpi0: <A_M_I_ OEMRSDT> on motherboard
> acpi0: [ITHREAD]
> acpi0: Power Button (fixed)
> acpi0: reservation of 0, a0000 (3) failed
> acpi0: reservation of 100000, bff00000 (3) failed
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
> pci5: <ACPI PCI bus> on pcib1
> vgapci0: <VGA-compatible display> port 0xc800-0xc8ff mem 0xd0000000-0xdfffffff,0xff9f0000-0xff9fffff irq 16 at device 0.0 on pci5
> pci5: <multimedia> at device 0.1 (no driver attached)
> pci0: <multimedia> at device 27.0 (no driver attached)
> pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
> pci4: <ACPI PCI bus> on pcib2
> pcib3: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
> pci3: <ACPI PCI bus> on pcib3
> mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xb800-0xb8ff mem 0xff8fc000-0xff8fffff irq 19 at device 0.0 on pci3
> msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
> msk0: Ethernet address: 00:1e:8c:5a:62:da
> miibus0: <MII bus> on msk0
> e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
> e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
> mskc0: [FILTER]
> pcib4: <ACPI PCI-PCI bridge> irq 17 at device 28.5 on pci0
> pci2: <ACPI PCI bus> on pcib4
> atapci0: <JMicron AHCI controller> mem 0xff7fe000-0xff7fffff irq 17 at device 0.0 on pci2
> atapci0: [ITHREAD]
> atapci0: AHCI Version 01.00 controller with 2 ports detected
> ata2: <ATA channel 0> on atapci0
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci0
> ata3: [ITHREAD]
> atapci1: <JMicron JMB363 UDMA133 controller> port 0xac00-0xac07,0xa880-0xa883,0xa800-0xa807,0xa480-0xa483,0xa400-0xa40f at device 0.1 on pci2
> atapci1: [ITHREAD]
> ata4: <ATA channel 0> on atapci1
> ata4: [ITHREAD]
> uhci0: <UHCI (generic) USB controller> port 0xe480-0xe49f irq 20 at device 29.0 on pci0
> uhci0: [GIANT-LOCKED]
> uhci0: [ITHREAD]
> usb0: <UHCI (generic) USB controller> on uhci0
> usb0: USB revision 1.0
> uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
> uhub0: 2 ports with 2 removable, self powered
> uhci1: <UHCI (generic) USB controller> port 0xe800-0xe81f irq 17 at device 29.1 on pci0
> uhci1: [GIANT-LOCKED]
> uhci1: [ITHREAD]
> usb1: <UHCI (generic) USB controller> on uhci1
> usb1: USB revision 1.0
> uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
> uhub1: 2 ports with 2 removable, self powered
> uhci2: <UHCI (generic) USB controller> port 0xe880-0xe89f irq 18 at device 29.2 on pci0
> uhci2: [GIANT-LOCKED]
> uhci2: [ITHREAD]
> usb2: <UHCI (generic) USB controller> on uhci2
> usb2: USB revision 1.0
> uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
> uhub2: 2 ports with 2 removable, self powered
> uhci3: <UHCI (generic) USB controller> port 0xec00-0xec1f irq 19 at device 29.3 on pci0
> uhci3: [GIANT-LOCKED]
> uhci3: [ITHREAD]
> usb3: <UHCI (generic) USB controller> on uhci3
> usb3: USB revision 1.0
> uhub3: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb3
> uhub3: 2 ports with 2 removable, self powered
> ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0xffafbc00-0xffafbfff irq 20 at device 29.7 on pci0
> ehci0: [GIANT-LOCKED]
> ehci0: [ITHREAD]
> usb4: EHCI version 1.0
> usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
> usb4: <Intel 82801GB/R (ICH7) USB 2.0 controller> on ehci0
> usb4: USB revision 2.0
> uhub4: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb4
> uhub4: 8 ports with 8 removable, self powered
> uhub5: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/7.02, addr 2> on uhub4
> uhub5: single transaction translator
> uhub5: 4 ports with 4 removable, self powered
> umass0: <USB 2.0 USB Flash Drive, class 0/0, rev 2.00/1.00, addr 3> on uhub5
> umass1: <Generic Mass Storage Device, class 0/0, rev 2.00/1.00, addr 4> on uhub4
> pcib5: <ACPI PCI-PCI bridge> at device 30.0 on pci0
> pci1: <ACPI PCI bus> on pcib5
> fwohci0: <Texas Instruments TSB43AB22/A> mem 0xff6ff800-0xff6fffff,0xff6f8000-0xff6fbfff irq 21 at device 3.0 on pci1
> fwohci0: [FILTER]
> fwohci0: OHCI version 1.10 (ROM=1)
> fwohci0: No. of Isochronous channels is 4.
> fwohci0: EUI64 00:1e:8c:00:00:15:36:44
> fwohci0: Phy 1394a available S400, 2 ports.
> fwohci0: Link S400, max_rec 2048 bytes.
> firewire0: <IEEE1394(FireWire) bus> on fwohci0
> fwe0: <Ethernet over FireWire> on firewire0
> if_fwe0: Fake Ethernet address: 02:1e:8c:15:36:44
> fwe0: Ethernet address: 02:1e:8c:15:36:44
> fwip0: <IP over FireWire> on firewire0
> fwip0: Firewire address: 00:1e:8c:00:00:15:36:44 @ 0xfffe00000000, S400, maxrec 2048
> sbp0: <SBP-2/SCSI over FireWire> on firewire0
> dcons_crom0: <dcons configuration ROM> on firewire0
> dcons_crom0: bus_addr 0x1468000
> fwohci0: Initiate bus reset
> fwohci0: BUS reset
> fwohci0: node_id=0xc000ffc0, gen=1, CYCLEMASTER mode
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> atapci2: <Intel ICH7 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> ata0: <ATA channel 0> on atapci2
> ata0: [ITHREAD]
> ata1: <ATA channel 1> on atapci2
> ata1: [ITHREAD]
> atapci3: <Intel AHCI controller> port 0xe400-0xe407,0xe080-0xe083,0xe000-0xe007,0xdc00-0xdc03,0xd880-0xd88f mem 0xffafb800-0xffafbbff irq 23 at device 31.2 on pci0
> atapci3: [ITHREAD]
> atapci3: AHCI Version 01.10 controller with 4 ports detected
> ata5: <ATA channel 0> on atapci3
> ata5: [ITHREAD]
> ata6: <ATA channel 1> on atapci3
> ata6: [ITHREAD]
> ata7: <ATA channel 2> on atapci3
> ata7: [ITHREAD]
> ata8: <ATA channel 3> on atapci3
> ata8: [ITHREAD]
> pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
> acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 900
> cpu0: <ACPI CPU> on acpi0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> est: CPU supports Enhanced Speedstep, but is not recognized.
> est: cpu_vendor GenuineIntel, msr 61a092006000920
> device_attach: est0 attach returned 6
> p4tcc0: <CPU Frequency Thermal Control> on cpu0
> cpu1: <ACPI CPU> on acpi0
> est1: <Enhanced SpeedStep Frequency Control> on cpu1
> est: CPU supports Enhanced Speedstep, but is not recognized.
> est: cpu_vendor GenuineIntel, msr 61a092006000920
> device_attach: est1 attach returned 6
> p4tcc1: <CPU Frequency Thermal Control> on cpu1
> acpi_button0: <Power Button> on acpi0
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio0: [FILTER]
> pmtimer0 on isa0
> orm0: <ISA Option ROM> at iomem 0xcf800-0xd27ff pnpid ORM0000 on isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbd0: [ITHREAD]
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ums0: <Razer Razer Lachesis, class 0/0, rev 1.10/21.00, addr 2> on uhub0
> ums0: 7 buttons and Z dir.
> ukbd0: <Razer Razer Lachesis, class 0/0, rev 1.10/21.00, addr 2> on uhub0
> kbd2 at ukbd0
> ukbd1: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on uhub0
> kbd3 at ukbd1
> uhid0: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on uhub0
> uhid1: <Logitech Logitech Attack 3, class 0/0, rev 1.10/2.05, addr 2> on uhub1
> Timecounters tick every 1.000 msec
> firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
> firewire0: bus manager 0 (me)
> acd0: DVDR <ATAPI DVD A DH20A4P/9P58> at ata4-master UDMA66
> ad10: 953869MB <SAMSUNG HD103UJ 1AA01109> at ata5-master SATA300
> ad14: 953869MB <SAMSUNG HD103UJ 1AA01109> at ata7-master SATA300
> ad16: 152627MB <WDC WD1600AAJS-00WAA0 58.01D58> at ata8-master SATA300
> GEOM_LABEL: Label for provider ad10s1 is ext2fs/data2.
> GEOM_LABEL: Label for provider ad14s2 is ext2fs/root.
> GEOM_LABEL: Label for provider ad14s3 is ext2fs/home.
> GEOM_LABEL: Label for provider ad14s4 is ext2fs/data.
> da0 at umass-sim0 bus 0 target 0 lun 0
> da0: <USB 2.0 USB Flash Drive 0.00> Removable Direct Access SCSI-2 device 
> da0: 40.000MB/s transfers
> da0: 7712MB (15794176 512 byte sectors: 255H 63S/T 983C)
> SMP: AP CPU #1 Launched!
> da1 at umass-sim1 bus 1 target 0 lun 0
> da1: <Generic USB SD Reader 1.00> Removable Direct Access SCSI-0 device 
> da1: 40.000MB/s transfers
> da1: Attempt to query device size failed: NOT READY, Medium not present
> da2 at umass-sim1 bus 1 target 0 lun 1
> da2: <Generic USB CF Reader 1.01> Removable Direct Access SCSI-0 device 
> da2: 40.000MB/s transfers
> da2: Attempt to query device size failed: NOT READY, Medium not present
> da3 at umass-sim1 bus 1 target 0 lun 2
> da3: <Generic USB SM Reader 1.02> Removable Direct Access SCSI-0 device 
> da3: 40.000MB/s transfers
> da3: Attempt to query device size failed: NOT READY, Medium not present
> da4 at umass-sim1 bus 1 target 0 lun 3
> da4: <Generic USB MS Reader 1.03> Removable Direct Access SCSI-0 device 
> da4: 40.000MB/s transfers
> da4: Attempt to query device size failed: NOT READY, Medium not present
> Trying to mount root from ufs:/dev/ad16s3a
> WARNING: / was not properly dismounted
> GEOM_LABEL: Label ext2fs/home removed.
> GEOM_LABEL: Label ext2fs/data removed.
> mskc0: Uncorrectable PCI Express error
> mskc0: Uncorrectable PCI Express error

Those errors at the end of your dmesg don't look good; they could be a
sign of a NIC or motherboard that's going bad, or possibly a very
strange driver problem.
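
To see whether those messages keep accumulating while the link is under
load, something like the following Python sketch could count them per
device from saved `dmesg` output.  It matches only the exact message text
quoted above; `pcie_error_counts` and `SAMPLE` are hypothetical names used
for illustration.

```python
import re
from collections import Counter

SAMPLE = """\
GEOM_LABEL: Label ext2fs/data removed.
mskc0: Uncorrectable PCI Express error
mskc0: Uncorrectable PCI Express error
"""


def pcie_error_counts(dmesg_output):
    """Count 'Uncorrectable PCI Express error' lines per device name."""
    counts = Counter()
    for line in dmesg_output.splitlines():
        # The device name is the token before the colon, e.g. "mskc0".
        match = re.match(r"(\w+): Uncorrectable PCI Express error", line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, count in pcie_error_counts(SAMPLE).items():
        print(device, count)
```

A count that grows between two runs taken during a large transfer would
suggest the errors are tied to the load rather than to boot-time probing.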

Adding Yong-Hyeon PYUN to this thread, since he helps maintain the
msk(4) driver.  Yong-Hyeon, do you know of any conditions where heavy
network I/O could cause msk(4) to lock up or stop transmitting traffic,
or possibly hard-lock on ifconfig down/up?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-questions mailing list