System hangs up every day

Дмитрий Комалеев d.komaleev at konliga.ru
Thu Nov 1 04:58:52 PDT 2007


> 
> A system failure of this sort (one which leaves no log entries of any
> kind) is generally a hardware fault; memory stick failures tend to
> cause kernel panics and are easily repeatable.
> 
> I would suggest examining the hardware components: the motherboard
> could have some faulty capacitors (burst, leaking, or swollen); the
> fans on the processors could be failing and causing a lockup; or the
> power supply fans could be failing and causing an undervolt and
> lockup, though that usually makes the system reset.
> 
> You get the idea; in my opinion, your symptoms point to hardware
> issues.

As I already wrote, I tried moving the system HDD to another server with the same configuration; the hangs continued on the new platform. The RAID controller stayed the same, but it has its own error log, and that log is clean.
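
For reference, a minimal sketch of kernel options that might help capture some state the next time the machine freezes silently. This is an assumption about what could be useful, not something tested on this server: the lines would be added to the KERNEL01_NOSMP config quoted below, and they only help if a serial console is attached so that a BREAK can reach the debugger.

```
# Sketch only: possible additions to the KERNEL01_NOSMP config (untested here)
options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger
options         BREAK_TO_DEBUGGER   # a BREAK on the serial console enters the debugger
```

Note also that the ichwd and SW_WATCHDOG watchdogs do nothing by themselves; they have to be armed from userland, for example with watchdogd_enable="YES" in /etc/rc.conf. Whether watchdogd is already running on this machine is not stated in the thread.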

> 
> On 10/31/07, Дмитрий Комалеев <d.komaleev at konliga.ru> wrote:
> > Hello everybody
> >
> > I have a big problem
> >
> > There is one FreeBSD server in our company. The server platform is: Supermicro SuperServer 6014V-T2B (2x Intel Xeon 2.8 GHz, 1 GB RAM, 3WARE 3W-8006-2LP RAID controller).
> > The server works as:
> > - a gateway between LAN and Internet
> > - an Intranet web- and database server (Apache + MySQL + PHP)
> > - a firewall (OpenBSD pf)
> > - a transparent proxy server (Squid)
> > Monthly traffic through this server is about 100Gb. There are about 200 Internet users in our company.
> > Here is a part of my dmesg-listing:
> >
> > Copyright (c) 1992-2007 The FreeBSD Project.
> > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >         The Regents of the University of California. All rights reserved.
> > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > FreeBSD 6.2-RELEASE-p8 #2: Thu Oct 11 19:51:25 MSD 2007
> >     sa at gateway.konliga.ru:/usr/obj/usr/src/sys/KERNEL01_NOSMP
> > module_register: module pci/em already exists!
> > Module pci/em failed to register: 17
> > ACPI APIC Table: <A M I  OEMAPIC >
> > Timecounter "i8254" frequency 1193182 Hz quality 0
> > CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2800.12-MHz 686-class CPU)
> >   Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
> >   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> >   Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
> >   AMD Features=0x20000000<LM>
> >   Logical CPUs per core: 2
> > real memory  = 1073479680 (1023 MB)
> > avail memory = 1041465344 (993 MB)
> > ioapic0 <Version 2.0> irqs 0-23 on motherboard
> > ioapic1 <Version 2.0> irqs 24-47 on motherboard
> > ichwd module loaded
> > kbd1 at kbdmux0
> > ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> > acpi0: <A M I OEMRSDT> on motherboard
> > acpi0: Power Button (fixed)
> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> > cpu0: <ACPI CPU> on acpi0
> > acpi_throttle0: <ACPI CPU Throttling> on cpu0
> > pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> > pci0: <ACPI PCI bus> on pcib0
> > pcib1: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci0
> > pci1: <ACPI PCI bus> on pcib1
> > pcib2: <ACPI PCI-PCI bridge> irq 16 at device 3.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> > pcib3: <ACPI PCI-PCI bridge> at device 28.0 on pci0
> > pci3: <ACPI PCI bus> on pcib3
> > twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0xbc00-0xbc0f mem 0xfc9ffc00-0xfc9ffc0f,0xfc000000-0xfc7fffff irq 24 at device 1.0 on pci3
> > twe0: [GIANT-LOCKED]
> > twe0: 2 ports, Firmware FE8S 1.05.00.068, BIOS BE7X 1.08.00.048
> > em0: <Intel(R) PRO/1000 Network Connection Version - 6.6.6> port 0xb800-0xb83f mem 0xfc9c0000-0xfc9dffff irq 26 at device 3.0 on pci3
> > em0: Ethernet address: 00:30:48:58:4d:2a
> > em0: [FAST]
> > em1: <Intel(R) PRO/1000 Network Connection Version - 6.6.6> port 0xb400-0xb43f mem 0xfc9a0000-0xfc9bffff irq 27 at device 4.0 on pci3
> > em1: Ethernet address: 00:30:48:58:4d:2b
> > em1: [FAST]
> > uhci0: <UHCI (generic) USB controller> port 0xe800-0xe81f irq 16 at device 29.0 on pci0
> > uhci0: [GIANT-LOCKED]
> > usb0: <UHCI (generic) USB controller> on uhci0
> > usb0: USB revision 1.0
> > uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> > uhub0: 2 ports with 2 removable, self powered
> > uhci1: <UHCI (generic) USB controller> port 0xec00-0xec1f irq 19 at device 29.1 on pci0
> > uhci1: [GIANT-LOCKED]
> > usb1: <UHCI (generic) USB controller> on uhci1
> > usb1: USB revision 1.0
> > uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> > uhub1: 2 ports with 2 removable, self powered
> > pci0: <base peripheral> at device 29.4 (no driver attached)
> > pci0: <base peripheral, interrupt controller> at device 29.5 (no driver attached)
> > ehci0: <Intel 6300ESB USB 2.0 controller> mem 0xfebffc00-0xfebfffff irq 23 at device 29.7 on pci0
> > ehci0: [GIANT-LOCKED]
> > usb2: EHCI version 1.0
> > usb2: companion controllers, 2 ports each: usb0 usb1
> > usb2: <Intel 6300ESB USB 2.0 controller> on ehci0
> > usb2: USB revision 2.0
> > uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
> > uhub2: 4 ports with 4 removable, self powered
> > pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
> > pci4: <ACPI PCI bus> on pcib4
> > pci4: <display, VGA> at device 5.0 (no driver attached)
> > isab0: <PCI-ISA bridge> at device 31.0 on pci0
> > isa0: <ISA bus> on isab0
> > atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
> > ata0: <ATA channel 0> on atapci0
> > ata1: <ATA channel 1> on atapci0
> > pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
> > acpi_button0: <Power Button> on acpi0
> > acpi_button1: <Sleep Button> on acpi0
> > sio0: configured irq 4 not in bitmap of probed irqs 0
> > sio0: port may not be enabled
> > sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> > sio0: type 16550A
> > sio1: configured irq 3 not in bitmap of probed irqs 0
> > sio1: port may not be enabled
> > sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> > sio1: type 16550A
> > fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> > fdc0: [FAST]
> > fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> > ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
> > ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
> > ppc0: FIFO with 16/16/9 bytes threshold
> > ppbus0: <Parallel port bus> on ppc0
> > plip0: <PLIP network interface> on ppbus0
> > lpt0: <Printer> on ppbus0
> > lpt0: Interrupt-driven port
> > ppi0: <Parallel I/O> on ppbus0
> > atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> > atkbd0: <AT Keyboard> irq 1 on atkbdc0
> > kbd0 at atkbd0
> > atkbd0: [GIANT-LOCKED]
> > psm0: <PS/2 Mouse> irq 12 on atkbdc0
> > psm0: [GIANT-LOCKED]
> > psm0: model IntelliMouse, device ID 3
> > ichwd0: <Intel 6300ESB watchdog timer> on isa0
> > pmtimer0 on isa0
> > orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9800-0xca7ff,0xca800-0xcb7ff on isa0
> > sc0: <System console> at flags 0x100 on isa0
> > sc0: VGA <16 virtual consoles, flags=0x300>
> > vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> > Timecounter "TSC" frequency 2800118202 Hz quality 800
> > Timecounters tick every 1.000 msec
> > acd0: CDROM <CD-224E-N/1.AA> at ata0-master UDMA33
> > twed0: <Unit 0, TwinStor, Normal> on twe0
> > twed0: 152626MB (312579760 sectors)
> > Trying to mount root from ufs:/dev/twed0s1a
> > ext0: link state changed to UP
> > int0: link state changed to UP
> > vlan0: link state changed to UP
> >
> > This server hangs every day without any messages in the log files or on the system console. The keyboard does not respond either. All I can do is a hard reset, and after the restart no core dump files appear.
> > Here is my kernel configuration file:
> >
> > include GENERIC
> > ident           KERNEL01_NOSMP
> > device          ichwd # Intel ICH watchdog timer
> > #options        SMP
> > options         ALTQ
> > options         ALTQ_CBQ
> > options         ALTQ_RED
> > options         ALTQ_RIO
> > options         ALTQ_HFSC
> > options         ALTQ_PRIQ
> > #options                ALTQ_NOPCC
> > options         SC_DISABLE_REBOOT
> > options         MP_WATCHDOG
> > options         SW_WATCHDOG
> >
> > If I build and install a kernel with the SMP option, the system under working load starts hanging every two hours.
> >
> > Two days of running Memtest found no errors.
> > I tried installing the newest Intel Ethernet adapter driver, but it made no difference.
> > As an experiment I also tried moving the system HDD to another server platform (SuperServer 6015V-TB), but the hangs did not stop.
> > I think that it is not only a hardware problem.
> > Linux (Gentoo) and Windows Server 2003 worked fine on this hardware.
> >
> > Please help me find a solution to this problem.
> >
> > Yours faithfully
> > Dmitry Komaleev
> > IT Manager
> > "EDIPRESSE-KONLIGA" http://www.konliga.ru
> > Russia, Moscow
> > tel.:  +7 (495) 775-14-35, ext. 169
> > fax:   +7 (495) 775-14-34
> >
> > P.S. I have filed a bug report about my problem, but the only advice I received was to turn off the ACPI option.
> > If I disable ACPI, the RAID controller and both Ethernet controllers on my server receive the same IRQ. I believe this is not good.

