FreeBSD unstable on Dell 1750 using SMP?

Rutger Bevaart rutger.bevaart at illian.net
Sun Nov 13 23:33:03 PST 2005


Hello list,

Our Dell 1750's and 1850's are still giving me headaches.

Symptom:
Our Dell PE1750 will reboot at random intervals (between 3 and 130 days),
seemingly unrelated to the load of the system.

Config:
Dell PE1750, dual 3.06GHz Xeon (533MHz FSB/512KB cache), 2x512MB memory,
PERC RAID (amr) with 3 drives in RAID5. No add-on cards.

FreeBSD:
Same problem on 5.3-RELEASE, 5.3-p3, 5.3-p-something and 5.4-p5.

I disabled HTT in the BIOS but the machine rebooted 3 days later. Argh!
Now I have no clue whatsoever on how to proceed. No kernel tweaks have
been made and no strange software is running. No logs are written to
/var/log/messages. Attached is my dmesg output.

Does anybody have any clues on how to proceed? I like these boxes ;-)

Thanks,
Rutger

--dmesg output
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-RELEASE-p5 #0: Sun Jul 24 15:57:47 CEST 2005
>     root at darwin.illian.net:/usr/obj/usr/src/sys/darwin-smp
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3047.91-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> real memory  = 1073573888 (1023 MB)
> avail memory = 1041018880 (992 MB)
> ACPI APIC Table: <DELL   PE1750  >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  6
> ioapic0: Changing APIC ID to 8
> ioapic1: Changing APIC ID to 9
> ioapic2: Changing APIC ID to 10
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-15 on motherboard
> ioapic1 <Version 1.1> irqs 16-31 on motherboard
> ioapic2 <Version 1.1> irqs 32-47 on motherboard
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> acpi0: <DELL PE1750> on motherboard
> acpi0: Power Button (fixed)
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pci0: <display, VGA> at device 14.0 (no driver attached)
> atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x8b0-0x8bf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
> ata0: channel #0 on atapci0
> ata1: channel #1 on atapci0
> ohci0: <OHCI (generic) USB controller> mem 0xfe100000-0xfe100fff irq 11 at device 15.2 on pci0
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 4 ports with 4 removable, self powered
> isab0: <PCI-ISA bridge> at device 15.3 on pci0
> isa0: <ISA bus> on isab0
> pcib1: <ACPI Host-PCI bridge> on acpi0
> pci4: <ACPI PCI bus> on pcib1
> amr0: <LSILogic MegaRAID 1.51> mem 0xfcd00000-0xfcd3ffff,0xf0000000-0xf7ffffff irq 18 at device 3.0 on pci4
> amr0: <LSILogic PERC 4/Di> Firmware 412W, BIOS H406, 128MB RAM
> pcib2: <ACPI Host-PCI bridge> on acpi0
> pci3: <ACPI PCI bus> on pcib2
> pcib3: <ACPI Host-PCI bridge> on acpi0
> pci2: <ACPI PCI bus> on pcib3
> bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcf20000-0xfcf2ffff,0xfcf30000-0xfcf3ffff irq 16 at device 0.0 on pci2
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:11:43:5a:84:9d
> bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcf00000-0xfcf0ffff,0xfcf10000-0xfcf1ffff irq 17 at device 0.1 on pci2
> miibus1: <MII bus> on bge1
> brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge1: Ethernet address: 00:11:43:5a:84:9e
> pcib4: <ACPI Host-PCI bridge> on acpi0
> pci1: <ACPI PCI bus> on pcib4
> fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> orm0: <ISA Option ROMs> at iomem 0xec000-0xeffff,0xcb800-0xccfff,0xc8000-0xc8fff,0xc0000-0xc7fff on isa0
> pmtimer0 on isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 10.000 msec
> acd0: CDROM <SAMSUNG CD-ROM SN-124/N104> at ata1-master PIO4
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 139760MB (286228480 sectors) RAID 5 (optimal)
> ses0 at amr0 bus 0 target 6 lun 0
> ses0: <PE/PV 1x3 SCSI BP 1.1> Fixed Processor SCSI-2 device
> ses0: SAF-TE Compliant Device
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/amrd0s1a
> WARNING: / was not properly dismounted
> WARNING: /opt was not properly dismounted
> /opt: mount pending error: blocks 108 files 1
> WARNING: /tmp was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /usr/local was not properly dismounted
> WARNING: /var was not properly dismounted
> ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled

--end of dmesg output
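
A minimal sketch (not part of the original message) of how the dmesg output
above could be checked mechanically. It counts the CPUs the kernel enumerated
and lists the filesystems reported as not properly dismounted, which is the
trace an unclean reboot leaves behind. The helper names (strip_quote,
summarize) and the SAMPLE excerpt are illustrative assumptions; only the line
formats are taken from the dmesg above. With HT enabled on two HT-capable
Xeons one would normally expect more than two CPU entries, so two entries is
consistent with HTT being off in the BIOS. The HTT entry in the Features line
is a CPU capability flag and most likely does not show the BIOS setting.

```python
import re

# Patterns modelled on the dmesg lines quoted above.
CPU_LINE = re.compile(r"^\s*cpu(\d+) \((BSP|AP)\): APIC ID:\s*(\d+)", re.MULTILINE)
DIRTY_FS = re.compile(r"^WARNING: (\S+) was not properly dismounted", re.MULTILINE)


def strip_quote(dmesg_text):
    """Remove the leading '> ' mail quoting from each line."""
    return "\n".join(re.sub(r"^> ?", "", line) for line in dmesg_text.splitlines())


def summarize(dmesg_text):
    """Return the enumerated CPUs and the filesystems flagged as dirty."""
    text = strip_quote(dmesg_text)
    cpus = [(int(n), role, int(apic)) for n, role, apic in CPU_LINE.findall(text)]
    dirty = DIRTY_FS.findall(text)
    return {"cpus": cpus, "dirty_filesystems": dirty}


# Short excerpt of the dmesg output above, used only as sample input.
SAMPLE = """\
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  6
> WARNING: / was not properly dismounted
> WARNING: /opt was not properly dismounted
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running summarize() over a saved copy of the full boot messages after each
unexpected reboot would give a quick record of how many CPUs were active and
whether the shutdown was unclean.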




> Suggest: Disable HT...
> I had the same problem of unexplained server restarts on 5.3 last December
> and strongly suspected a DMA issue or a dual channel memory paging foobar
> when HT was enabled. The benefits of HT over simply running SMP on just
> two physical CPUs are apparently zero in your case as loads seem light and
> therefore spreading threads across more CPUs does not seem to be an
> advantage. In short, turn hyperthreading off. In my case that's what I did
> and the crashes (which also seemed to be about three days apart or so)
> stopped.
>
> Good luck.
>
> best... Mike

>
> Message: 1
> Date: Fri, 15 Jul 2005 15:15:24 +0200 (CEST)
> From: "Rutger Bevaart" <rutger.bevaart at illian.net>
> Subject: FreeBSD unstable on Dell 1750 using SMP?
> To: freebsd-smp at freebsd.org
> Message-ID: <24434.193.172.18.3.1121433324.squirrel at 193.172.18.3>
> Content-Type: text/plain;charset=iso-8859-1
>
>
> hello list,
>
> For the past year we've been running several Dell PowerEdge 1750 servers
> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
> with HT enabled. This install has proven to be unstable in that the
> machine will reboot between 3 days and 170 days without apparent reason.
> No log is written. Other machines we have with a single CPU (HT enabled)
> do not experience this problem.
>
> As it is present in both 4.x and 5.x, and googling over the last year has
> not revealed similar experiences, I'm consulting this list. As all of these
> machines are production machines that have a continuous load (not a heavy
> load, but a light average with some peaks), it's not easy to experiment with
> HT settings etc. I dislike driving to the datacenter for locked systems
> with fubarred kernels ;-)
>
> The only error I've ever seen just before a reboot is "bge0: discard frame
> w/o packet header" on the 5.3 machine.
>
> Any clues or help greatly appreciated!
>
> Regards
> Rutger Bevaart
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 15 Jul 2005 21:47:46 -0400
> From: Lucas Holt <luke at foolishgames.com>
> Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
> To: Rutger Bevaart <rutger.bevaart at illian.net>
> Cc: freebsd-smp at freebsd.org
> Message-ID: <3713FA02-FDBB-4B24-A592-F55B7A485C26 at foolishgames.com>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> I can't speak for that config or network card, but I had a similar
> experience with FreeBSD 5.2 and 5.3.  I got unusual errors
> occasionally for an rl NIC and the machine randomly rebooted.
> Sometimes nothing was logged.  It turned out to be the network card.
> I replaced the NIC with a 3Com 3c905C-TX and the problem went away.
>
> It's possible that the Dell NICs are nonstandard and the driver isn't
> handling them well.  I've noticed problems with Dell NICs and
> standard drivers in their other products (Latitude D800, etc).  If
> you really thought it was an SMP issue, I suppose you could compile
> and run a non-SMP kernel as a test.
>
> It's also possible that "4"-processor SMP isn't as reliable as 2.  I
> have read about scalability issues in the past with large numbers of
> CPUs in FreeBSD.
>
>
>
> Lucas Holt
> Luke at FoolishGames.com
> ________________________________________________________
> FoolishGames.com  (Jewel Fan Site)
> JustJournal.com (Free blogging)
> FoolishGames.net (Enemy Territory IoM site)
>
> Think PC.. in 2006 you can own an Apple PCintosh. Whats next, windows
> works?
>
>
>
> ------------------------------
>
> _______________________________________________
> freebsd-smp at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> To unsubscribe, send any mail to "freebsd-smp-unsubscribe at freebsd.org"
>
> End of freebsd-smp Digest, Vol 101, Issue 4
> *******************************************
>


Rutger Bevaart :: illian.networks



