Dual AMD MP unstable under heavy load when smp is active

Daniel Ponticello bug at camisano.net
Wed Mar 5 09:40:19 UTC 2008


Hi Danny,
I ran some tests with FreeBSD 7.0 Prerelease in December and the problem
is no longer present.
The crashes and segfaults you see seem to be related to the ACPI/SMP
implementation in FreeBSD 6.
The problem is also present, and more evident, on VMware virtual hardware.
There are no problems if you are using Intel hardware.
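
If you have to stay on 6.x for now, one way to narrow down whether ACPI or
SMP is the trigger is to boot with each one disabled in turn. Below is a
minimal sketch for /boot/loader.conf using the standard loader tunables. It
is untested on the Tyan board, so treat it as a starting point and not as a
known fix:

```
# /boot/loader.conf
# Uncomment ONE line at a time, reboot, then rerun the load test.

# Boot with a single CPU even on a kernel built with "options SMP"
#kern.smp.disabled=1

# Disable ACPI
#hint.acpi.0.disabled="1"

# Disable the local APIC and I/O APIC (i386)
#hint.apic.0.disabled="1"
```

If the segfaults stop with only one of these set, that points at the
corresponding subsystem.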


Hope this helps.


Daniel


-----Original message-----
From: Danny Fullerton northox at mantor.org
Date: Wed, 05 Mar 2008 04:32:03 +0100
To: freebsd-smp at freebsd.org
Subject: Re: Dual AMD MP unstable under heavy load when smp is active

> Hello Paul,
> 
> I would like to know whether you ran those tests with the recent FreeBSD
> 7.0. I have seen a lot of work in the SMP area of this release, and I'm
> wondering if I would have a better chance with this version.
> 
> thanks,
> 
> dmesg with SMP on (GENERIC + options SMP):
> 
> Copyright (c) 1992-2008 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 27 21:11:40 EST 2008
>     root at megatron.mantor.org:/usr/obj/usr/src/sys/MEGATRONTEST
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Athlon(tm) MP 2200+ (1800.07-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x680  Stepping = 0
>   Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
>   AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow!+,3DNow!>
> real memory  = 3220701184 (3071 MB)
> avail memory = 3146387456 (3000 MB)
> ACPI APIC Table: <PTLTD          APIC  >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  1
>  cpu1 (AP): APIC ID:  0
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> kbd1 at kbdmux0
> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> hptrr: HPT RocketRAID controller driver v1.1 (Feb 27 2008 21:11:16)
> acpi0: <PTLTD   RSDT> on motherboard
> acpi0: Power Button (fixed)
> acpi0: Sleep Button (fixed)
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port
> 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> agp0: <AMD 762 host to AGP bridge> port 0x1810-0x1813 mem
> 0xf8000000-0xfbffffff,0xf6210000-0xf6210fff at device 0.0 on pci0
> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <AMD 768 UDMA100 controller> port
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
> ata0: <ATA channel 0> on atapci0
> ata1: <ATA channel 1> on atapci0
> pci0: <bridge> at device 7.3 (no driver attached)
> amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at
> device 8.0 on pci0
> amr0: delete logical drives supported by controller
> amr0: <LSILogic PERC 4/DC> Firmware 350O, BIOS 1.09, 128MB RAM
> ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1000-0x10ff mem
> 0xf4000000-0xf4000fff irq 20 at device 10.0 on pci0
> ahc0: [GIANT-LOCKED]
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1400-0x14ff mem
> 0xf4001000-0xf4001fff irq 21 at device 10.1 on pci0
> ahc1: [GIANT-LOCKED]
> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
> pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> ohci0: <OHCI (generic) USB controller> mem 0xf4100000-0xf4100fff irq 19
> at device 0.0 on pci2
> ohci0: [GIANT-LOCKED]
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 4 ports with 4 removable, self powered
> pci2: <display, VGA> at device 7.0 (no driver attached)
> xl0: <3Com 3c980C Fast Etherlink XL> port 0x2400-0x247f mem
> 0xf4102000-0xf410207f irq 18 at device 8.0 on pci2
> miibus0: <MII bus> on xl0
> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl0: Ethernet address: 00:e0:81:22:2e:c4
> xl1: <3Com 3c980C Fast Etherlink XL> port 0x2480-0x24ff mem
> 0xf4102400-0xf410247f irq 19 at device 9.0 on pci2
> miibus1: <MII bus> on xl1
> ukphy1: <Generic IEEE 802.3u media interface> on miibus1
> ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl1: Ethernet address: 00:e0:81:22:2e:c5
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> pmtimer0 on isa0
> orm0: <ISA Option ROMs> at iomem
> 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xe0000-0xe3fff on isa0
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 8250 or not responding
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> hptrr: no controller detected.
> Waiting 5 seconds for SCSI devices to settle
> ad0: 476940MB <WDC WD5000AAKB-00UKA0 07.01N01> at ata0-master UDMA100
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 139900MB (286515200 sectors) RAID 1 (optimal)
> SMP: AP CPU #1 Launched!
> Trying to mount root from ufs:/dev/amrd0s1a
> 
> ---
> Danny Fullerton
> Mantor Organization
> 
> Paul Missman wrote:
> >
> > Danny,
> >
> > I don't know what the bug is, but it does exist.
> >
> > I have an IBM x3455 with 2 Opteron dual core processors.  Under heavy
> > loads it crashes.  As a step in debugging, I unplugged one of the
> > processors, and the problem went away.  I switched to CentOS version
> > 4, and it operates perfectly.
> >
> > In addition to FreeBSD,  the problem also exists in Fedora Core.
> >
> > Of the OSes I tested, only Red Hat and CentOS worked correctly on the
> > x3455.
> >
> > I didn't try Windows, so I can't say whether or not it operates
> > properly on this system.
> >
> > Unfortunately, that is all I know about the issue.
> >
> > Paul Missman
> >
> >
> > ----- Original Message -----
> > From: "Danny Fullerton" <northox at mantor.org>
> > To: <freebsd-smp at freebsd.org>
> > Sent: Tuesday, March 04, 2008 9:05 PM
> > Subject: Dual AMD MP unstable under heavy load when smp is active
> >
> >
> >> Hi guys,
> >>
> >> I have been having quite some trouble tracking down a problem that
> >> seems to be related to SMP on one of my production servers.
> >>
> >> The problem is not easily reproducible, but the best way I found was to
> >> fire up "make buildworld" while other things are running (mysql,
> >> apache, bind, jails, etc.). When SMP is active, the compile ends with a
> >> segfault or, quite rarely, a crash. I recently configured the crash
> >> dump device but have still been unable to reproduce a full system
> >> crash.
> >>
> >> At first I thought it was related to memory, so I ran some tests and
> >> replaced most of the DIMMs, but the problem was still there. To
> >> pinpoint the problem, I started from the GENERIC kernel, which I found
> >> to be stable, and added options to it. That is how I found that it was
> >> related to SMP. I then tried other things, such as reducing the drivers
> >> in the kernel to the minimum I could, for several reasons. One of them
> >> is that the motherboard is a "Tyan Thunder K7X"
> >> (http://www.tyan.com/archive/products/html/thunderk7x.html) and it has
> >> an onboard Adaptec SCSI controller which I don't use. Since the driver
> >> for this adapter is not MP-safe, I tried disabling the controller in
> >> the BIOS and/or disabling the driver in the kernel, but it had no
> >> effect. The SCSI adapter actually in use is the Dell PERC 4/DC
> >> (LSILogic MegaRAID) you can see in the dmesg.
> >>
> >> I now have no idea how to debug this problem further.
> >>
> >> dmesg from generic kernel:
> >>
> >> Copyright (c) 1992-2008 The FreeBSD Project.
> >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >>        The Regents of the University of California. All rights reserved.
> >> FreeBSD is a registered trademark of The FreeBSD Foundation.
> >> FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 27 07:56:51 EST 2008
> >>    root at megatron.mantor.org:/usr/obj/usr/src/sys/GENERIC
> >> ACPI APIC Table: <PTLTD          APIC  >
> >> Timecounter "i8254" frequency 1193182 Hz quality 0
> >> CPU: AMD Athlon(tm) MP 2200+ (1800.07-MHz 686-class CPU)
> >>  Origin = "AuthenticAMD"  Id = 0x680  Stepping = 0
> >>  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> >>  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow!+,3DNow!>
> >> real memory  = 3220701184 (3071 MB)
> >> avail memory = 3150741504 (3004 MB)
> >> MADT: Forcing active-low polarity and level trigger for SCI
> >> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> >> kbd1 at kbdmux0
> >> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
> >> RF5413)
> >> hptrr: HPT RocketRAID controller driver v1.1 (Feb 27 2008 07:56:28)
> >> acpi0: <PTLTD   RSDT> on motherboard
> >> acpi0: Power Button (fixed)
> >> acpi0: Sleep Button (fixed)
> >> Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
> >> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> >> cpu0: <ACPI CPU> on acpi0
> >> acpi_button0: <Power Button> on acpi0
> >> pcib0: <ACPI Host-PCI bridge> port
> >> 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
> >> pci0: <ACPI PCI bus> on pcib0
> >> agp0: <AMD 762 host to AGP bridge> port 0x1810-0x1813 mem
> >> 0xf8000000-0xfbffffff,0xf6210000-0xf6210fff at device 0.0 on pci0
> >> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> >> pci1: <ACPI PCI bus> on pcib1
> >> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> >> isa0: <ISA bus> on isab0
> >> atapci0: <AMD 768 UDMA100 controller> port
> >> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
> >> ata0: <ATA channel 0> on atapci0
> >> ata1: <ATA channel 1> on atapci0
> >> pci0: <bridge> at device 7.3 (no driver attached)
> >> amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at
> >> device 8.0 on pci0
> >> amr0: delete logical drives supported by controller
> >> amr0: <LSILogic PERC 4/DC> Firmware 350O, BIOS 1.09, 128MB RAM
> >> ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1000-0x10ff mem
> >> 0xf4000000-0xf4000fff irq 20 at device 10.0 on pci0
> >> ahc0: [GIANT-LOCKED]
> >> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> >> ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1400-0x14ff mem
> >> 0xf4001000-0xf4001fff irq 21 at device 10.1 on pci0
> >> ahc1: [GIANT-LOCKED]
> >> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
> >> pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
> >> pci2: <ACPI PCI bus> on pcib2
> >> ohci0: <OHCI (generic) USB controller> mem 0xf4100000-0xf4100fff irq 19
> >> at device 0.0 on pci2
> >> ohci0: [GIANT-LOCKED]
> >> usb0: OHCI version 1.0, legacy support
> >> usb0: SMM does not respond, resetting
> >> usb0: <OHCI (generic) USB controller> on ohci0
> >> usb0: USB revision 1.0
> >> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> >> uhub0: 4 ports with 4 removable, self powered
> >> pci2: <display, VGA> at device 7.0 (no driver attached)
> >> xl0: <3Com 3c980C Fast Etherlink XL> port 0x2400-0x247f mem
> >> 0xf4102000-0xf410207f irq 18 at device 8.0 on pci2
> >> miibus0: <MII bus> on xl0
> >> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> >> ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> xl0: Ethernet address: 00:e0:81:22:2e:c4
> >> xl1: <3Com 3c980C Fast Etherlink XL> port 0x2480-0x24ff mem
> >> 0xf4102400-0xf410247f irq 19 at device 9.0 on pci2
> >> miibus1: <MII bus> on xl1
> >> ukphy1: <Generic IEEE 802.3u media interface> on miibus1
> >> ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> xl1: Ethernet address: 00:e0:81:22:2e:c5
> >> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> >> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> >> kbd0 at atkbd0
> >> atkbd0: [GIANT-LOCKED]
> >> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
> >> acpi0
> >> fdc0: does not respond
> >> device_attach: fdc0 attach returned 6
> >> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
> >> acpi0
> >> fdc0: does not respond
> >> device_attach: fdc0 attach returned 6
> >> pmtimer0 on isa0
> >> orm0: <ISA Option ROMs> at iomem
> >> 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xe0000-0xe3fff on isa0
> >> ppc0: parallel port not found.
> >> sc0: <System console> at flags 0x100 on isa0
> >> sc0: VGA <16 virtual consoles, flags=0x300>
> >> sio0: configured irq 4 not in bitmap of probed irqs 0
> >> sio0: port may not be enabled
> >> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> >> sio0: type 8250 or not responding
> >> sio1: configured irq 3 not in bitmap of probed irqs 0
> >> sio1: port may not be enabled
> >> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on
> >> isa0
> >> Timecounter "TSC" frequency 1800073530 Hz quality 800
> >> Timecounters tick every 1.000 msec
> >> hptrr: no controller detected.
> >> Waiting 5 seconds for SCSI devices to settle
> >> ad0: 476940MB <WDC WD5000AAKB-00UKA0 07.01N01> at ata0-master UDMA100
> >> amr0: delete logical drives supported by controller
> >> amrd0: <LSILogic MegaRAID logical drive> on amr0
> >> amrd0: 139900MB (286515200 sectors) RAID 1 (optimal)
> >> Trying to mount root from ufs:/dev/amrd0s1a
> >>
> >> kldstat:
> >>
> >> Id Refs Address    Size     Name
> >> 1   10 0xc0400000 7a05b0   kernel
> >> 2    1 0xc0ba1000 5c304    acpi.ko
> >> 3    1 0xc8093000 3000     fdescfs.ko
> >> 4    1 0xc8106000 3000     pflog.ko
> >> 5    1 0xc8109000 2d000    pf.ko
> >> 6    1 0xc817b000 19000    linux.ko
> >>
> >> If you have any ideas, or need more information to diagnose the
> >> problem, please let me know.
> >>
> >> regards,
> >>
> >> ---
> >> Danny Fullerton
> >> Mantor Organization
> >> _______________________________________________
> >> freebsd-smp at freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> >> To unsubscribe, send any mail to "freebsd-smp-unsubscribe at freebsd.org"

