Xserve G5 keeps shutting down

Paul Mather paul at gromit.dlib.vt.edu
Mon Jun 20 22:19:34 UTC 2011


I'm running FreeBSD/powerpc64 -CURRENT on an Xserve G5.  With a recent kernel, the system will not stay up for more than a few hours at a time. :-(

I have no idea why the machine is shutting off.  There is no panic or crash dump and there is no indication in the logs of anything awry.  The system just powers down.  On the occasions when I have been present, nothing suggested the system was under stress (such as all fans racing madly), and often the system has been relatively idle.  (Oddly, to my knowledge it has never shut down when doing something potentially taxing, such as a make -j5 buildworld or the like.)

The main thing I have noticed since building this new kernel is that the fans are now controlled automatically, i.e., there is now no need for the tickle-the-fan-controller cron job of yore, meaning the fans won't race when in single user mode (e.g., during an installworld).

Whereas the CPU fan RPMs are in the range I was used to before, the PWM-controlled fans' RPMs are high given how low their PWM values sit in the allowed range.  For example:

dev.fcu.0.fans.sys_ctrlr_fan.minpwm: 40
dev.fcu.0.fans.sys_ctrlr_fan.maxpwm: 100
dev.fcu.0.fans.sys_ctrlr_fan.pwm: 49
dev.fcu.0.fans.sys_ctrlr_fan.rpm: 10112
dev.fcu.0.fans.pci_fan.minpwm: 40
dev.fcu.0.fans.pci_fan.maxpwm: 100
dev.fcu.0.fans.pci_fan.pwm: 48
dev.fcu.0.fans.pci_fan.rpm: 10048


When I was setting the RPMs manually, it seemed that ~6000 RPM was a nice, comfortable setting; 10000+ RPM was definitely high.  With the PWM automatic control, the fans appear to spin fast even at the low end of the PWM range (e.g., see above).  I corresponded with Andreas Tobler several months ago about the fcu driver and it seemed like the scaling didn't work universally across different hardware models (e.g., what worked well on a PowerMac G5 didn't work well on an Xserve G5).
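
As a rough illustration of the mismatch described above, the following Python sketch parses the quoted sysctl lines and reports where each fan's PWM value sits within its min/max range alongside its RPM.  The function names (parse_fans, pwm_position, summarize) are made up for this example and are not part of the fcu driver; the 6000 RPM threshold is just the "comfortable" figure mentioned above, not a hardware limit.

```python
import re

# Values copied from the sysctl output quoted earlier in this message.
SAMPLE = """\
dev.fcu.0.fans.sys_ctrlr_fan.minpwm: 40
dev.fcu.0.fans.sys_ctrlr_fan.maxpwm: 100
dev.fcu.0.fans.sys_ctrlr_fan.pwm: 49
dev.fcu.0.fans.sys_ctrlr_fan.rpm: 10112
dev.fcu.0.fans.pci_fan.minpwm: 40
dev.fcu.0.fans.pci_fan.maxpwm: 100
dev.fcu.0.fans.pci_fan.pwm: 48
dev.fcu.0.fans.pci_fan.rpm: 10048
"""

# Matches lines of the form dev.fcu.<unit>.fans.<fan name>.<field>: <integer>
LINE = re.compile(r"^dev\.fcu\.\d+\.fans\.(\w+)\.(\w+):\s*(-?\d+)$")


def parse_fans(text):
    """Group the per-fan fields into a dict keyed by fan name."""
    fans = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            name, field, value = m.groups()
            fans.setdefault(name, {})[field] = int(value)
    return fans


def pwm_position(fan):
    """Fraction of the way from minpwm to maxpwm (0.0 = bottom of range)."""
    span = fan["maxpwm"] - fan["minpwm"]
    return (fan["pwm"] - fan["minpwm"]) / span if span else 0.0


def summarize(text, comfortable_rpm=6000):
    """Return (name, percent of PWM range, rpm, above threshold?) per fan.

    comfortable_rpm is an assumed threshold taken from the message text.
    """
    rows = []
    for name, fan in sorted(parse_fans(text).items()):
        if not {"minpwm", "maxpwm", "pwm", "rpm"} <= fan.keys():
            continue  # skip fans that do not report PWM fields
        percent = round(100 * pwm_position(fan))
        rows.append((name, percent, fan["rpm"], fan["rpm"] > comfortable_rpm))
    return rows


if __name__ == "__main__":
    for name, percent, rpm, high in summarize(SAMPLE):
        flag = "HIGH" if high else "ok"
        print(f"{name}: {percent}% of PWM range, {rpm} RPM ({flag})")
```

On a live system the same functions could presumably be fed the output of "sysctl dev.fcu" instead of the SAMPLE string, though that has not been checked against every fan type the driver reports.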

I notice a "machdep.manage_fans" sysctl.  Does setting this to "0" revert to the old behaviour, i.e., where you can set the RPMs explicitly?  Maybe I'll have more luck with the previous method?

Has anyone else noticed problems with their Xserve G5?  I'm not ruling out a hardware problem, but up until now this system has been rock solid.

Cheers,

Paul.

PS: Here is a boot dmesg:

Kernel entry at 0x103250 ...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-CURRENT #0: Mon Jun 20 14:08:30 EDT 2011
    paul at backup.tower.lib.vt.edu:/usr/obj/usr/src/sys/CHUMBY_DEBUG powerpc
cpu0: IBM PowerPC 970FX revision 3.0, 2300.74 MHz
cpu0: Features dc000000<PPC32,PPC64,ALTIVEC,FPU,MMU>
cpu0: HID0 511081<NAP,DPM,NHR,TBEN,ENATTN>
real memory  = 4276617216 (4078 MB)
avail memory = 4040163328 (3853 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0: dev=ff88ed18 (BSP)
cpu1: dev=ff890150
kbd0 at kbdmux0
nexus0: <Open Firmware Nexus device>
cpulist0: <Open Firmware CPU Group> on nexus0
cpu0: <Open Firmware CPU> on cpulist0
pcr0: <PPC 970 Power Control Register> on cpu0
pcr0: No power mode data in device tree!
device_attach: pcr0 attach returned 6
cpu1: <Open Firmware CPU> on cpulist0
pcr1: <PPC 970 Power Control Register> on cpu1
pcr1: No power mode data in device tree!
device_attach: pcr1 attach returned 6
unin0: <Apple UniNorth System Controller> on nexus0
unin0: Version 53
iichb0: <Keywest I2C controller> mem 0xf8001000-0xf8001fff irq 0 on unin0
iicbus0: <OFW I2C bus> on iichb0
iicbus0: <unknown card> at addr 0xd4
max66900: <Temp-Monitor MAX6690> at addr 0x98 on iicbus0
fcu0: <Apple Fan Control Unit> at addr 0x15e on iicbus0
iicbus0: <unknown card> at addr 0x30
ad74170: <Supply-Monitor AD7417> at addr 0x58 on iicbus0
ad74170: 5 sensors detected.
iicbus0: <unknown card> at addr 0xa0
iicbus0: <unknown card> at addr 0x32
ad74171: <Supply-Monitor AD7417> at addr 0x5a on iicbus0
ad74171: 5 sensors detected.
iicbus0: <unknown card> at addr 0xa2
iicbus0: <unknown card> at addr 0x1c0
htpic0: <OpenPIC Interrupt Controller> mem 0xf8040000-0xf807ffff irq 184 on unin0
pcib0: <Apple U3 Host-AGP bridge> on nexus0
pci0: <OFW PCI bus> on pcib0
pcib1: <IBM CPC9X5 HyperTransport Tunnel> on nexus0
pcib1: 4 HT IRQs on device 1.0
pcib1: 4 HT IRQs on device 2.0
pcib1: 86 HT IRQs on device 3.0
pci1: <OFW PCI bus> on pcib1
pcib2: <OFW PCI-PCI bridge> at device 1.0 on pci1
pci2: <OFW PCI bus> on pcib2
pcib3: <OFW PCI-PCI bridge> at device 2.0 on pci1
pci3: <OFW PCI bus> on pcib3
bge0: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x90030000-0x9003ffff,0x90020000-0x9002ffff irq 182 at device 4.0 on pci3
bge0: CHIP ID 0x00002003; ASIC REV 0x02; CHIP REV 0x20; PCI-X
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 1000BASE-T media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:0d:93:9d:40:c0
bge1: <Broadcom BCM5704 A3, ASIC rev. 0x002003> mem 0x90010000-0x9001ffff,0x90000000-0x9000ffff irq 185 at device 4.1 on pci3
bge1: CHIP ID 0x00002003; ASIC REV 0x02; CHIP REV 0x20; PCI-X
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 1000BASE-T media interface> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Ethernet address: 00:0d:93:9d:40:c1
pcib4: <OFW PCI-PCI bridge> at device 3.0 on pci1
pci4: <OFW PCI bus> on pcib4
macio0: <K2 KeyLargo I/O Controller> mem 0x80000000-0x8007ffff at device 7.0 on pci4
openpic0: <OpenPIC Interrupt Controller> mem 0x40000-0x7ffff on macio0
macgpio0: <MacIO GPIO Controller> mem 0x50-0x8a on macio0
scc0: <Zilog Z8530 dual channel SCC> mem 0x13000-0x13fff,0x8400-0x84ff,0x8500-0x85ff,0x8600-0x86ff,0x8700-0x87ff irq 150,133,134,151,135,136 on macio0
uart0: <z8530, channel A> on scc0
uart0: console (57600,n,8,1)
uart1: <z8530, channel B> on scc0
iichb1: <Keywest I2C controller> mem 0x18000-0x18fff irq 154 on macio0
iicbus1: <OFW I2C bus> on iichb1
iicbus1: <unknown card> at addr 0x5a
iicbus1: <unknown card> at addr 0x5c
ds17750: <Temp-Monitor DS1775> at addr 0x90 on iicbus1
iicbus1: <unknown card> at addr 0x94
iicbus1: <unknown card> at addr 0x1c0
pmu0: <Apple PMU99 Controller> mem 0x16000-0x17fff irq 153 on macio0
pcib5: <OFW PCI-PCI bridge> at device 4.0 on pci1
pci5: <OFW PCI bus> on pcib5
ohci0: <NEC uPD 9210 USB controller> mem 0x80102000-0x80102fff irq 191 at device 11.0 on pci5
usbus0: <NEC uPD 9210 USB controller> on ohci0
ohci1: <NEC uPD 9210 USB controller> mem 0x80101000-0x80101fff irq 191 at device 11.1 on pci5
usbus1: <NEC uPD 9210 USB controller> on ohci1
ehci0: <NEC uPD 720100 USB 2.0 controller> mem 0x80100000-0x801000ff irq 191 at device 11.2 on pci5
usbus2: EHCI version 1.0
usbus2: <NEC uPD 720100 USB 2.0 controller> on ehci0
pcib6: <OFW PCI-PCI bridge> at device 5.0 on pci1
pci6: <OFW PCI bus> on pcib6
ata0: <K2 Kauai ATA Controller> mem 0x80204000-0x80207fff irq 167 at device 13.0 on pci6
fwohci0: <1394 Open Host Controller Interface> mem 0x80200000-0x80200fff irq 168 at device 14.0 on pci6
fwohci0: OHCI version 1.0 (ROM=0)
fwohci0: No. of Isochronous channels is 8.
fwohci0: EUI64 00:0d:93:ff:fe:c5:c9:fa
fwohci0: invalid speed 7 (fixed to 3).
fwohci0: Phy 1394a available S800, 3 ports.
fwohci0: Link S800, max_rec 4096 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:0d:93:c5:c9:fa
fwe0: Ethernet address: 02:0d:93:c5:c9:fa
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: fwohci_intr_core: BUS reset
fwohci0: fwohci_intr_core: node_id=0x00000001, SelfID Count=2, CYCLEMASTER mode
pcib7: <OFW PCI-PCI bridge> at device 6.0 on pci1
pci7: <OFW PCI bus> on pcib7
gem0: <Apple K2 GMAC Ethernet> mem 0x80400000-0x805fffff at device 15.0 on pci7
gem0: failed to allocate resources
device_attach: gem0 attach returned 6
pcib8: <OFW PCI-PCI bridge> at device 7.0 on pci1
pci8: <OFW PCI bus> on pcib8
atapci0: <ServerWorks K2 SATA150 controller> mem 0x80600000-0x80601fff irq 128 at device 12.0 on pci8
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
ata4: <ATA channel 2> on atapci0
ata5: <ATA channel 3> on atapci0
atapci1: <ServerWorks K2 SATA150 controller> at device 12.1 on pci8
pcib1: failed to reserve resource for atapci1
atapci1: 0x10 bytes of rid 0x20 res 4 failed (0, 0xffffffffffffffff).
atapci1: unable to map interrupt
device_attach: atapci1 attach returned 6
Timecounter "timebase" frequency 33333333 Hz quality 0
Event timer "decrementer" frequency 33333333 Hz quality 1000
Timecounters tick every 1.000 msec
max66900: 2 sensors detected.
firewire0: 2 nodes, maxhop <= 1 cable IRM irm(1)  (me) 
firewire0: bus manager 1 
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 480Mbps High Speed USB v2.0
ugen0.1: <NEC> at usbus0
uhub0: <NEC OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <NEC> at usbus1
uhub1: <NEC OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <NEC> at usbus2
uhub2: <NEC EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
fcu0: FCU initialized, RPM shift: 2
fcu0: 8 fans detected!
uhub1: 2 ports with 2 removable, self powered
uhub0: 3 ports with 3 removable, self powered
ada0 at ata2 bus 0 scbus2 target 0 lun 0
ada0: <Hitachi HDS722580VLSA80 V32BC6EA> ATA-6 SATA 1.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada0: 78533MB (160836480 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad0
ada1 at ata4 bus 0 scbus4 target 0 lun 0
ada1: <Hitachi HDS725050KLA360 K2ABC20A> ATA-7 SATA 1.x device
ada1: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad1
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
SMP: AP CPU #1 launched
cd0 at ata0 bus 0 scbus0 target 0 lun 0
cd0: <MATSHITA CD-RW  CW-8123 CA14> Removable CD-ROM SCSI-0 device 
cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
uhub2: 5 ports with 5 removable, self powered
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
xpt_action_default: CCB type 0xe not supported
Trying to mount root from ufs:/dev/ada0s3 [rw]...
WARNING: / was not properly dismounted
Enter full pathname of shell or RETURN for /bin/sh: 


