8.1 amd64 lockup (maybe zfs or disk related)

Jeremy Chadwick freebsd at jdc.parodius.com
Sat Feb 12 04:18:18 UTC 2011


On Fri, Feb 11, 2011 at 07:24:27PM -0800, Greg Bonett wrote:
> Thanks for all the help. I've learned some new things, but haven't fixed
> the problem yet.
> 
> > 1) Re-enable both CPU cores; I can't see this being responsible for the
> > problem.  I do understand the concern over added power draw, but see
> > recommendation (4a) below.
> 
> I re-enabled all cores but experienced a lockup while running zpool
> scrub.  I was able to run scrub twice with 4 of 6 cores enabled without
> lockup.  Also, when lockup occurs I'm not able to access the debugger
> with ctrl-alt-esc.  Just to keep things straight, since I'm running
> geli, more cores means more io throughput during a scrub.

Okay, and what happens if you disable two cores and re-install the disks
you removed?  Does the system lock up during "zpool scrub" then?

Basically I'm trying to figure out if the problem is related to having 6
cores enabled, or if it's related to having too many disks in use.  If
it happens in both cases (4 of 6 cores w/ all disks attached, and 6 of 6
cores w/ only some disks attached), then it's probably a motherboard or
PSU issue, as suspected.
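
For each test run, something like this (the pool name "tank" is just an
example; substitute yours) will kick off a scrub and let you watch its
progress, so you can note exactly which core/disk combination was in
use when a lockup happens:

  zpool scrub tank
  zpool status -v tank    # re-run periodically to watch progress/errors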

> If I'm not able to use the kernel debugger to diagnose this problem,
> should I disable it?  Could it be a security risk?

Let me explain why I advocated adding the debugger to your kernel.
Basically, if the machine "locks up", you should try pressing
Ctrl-Alt-Esc to see if you drop to a db> prompt.  If so, the kernel is
still alive/working, and the machine isn't actually "hard locked".

The debugger is not a security risk.  There are only three ways I know
of to enter the debugger (aside from a serial console, which isn't in
use on your system, so it doesn't apply): 1) execute
"sysctl debug.kdb.enter=1" as root, 2) physically press Ctrl-Alt-Esc on
the VGA console, 3) crash the machine.
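
For reference, generic ddb(4) usage once you do land at a db> prompt
looks roughly like this (nothing here is specific to your setup):

  db> continue     (or just "c" -- resume the running kernel)
  db> panic        (force a panic, which writes a crash dump if dumpdev is set)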

> > 3b) You've stated you're using one of your drives on an eSATA cable.  If
> > you are using a SATA-to-eSATA adapter bracket[1][2], please stop
> > immediately and use a native eSATA port instead.
> > 
> > Adapter brackets are known to cause all sorts of problems that appear as
> > bizarre/strange failures (xxx_DMAxx errors are quite common in this
> > situation), not to mention with all the internal cabling and external
> > cabling, a lot of the time people exceed the maximum SATA cable length
> > without even realising it -- it's the entire length from the SATA port
> > on your motherboard, to and through the adapter (good luck figuring out
> > how much wire is used there), to the end of the eSATA cable.  Native
> > eSATA removes use of the shoddy adapters and also extends the maximum
> > cable length (from 1 metre to 2 metres), plus provides the proper amount
> > of power for eSATA devices (yes this matters!).  Wikipedia has
> > details[3].
> > 
> > Silicon Image and others do make chips that offer both internal SATA and
> > an eSATA port on the same controller.  Given your number of disks, you
> > might have to invest in multiple controllers.
> 
> My motherboard has an eSATA port and that's what I'm using (not an
> extension bracket).  Do you still recommend against it?  I figured one
> fewer drive in the case would reduce the load on my PSU.

If the eSATA port is on the motherboard backplane (i.e. a port that's
soldered to the motherboard), then you're fine.  Be aware that the eSATA
port may be connected to the JMicron controller, however, which I've
already said is of questionable quality to begin with.  :-)
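
A quick way to check (exact device names will vary depending on whether
you're using the older ata(4) driver or ahci(4)) is to look at which
controller the eSATA disk attached to during boot:

  dmesg | grep -i jmicron
  dmesg | egrep '^(ad|ada)[0-9]'

If the disk shows up on the JMicron channel rather than the chipset's
own SATA ports, that's worth keeping in mind while correlating lockups.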

Is your eSATA enclosure/hard disk powered off of the eSATA cable, or are
you using an AC adapter with it?  That determines whether it adds any
additional load on the PSU.

> > 4a) Purchase a Kill-a-Watt meter and measure exactly how much power your
> > entire PC draws, including on power-on (it will be a lot higher during
> > power-on than during idle/use, as drives spinning up draw lots of amps).
> > I strongly recommend the Kill-a-Watt P4600 model[4] over the P4400 model.
> > Based on the wattage and amperage results, you should be able to
> > determine if you're nearing the maximum draw of your PSU.
> 
> Kill-a-Watt meter arrived today.  It looks like during boot it's not
> exceeding 200 watts.  During a zpool scrub it gets up to ~255 watts
> (with all cores enabled).  So I don't think the problem is gross power
> consumption. 

And this is with all 6 cores enabled, AND all disks attached, during a
"zpool scrub"?  If so, I agree the PSU load is not a problem.  Voltages
and so on could be a problem, but FreeBSD's hardware monitoring support
is sub-par when it comes to any system made after ~2002, so you won't
get very far monitoring them from within the OS.  I speak the truth
given that I
maintain the Supermicro-specific hardware monitoring software
(ports/sysutils/bsdhwmon).  :-)

System BIOSes provide hardware monitoring indicators (voltages, etc.),
but voltages will slightly change/shift when running under an OS vs.
looking at them in the BIOS.  Viewing the BIOS attributes would be
worthwhile to verify if, say, your 12V line is running at 10V or
something absurd.  Please be aware there will always be some variance in
the voltages (e.g. don't expect a -5V line to read exactly -5.000V; some
deviation should be expected).

> > 4b) However, even if you're way under-draw (say, 400W), the draw may not
> > be the problem but instead the maximum amount of power/amperage/whatever
> > a single physical power cable can provide.  I imagine to some degree it
> > depends on the gauge of wire being used; excessive use of Y-splitters to
> > provide more power connectors than the physical cable provides means
> > that you might be drawing too much across the existing gauge of cable
> > that runs to the PSU.  I have seen setups where people have 6 hard disks
> > coming off of a single power cable (with Y-splitters and molex-to-SATA
> > power adapters) and have their drives randomly drop off the bus.  Please
> > don't do this.
> 
> Yes this seems like it could be a problem.  I'll shutdown and figure out
> which drives are connected to which cables.  Maybe with some rearranging
> I can even out the load.  Even if I have a bunch of drives on a single
> cable, would a voltage drop on one cable filled with drives be enough to
> lockup the machine?  It seems like the motherboard power would be
> unaffected.

I cannot comment on this -- EE-related things (including power,
voltages, amperage, etc.) are outside of my skill set, aside from "don't
draw too many amps or the rack will blow".  :-)  As an SA, my opinion is
that "weird electrical issues" can cause completely random things to
happen when it comes to hardware.  As stated (I think) elsewhere in my
mails, the problem with issues like this is that you have to replace one
piece at a time, wasting time + money in the process, until you figure
out what's broken.

I can't keep spending time trying to diagnose this issue for you,
though.  I think at this point I've given you all the tips needed to
investigate the root cause yourself, but it will take time/effort on
your part.  If
at the end of the adventure you've replaced all the parts and it still
happens, then something *very* weird is going on (could be a motherboard
defect, or even a CPU defect).

I forget if I mentioned this or not, but story time: I had a box that
kept "randomly" locking up.  Turns out there was a small shard of metal
that had been shaved off the inside of the chassis somehow, and was
blowing around inside, shorting stuff out.

A different story involves a workstation I had which was behaving oddly
(randomly).  Once I removed the motherboard, I found I had accidentally
left a loose motherboard mounting screw in there (!), which was probably
pressed between the motherboard and the case, likely shorting something
out.  Why it wasn't a consistent failure I don't know; I imagine it
greatly depends on what the screw was pushed up
against, and the circuitry used near/around that area.

Low-level EE/hardware is seriously "black box magic voodoo" to me, while
electrical engineers laugh and argue that *software* is black magic.
But I disagree -- there's no "magic smoke" in software, but there *are*
hardware gremlins.  ;-)

> > After reviewing your SMART stats on the drive, I agree -- it looks
> > perfectly healthy (for a Seagate disk).  Nothing wrong there.
> > 
> > > > > calcru: runtime went backwards from 82 usec to 70 usec for pid 20 (flowcleaner)
> > > > > calcru: runtime went backwards from 363 usec to 317 usec for pid 8 (pagedaemon)
> > > > > calcru: runtime went backwards from 111 usec to 95 usec for pid 7 (xpt_thrd)
> > > > > calcru: runtime went backwards from 1892 usec to 1629 usec for pid 1 (init)
> > > > > calcru: runtime went backwards from 6786 usec to 6591 usec for pid 0 (kernel)
> > > > 
> > > > This is a problem that has plagued FreeBSD for some time.  It's usually
> > > > caused by EIST (est) being used, but that's on Intel platforms.  AMD has
> > > > something similar called Cool'n'Quiet (see cpufreq(4) man page).  Are
> > > > you running powerd(8) on this system?  If so, try disabling that and see
> > > > if these go away.
> > > 
> > > sadly, I don't know if I'm running powerd. 
> > > ps aux | grep power gives nothing, so no I guess...
> > > as far as I can tell, this error is the least of my problems right now,
> > > but i would like to fix it.
> > 
> > Yes that's an accurate ps/grep to use; powerd_enable="yes" in
> > /etc/rc.conf is how you make use of it.
> 
> Is this recommended for desktop machines?  

Because you're trying to diagnose hardware problems or similar oddities,
please do not enable powerd at this moment.  Otherwise, once your system
is rock solid, yes, using powerd is recommended, assuming you'd like the
CPU to throttle down in frequency/speed when idle.

Please keep reading for what powerd does and how it works (at a general
level).  The sysctl output below will help shed some light.

Again, just keep in mind the below is *informative only* -- you should
not go messing with this given your system instability issues.  Don't
add more complexity to the mix.  :-)

> > Could you provide output from "sysctl -a | grep freq"?  That might help
> > shed some light on the above errors as well, but as I said, I'm not
> > familiar with AMD systems.
> > 
> 
> $ sysctl -a | grep freq
> kern.acct_chkfreq: 15
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.HPET.frequency: 14318180
> kern.timecounter.tc.TSC.frequency: 3491654411
> net.inet.sctp.sack_freq: 2
> debug.cpufreq.verbose: 0
> debug.cpufreq.lowest: 0
> machdep.acpi_timer_freq: 3579545
> machdep.tsc_freq: 3491654411
> machdep.i8254_freq: 1193182
> dev.cpu.0.freq: 3000
> dev.cpu.0.freq_levels: 3000/19507 2625/17068 2300/14500 2012/12687
> 1725/10875 1600/10535 1400/9218 1200/7901 1000/6584 800/6345 700/5551
> 600/4758 500/3965 400/3172 300/2379 200/1586 100/793
> dev.acpi_throttle.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1
> 5000/-1 3750/-1 2500/-1 1250/-1
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%parent: cpu0
> dev.hwpstate.0.freq_settings: 3000/19507 2300/14500 1600/10535 800/6345

The simple version of what powerd does is that it lowers the CPU clock
frequency (speed) when the machine is idle, so it draws less power, and
raises it when the machine needs more processing power.  The polling
interval defaults to 250ms, and the software works in "steps".

The current clock speed of the CPU (all cores) is shown by
dev.cpu.0.freq.  In your case above, that's 3000MHz (3GHz).  The
processor supports lots of clock frequencies -- which are shown in
dev.cpu.0.freq_levels.  Ignore the value after the "/".
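
If you're curious, you can poke at this by hand (purely as an
experiment -- don't do this while you're still chasing the lockups).
With powerd not running, the frequency can be read and set directly;
the values below come straight from your freq_levels output:

  sysctl dev.cpu.0.freq          # current speed in MHz
  sysctl dev.cpu.0.freq=800      # throttle down to 800MHz
  sysctl dev.cpu.0.freq=3000     # back to full speed

powerd simply automates exactly this, based on load.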

If powerd was in use, your system would run at 3GHz until powerd
started.  Then, assuming the system wasn't under load, it would
gradually decrease the clock speed: 3000MHz -> 2625MHz -> 2300MHz ->
2012MHz -> etc... (see sysctl above).  When the system needed to do more
CPU processing, the clock speed would increase in those increments,
then settle back down again when idling.  You get the idea.

powerd supports setting what the minimum and maximum frequencies are,
so that you don't "throttle down" or "throttle up" too much, depending
on what you want.  The -m and -M flags control this.  E.g. "-m 1600"
would result in 3000 -> 2625 -> 2300 -> 2012 -> 1725 -> 1600, where it
would then stay when idle.  On load, it would ramp back up gradually.
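
For whenever you do get around to enabling it (again: not now), the
rc.conf bits would look something like this -- the "-m 1600" is just an
example taken from your freq_levels list:

powerd_enable="yes"
powerd_flags="-m 1600 -M 3000"

The flags are passed straight through to powerd(8); see its man page
for -p (polling interval) and -a/-b (behaviour on AC vs. battery).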

However, there are multiple types of throttling (as you can see from
reading the cpufreq(4) man page, labelled SUPPORTED DRIVERS).  You would
basically want to ensure you use the AMD-specific throttling method and
not ACPI throttling.  Because I'm not familiar with AMD systems, I can't
comment on how to do this, but for Intel systems it's as simple as
setting these in /boot/loader.conf:

# Enable use of P-state CPU frequency throttling.
# http://wiki.freebsd.org/TuningPowerConsumption
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"

I imagine on an AMD system you'd need the last tunable, in addition to
some other tunable (probably not p4tcc; not sure what AMD has).
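
Going by your sysctl output above (the dev.hwpstate.0.freq_settings
line), the AMD P-state driver under cpufreq is already attached, so my
guess -- and it is only a guess, since I don't run AMD -- is that you'd
only need this in /boot/loader.conf:

hint.acpi_throttle.0.disabled="1"

That would leave hwpstate as the sole frequency-control method.  Verify
against cpufreq(4) before relying on it.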

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


