8.1 amd64 lockup (maybe zfs or disk related)
greg at bonett.org
Sat Feb 12 03:24:30 UTC 2011
Thanks for all the help. I've learned some new things, but haven't fixed
the problem yet.
> 1) Re-enable both CPU cores; I can't see this being responsible for the
> problem. I do understand the concern over added power draw, but see
> recommendation (4a) below.
I re-enabled all cores but experienced a lockup while running zpool
scrub. I was able to run scrub twice with 4 of 6 cores enabled without
a lockup. Also, when the lockup occurs I'm not able to access the
debugger with ctrl-alt-esc. Just to keep things straight: since I'm
running geli, more cores mean more I/O throughput during a scrub.
If I'm not able to use the kernel debugger to diagnose this problem,
should I disable it? Could it be a security risk?
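If the answer is to turn it off, I assume it would look something like
this (debug.kdb.break_to_debugger is the knob I found; dropping
"options KDB"/"options DDB" from the kernel config and rebuilding would
remove the debugger entirely):

```shell
# /etc/sysctl.conf -- stop the console break sequence (ctrl-alt-esc)
# from dropping into DDB; the debugger stays compiled in but can no
# longer be entered from the keyboard
debug.kdb.break_to_debugger=0
```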
> 1) Disable the JMicron SATA controller entirely.
> 2) Disable the ATI IXP700/800 SATA controller entirely.
> 3a) Purchase a Silicon Image controller (one of the models I referenced
> in my previous mail). Many places sell them, but lots of online vendors
> hide or do not disclose what ASIC they're using for the controller. You
> might have to look at their Driver Downloads section to find out what
> actual chip is used.
This is on my todo list, but as of now I'm still running the controllers
on the motherboard. I should have the controller replaced by next week.
> 3b) You've stated you're using one of your drives on an eSATA cable. If
> you are using a SATA-to-eSATA adapter bracket, please stop
> immediately and use a native eSATA port instead.
> Adapter brackets are known to cause all sorts of problems that appear as
> bizarre/strange failures (xxx_DMAxx errors are quite common in this
> situation), not to mention with all the internal cabling and external
> cabling, a lot of the time people exceed the maximum SATA cable length
> without even realising it -- it's the entire length from the SATA port
> on your motherboard, to and through the adapter (good luck figuring out
> how much wire is used there), to the end of the eSATA cable. Native
> eSATA removes use of the shoddy adapters and also extends the maximum
> cable length (from 1 metre to 2 metres), plus provides the proper amount
> of power for eSATA devices (yes this matters!). Wikipedia has the
> details on eSATA.
> Silicon Image and others do make chips that offer both internal SATA and
> an eSATA port on the same controller. Given your number of disks, you
> might have to invest in multiple controllers.
My motherboard has an eSATA port and that's what I'm using (not an
extension bracket). Do you still recommend against it? I figured one
fewer drive in the case would reduce the load on my PSU.
> 4a) Purchase a Kill-a-Watt meter and measure exactly how much power your
> entire PC draws, including on power-on (it will be a lot higher during
> power-on than during idle/use, as drives spinning up draw lots of amps).
> I strongly recommend the Kill-a-Watt P4600 model over the P4400 model.
> Based on the wattage and amperage results, you should be able to
> determine if you're nearing the maximum draw of your PSU.
Kill-a-Watt meter arrived today. It looks like during boot it's not
exceeding 200 watts. During a zpool scrub it gets up to ~255 watts
(with all cores enabled). So I don't think the problem is gross power
draw.
> 4b) However, even if you're way under-draw (say, 400W), the draw may not
> be the problem but instead the maximum amount of power/amperage/whatever
> a single physical power cable can provide. I imagine to some degree it
> depends on the gauge of wire being used; excessive use of Y-splitters to
> provide more power connectors than the physical cable provides means
> that you might be drawing too much across the existing gauge of cable
> that runs to the PSU. I have seen setups where people have 6 hard disks
> coming off of a single power cable (with Y-splitters and molex-to-SATA
> power adapters) and have their drives randomly drop off the bus. Please
> don't do this.
Yes, this seems like it could be a problem. I'll shut down and figure
out which drives are connected to which cables. Maybe with some
rearranging I can even out the load. Even if I have a bunch of drives
on a single cable, would a voltage drop on one cable full of drives be
enough to lock up the machine? It seems like the motherboard power
would be unaffected.
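Doing the arithmetic on that worst case (the per-drive spin-up current
and wire rating below are my rough assumptions, not measurements; check
the drive datasheet):

```shell
# Back-of-envelope: can one PSU cable feed six drives spinning up at once?
# Assumed figures: ~2.5 A at 12 V per 3.5" drive during spin-up,
# ~10 A continuous rating for typical 18 AWG PSU wiring.
drives=6
spinup_ma_per_drive=2500          # 2.5 A, in milliamps
wire_rating_ma=10000              # 10 A, in milliamps
total_ma=$((drives * spinup_ma_per_drive))
echo "spin-up draw on one cable: ${total_ma} mA (wire rated ${wire_rating_ma} mA)"
if [ "$total_ma" -gt "$wire_rating_ma" ]; then
  echo "over the wire rating -- split the drives across separate cables"
fi
```

So even well under the PSU's total wattage, one fully loaded cable
could sag during spin-up, which is your point about Y-splitters.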
> A better solution might be to invest in a server-grade chassis, such as
> one from Supermicro, that offers a hot-swap SATA backplane. The
> backplane provides all the correct amounts of power to the maximum
> number of disks that can be connected to it. Here are some cases you
> can look at. Also be aware that if you're already using a
> hot-swap backplane, most consumer-grade ones are complete junk and have
> been known to cause strange anomalies; it's always best in those
> situations to go straight from motherboard-to-drive or card-to-drive.
This would be nice, but it's not in my budget right now. I'll keep it
in mind for my next major upgrade.
> After reviewing your SMART stats on the drive, I agree -- it looks
> perfectly healthy (for a Seagate disk). Nothing wrong there.
> > > > calcru: runtime went backwards from 82 usec to 70 usec for pid 20 (flowcleaner)
> > > > calcru: runtime went backwards from 363 usec to 317 usec for pid 8 (pagedaemon)
> > > > calcru: runtime went backwards from 111 usec to 95 usec for pid 7 (xpt_thrd)
> > > > calcru: runtime went backwards from 1892 usec to 1629 usec for pid 1 (init)
> > > > calcru: runtime went backwards from 6786 usec to 6591 usec for pid 0 (kernel)
> > >
> > > This is a problem that has plagued FreeBSD for some time. It's usually
> > > caused by EIST (est) being used, but that's on Intel platforms. AMD has
> > > something similar called Cool'n'Quiet (see cpufreq(4) man page). Are
> > > you running powerd(8) on this system? If so, try disabling that and see
> > > if these go away.
> > sadly, I don't know if I'm running powerd.
> > "ps aux | grep power" gives nothing, so no, I guess...
> > As far as I can tell, this error is the least of my problems right
> > now, but I would like to fix it.
> Yes that's an accurate ps/grep to use; powerd_enable="yes" in
> /etc/rc.conf is how you make use of it.
Is this recommended for desktop machines?
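If I do try it, I take it this is all that's needed (the adaptive flag
is my guess at a sensible default; powerd(8) lists the modes):

```shell
# /etc/rc.conf -- enable powerd; -a selects the policy used on AC power
powerd_enable="YES"
powerd_flags="-a adaptive"
```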
> Could you provide output from "sysctl -a | grep freq"? That might help
> shed some light on the above errors as well, but as I said, I'm not
> familiar with AMD systems.
$ sysctl -a | grep freq
dev.cpu.0.freq_levels: 3000/19507 2625/17068 2300/14500 2012/12687
1725/10875 1600/10535 1400/9218 1200/7901 1000/6584 800/6345 700/5551
600/4758 500/3965 400/3172 300/2379 200/1586 100/793
dev.acpi_throttle.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1
5000/-1 3750/-1 2500/-1 1250/-1
dev.hwpstate.0.freq_settings: 3000/19507 2300/14500 1600/10535 800/6345
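As I read it, each entry is a "MHz/estimated-mW" pair, so pulling out
the range the CPU can scale over is just string splitting (using the
dev.cpu.0.freq_levels output above, pasted in verbatim so the loop is
reproducible):

```shell
# Parse "MHz/mW" pairs from the dev.cpu.0.freq_levels output quoted above
freq_levels="3000/19507 2625/17068 2300/14500 2012/12687 1725/10875 \
1600/10535 1400/9218 1200/7901 1000/6584 800/6345 700/5551 600/4758 \
500/3965 400/3172 300/2379 200/1586 100/793"
lowest=99999; highest=0
for pair in $freq_levels; do
  mhz=${pair%/*}                  # keep the MHz half of each pair
  if [ "$mhz" -lt "$lowest" ]; then lowest=$mhz; fi
  if [ "$mhz" -gt "$highest" ]; then highest=$mhz; fi
done
echo "powerd would scale between ${lowest} MHz and ${highest} MHz"
```

The hwpstate line only shows four real P-states, so I assume the
intermediate dev.cpu levels come from combining those with the ACPI
throttling steps.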