System freezes up during long-running ZFS disk activity

dteske at dteske at
Thu Feb 20 18:51:54 UTC 2014

> -----Original Message-----
> From: Christian Campbell [mailto:dcamp at]
> Sent: Wednesday, February 19, 2014 12:07 PM
> To: freebsd-questions at
> Subject: System freezes up during long-running ZFS disk activity
> I recently installed 9.2-RELEASE-p3 on a Dell Precision T5400. I'm using
> filesystem version: 5, ZFS storage pool version: features support (5000).
> pool was imported from a previous 9.2 box on which it worked without
> I don't know if my problem is ZFS-related, but my ZFS use is why I noticed it and
> I seem to be able to reproduce it reliably. Every so often, from minutes to
> hours, my computer will freeze up while ZFS has been busy. This happens
> during a resilver, a scrub, and a long-running process reading millions of
> from the pool. When it freezes, all output and input
> freezes: tasks like zpool iostat -v 1 or top stop updating their output, whether on
> the console or an ssh terminal over Ethernet. Pressing keys does not garner a
> response.* Sometimes a freeze lasts minutes and then proceeds on its own.
> Sometimes it goes on for hours. An action that typically, but not always, jogs it
> is unplugging the USB keyboard -- the disk activity resumes immediately, and
> any queued keyboard input immediately plays out whether on the console or
> over ssh. Lastly, my ssh terminal (PuTTY) will stay connected for hours during a
> freeze-up, *i.e.* the TCP circuit is not closed or timed out, as opposed to
> closing pretty quickly after the server is powered off.
> In all cases, the system clock lags by the sum of the durations of the
> * During an initial resilver, I noticed that pressing a key such as Ctrl on the USB
> keyboard would jog it, but pressing Ctrl or other keys doesn't jog my process of
> long-running IO activity. But in all cases, even when unplugging and
> the USB keyboard doesn't jog it, Ctrl-Alt-Del prompts an orderly shutdown.
> Debugging advice is very welcome!
[Devin Teske] 

I had this exact same problem on a Dell 1U F1DH server. I didn't send any
to the mailing lists, because I feared I was going crazy.

Of course, it's been 30 days since I had that problem... if I try to remember
what it was... it was either the bad SATA port (which had loose soldering), or
it was the drive which said SATA port had fubar'd (putting that drive into
another system saw the same thing happen in said new system).

So what I did was rsync all the data off that drive to another one (and yes,
I had to "jog" the system to get it to be responsive, in the same exact
situation you describe above). It took a very _very_ long time. But... once I
got off of that drive everything looked much much better.

I also found other ways to jog it were Alt+FN, and even the occasional ping
would jog it too. It appeared to be interrupt driven in some way.

Might I suggest that you have a drive acting up in your pool.
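
A quick first check for a misbehaving drive is the per-device READ, WRITE and
CKSUM error counters that `zpool status` prints for each pool member. The
following is only a rough sketch of how one might flag devices with non-zero
counters. The pool and device names in the sample text (`tank`, `ada0`, `ada1`)
are made up for illustration, and `suspect_devices` is a hypothetical helper,
not part of any ZFS tooling.

```python
import re

# Hypothetical sample of `zpool status` output; pool and device names are invented.
SAMPLE = """\
  pool: tank
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0    ONLINE       0     0     0
        ada1    ONLINE       3     0    12

errors: No known data errors
"""

# One config row: name, state, then the READ, WRITE and CKSUM counters.
# Anything after the fifth field (e.g. a trailing note) is ignored.
ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def suspect_devices(status_text):
    """Return (name, state, read, write, cksum) rows with any non-zero counter."""
    suspects = []
    in_config = False
    for line in status_text.splitlines():
        if line.strip().startswith("NAME"):
            # The header row marks the start of a pool's device table.
            in_config = True
            continue
        if in_config:
            if not line.strip():
                # A blank line ends the device table.
                in_config = False
                continue
            m = ROW.match(line)
            # Counters are compared as strings, since large values may be
            # abbreviated (for example with a K suffix).
            if m and any(c != "0" for c in m.group(3, 4, 5)):
                suspects.append(m.groups())
    return suspects


if __name__ == "__main__":
    for row in suspect_devices(SAMPLE):
        print(row)
```

On a real system the text could come from running `zpool status` and capturing
its output. Clean counters do not necessarily clear a drive: a disk or port that
stalls without returning errors may never show up here, so SMART data and
kernel messages are worth a look as well.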
