System freezes up during long-running ZFS disk activity
dteske at FreeBSD.org
Thu Feb 20 18:51:54 UTC 2014
> -----Original Message-----
> From: Christian Campbell [mailto:dcamp at alumni.ufl.edu]
> Sent: Wednesday, February 19, 2014 12:07 PM
> To: freebsd-questions at freebsd.org
> Subject: System freezes up during long-running ZFS disk activity
> I recently installed 9.2-RELEASE-p3 on a Dell Precision T5400. I'm using
> filesystem version: 5, ZFS storage pool version: features support (5000).
> pool was imported from a previous 9.2 box on which it worked without
> I don't know if my problem is ZFS-related, but my ZFS use is why I noticed
> I seem to be able to reproduce it reliably. Every so often, from minutes to
> hours, my computer will freeze up while ZFS has been busy. This happens
> during a resilver, a scrub, and a long-running process reading millions of
> from the pool. When it freezes, all output and input
> freezes: tasks like zpool iostat -v 1 or top stop updating their output,
> the console or an ssh terminal over Ethernet. Pressing keys does not
> response.* Sometimes a freeze lasts minutes and then proceeds on its own.
> Sometimes it goes on for hours. An action that typically, but not always,
> is unplugging the USB keyboard -- the disk activity resumes immediately,
> any queued keyboard input immediately plays out whether on the console or
> over ssh. Lastly, my ssh terminal (PuTTY) will stay connected for hours
> freeze-up, *i.e.* the TCP circuit is not closed or timed out, as opposed to
> closing pretty quickly after the server is powered off.
> In all cases, the system clock lags by the sum of the durations of the
> * During an initial resilver, I noticed that pressing a key such as Ctrl on the USB
> keyboard would jog it, but pressing Ctrl or other keys doesn't jog my
> long-running IO activity. But in all cases, even when unplugging and
> the USB keyboard doesn't jog it, Ctrl-Alt-Del prompts an orderly shutdown.
> Debugging advice is very welcome!
I had this exact same problem on a Dell 1U F1DH server. I didn't send any
to the mailing lists, because I feared I was going crazy.
Of course, it's been 30 days since I had that problem... if I try to
it was... it was either the bad SATA port (which had loose soldering), or it
drive which said SATA port had fubar'd (putting that drive into another
the same thing happen in said new system).
So what I did was rsync all the data off that drive to another one (and yes,
I had to "jog" the system to get it to be responsive, in the same exact
describe above) it took a very _very_ long time. But... once I got off of
everything looked much much better.
I also found other ways to jog it were Alt+FN, and even the occasional ping would
jog it too. It appeared to be interrupt driven in some way.
Might I suggest that you have a drive acting up in your pool.
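
One rough way to look for a misbehaving drive is to scan the per-device error
counters in `zpool status` output. The sketch below is only an illustration:
the helper name flag_bad_vdevs and the pool name "tank" are made up for the
example, and a drive that merely stalls (rather than returning errors) may
still show all zeros, so a clean result does not rule the drive out.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name of every pool/vdev/device whose READ, WRITE or CKSUM counter is
# anything other than 0.
flag_bad_vdevs() {
    awk '
        # The config table starts right after its header line.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # A blank line ends the table.
        in_cfg && NF == 0 { in_cfg = 0; next }
        # Table rows: NAME STATE READ WRITE CKSUM [notes...]
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") { print $1 }
    '
}

# Typical use on a live system (pool name is an assumption):
#   zpool status tank | flag_bad_vdevs
#
# If sysutils/smartmontools is installed, the drive's own view can be
# checked as well (device name is an assumption):
#   smartctl -a /dev/ada0
```

Any device the helper prints is a reasonable first candidate for copying
data off of and swapping out, or for moving to a different SATA port to see
whether the problem follows the drive or stays with the port.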