ZFS: Failed pool causes system to hang

Lawrence K. Chen, P.Eng. lkchen at ksu.edu
Mon Apr 8 22:22:51 UTC 2013



----- Original Message -----
> 
> > So, this thread seems to just stop....and can't see if it was
> > resolved or not.
> 
> It wasn't. Jeremy Chadwick was the only one who really responded, but
> besides confirming it wasn't specific to my hardware, there wasn't a
> lot he could do. He suggested I email some of the kernel folks
> directly and/or open a PR about it. (I'm planning on doing both, but
> haven't had time over the weekend.)
> 
> 
> > Anyways, my input would be: did you wait long enough to see if the
> > system will boot before declaring it hung?
> 
> > I've had my system crash at bad times, which has resulted in the
> > appearance that the boot is hung...but it's busy churning away....
> 
> > It seemed hung at trying to mount root
> 
> It might not have been clear from the back and forth, but my issue
> isn't a "boot hang" per se, but that "reboots also hang". The zfs
> subsystem hangs so thoroughly it blocks all io on all disks and
> prevents the reboot/halt/shutdown procedure from taking the machine
> down gracefully. Once I press the physical front-panel reboot button
> the machine comes up immediately (sans the offending pool). And yes,
> I've waited over half an hour and it never recovers. My discussion
> with Jeremy indicated that the infinite wait is an "expected failure"
> in the sense that zfs would not come back to life given the
> circumstances.
> 

So, you're not really waiting a long time....

Granted, the first time it happened to me...I would wait 10-30 minutes, depending on what the Internet searching I was doing turned up....  But then I just left it and watched some TV...when, out of the corner of my eye, I saw it come back up (after about 2.5 hours).

The next time it happened....it seemed hung, but I left it....and it wasn't up the next morning...but I got a notification later in the day that it had come back up.

Both times it involved zpools that weren't my root pool.  And both times it was an unexpected reboot while destroying a large dataset....the first time was a 384G zvol with lots of snapshots (it had been serving blocks up for iscsi).  The second time was just a 1TB filesystem.  While I wasn't doing dedup on the zvol or the filesystem, I was doing dedup elsewhere in the pool....and I found that dedup does consider data in non-dedup-enabled filesystems for dedup.  I had copied the filesystem to another one in the same zpool with dedup on, to see if dedup would help....it seemed to, until I removed the original filesystem.  So, I'm not doing dedup in that pool anymore.

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Senior Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally
Snail: Computing and Telecommunications Services (CTS)
Kansas State University, 109 East Stadium, Manhattan, KS 66506-3102
Phone: (785) 532-4916 - Fax: (785) 532-3515 - Email: lkchen at ksu.edu
Web: http://www-personal.ksu.edu/~lkchen - Where: 11 Hale Library


More information about the freebsd-fs mailing list