zfs pool import hangs on [tx->tx_sync_done_cv]

Mark Martinec Mark.Martinec+freebsd at ijs.si
Sun Oct 12 20:40:41 UTC 2014


I have made available a dd image copy of a 4 GiB partition
(compressed down to a 384 MiB file), holding one of my pools
(a small BSD root) that can no longer be imported on
10.1-RC1 or 10.1-BETA3, as described in my first
posting (below):

   http://www.ijs.si/usr/mark/bsd/

I would appreciate confirmation that such a pool
(one of several I have with this symptom) causes
'zpool import' to hang on 10.1 or 10-STABLE:

- download sys1boot.img.xz and decompress it: xz -d sys1boot.img.xz
   # mdconfig -f sys1boot.img
   # zpool import sys1boot

... and advise on a solution.
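
(For reference, a minimal sketch of the whole sequence, assuming the
image file is the sys1boot.img.xz from the URL above and that mdconfig
assigns unit md0 - adjust to whatever unit it actually prints:)

   # fetch http://www.ijs.si/usr/mark/bsd/sys1boot.img.xz
   # xz -d sys1boot.img.xz
   # mdconfig -a -t vnode -f sys1boot.img   # attach the image; prints e.g. md0
   # zpool import                           # the sys1boot pool should be listed
   # zpool import sys1boot                  # this is the step that hangs
   # mdconfig -d -u 0                       # detach md0 afterwards, if the import returned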

Considering that 'zdb -e -cc' is happy and that there was no
other prior trouble with these pools, apart from an upgrade
to 10.1-BETA3/-RC1 from a 10-STABLE of circa late September,
it is my belief that these pools are still healthy, just
non-importable. I'm reluctant to upgrade any other system
from 10.0 to 10.1 without finding out what went wrong here.
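
(For completeness, the check mentioned above was along these lines -
-e tells zdb to operate on a pool that is not currently imported, -cc
verifies both metadata and data checksums, and the explicit -p /dev
device search path is optional:)

   # zdb -e -cc -p /dev sys1boot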

   Mark


On 10/10/2014 03:02, Steven Hartland wrote:
 > Sorry to be a pain but could you attach the full output or link it
 > somewhere as mail has messed up the formatting :(

Now at
   http://www.ijs.si/usr/mark/bsd/procstat-kka-2074.txt
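
(For reference, a dump like that can be collected by asking procstat
for the kernel stacks of all threads; the grep is just one way to pick
the ZFS txg/sync threads out of it, and the file name is arbitrary:)

   # procstat -kk -a > /tmp/procstat-kka.txt
   # grep -E 'txg|spa_sync' /tmp/procstat-kka.txt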

On 2014-10-10 Mark Martinec wrote:
> In short, after upgrading to 10.1-BETA3 or -RC1 I ended up with several
> zfs pools that can no longer be imported. The zpool import command
> (with no arguments) does show all available pools, but trying to
> import one just hangs and the command cannot be aborted, although
> the rest of the system is still alive and fine:
>
> # zpool import -f <mypool>
>
> Typing ^T just shows an idle process, waiting on [tx->tx_sync_done_cv]:
>
> load: 0.20  cmd: zpool 939 [tx->tx_sync_done_cv] 5723.65r 0.01u 0.02s 0% 8220k
> load: 0.16  cmd: zpool 939 [tx->tx_sync_done_cv] 5735.73r 0.01u 0.02s 0% 8220k
> load: 0.14  cmd: zpool 939 [tx->tx_sync_done_cv] 5741.83r 0.01u 0.02s 0% 8220k
> load: 0.13  cmd: zpool 939 [tx->tx_sync_done_cv] 5749.16r 0.01u 0.02s 0% 8220k
>
> ps shows (on a system rebooted to a LiveCD running FreeBSD-10.1-RC1):
>
>   PID    TID COMM             TDNAME           CPU  PRI STATE WCHAN
>   939 100632 zpool            -                  5  122 sleep tx->tx_s
> UID PID PPID CPU PRI NI    VSZ  RSS MWCHAN   STAT TT     TIME COMMAND
>   0 939  801   0  22  0 107732 8236 tx->tx_s D+   v0  0:00.04
>   zpool import -f -o cachefile=/tmp/zpool.cache -R /tmp/sys0boot sys0boot
>
>   NWCHAN
>   fffff8007b0f2a20
>
> # procstat -kk 939
>
>   PID    TID COMM             TDNAME KSTACK
>   939 100632 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_wait_synced+0x85 spa_load+0x1cd1 spa_load_best+0x6f spa_import+0x1ff zfs_ioc_pool_import+0x137 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb
>
>
> Background story: the system where this happened was being kept
> at a fairly recent 10-STABLE. The last upgrade was very close to
> the BETA3 release. There are several zfs pools there: one on a
> mirrored pair of SSDs mostly holding the OS, one on a mirrored
> pair of large spindles, and three more small ones (4 GiB each),
> mostly for boot redundancy or testing - these small ones are on
> old smallish disks. The disks are all different and attached to
> different SATA controllers (LSI and onboard Intel). The pools were
> mostly kept up to date with the most recent zpool feature set
> throughout their lifetime (some starting their life with 9.0, some
> with 10.0).
>
> About two weeks ago, after a reboot to a 10-STABLE of the day,
> the small pools became unavailable, but the two regular large
> pools were still fine. At first I didn't pay much attention
> to that, as these pools were on oldish disks and nonessential
> for normal operation, and I blamed potentially flaky hardware.
>
> Today I needed to do a reboot (for an unrelated reason), and the
> machine was no longer able to mount the boot pool.
> My first instinct was that the disks were malfunctioning - but ...
>
> Booting it to a FreeBSD-10.1-RC1 LiveCD was successful.
> A smartmontools disk test shows no problems.  dd is able to read whole
> partitions of each problematic pool. And most importantly,
> running 'zdb -e -cc' on each (non-imported) pool churned along
> normally and steadily, produced a stats report at the end,
> and reported no errors.
>
> As further proof that the disks are fine, I sacrificed one of the
> broken 4 GiB GPT partitions holding one of the problematic pools, and
> did a fresh 10.1-RC1 install on it from a distribution ISO DVD.
> The installation went fine and the system does boot and run
> fine from the newly installed OS. Trying to import one of the
> remaining old pools hangs the import command as before.
>
> As a final proof, I copied (with dd) one of the broken 4 GiB
> partitions to a file on another system (running 10.1-BETA3,
> which did not suffer from this problem), made a memory disk
> out of this file, then ran zpool import on this pool - and it hangs
> there too! So hardware was not the problem - either these partitions
> are truly broken (even though zdb -cc says they are fine),
> or the new OS is somehow no longer able to import them.
>
> Please advise.
>
> I have a copy of the 4 GiB partition available as a 400 MB
> compressed file, if somebody would be willing to play with it.
>
> I also have a ktrace of the 'zpool import' command. Its last
> actions before it hangs are:
>
>    939 zpool    RET   madvise 0
>    939 zpool    CALL  madvise(0x80604e000,0x1000,MADV_FREE)
>    939 zpool    RET   madvise 0
>    939 zpool    CALL  close(0x6)
>    939 zpool    RET   close 0
>    939 zpool    CALL  ioctl(0x3,0xc0185a05,0x7fffffffbf00)
>    939 zpool    RET   ioctl -1 errno 2 No such file or directory
>    939 zpool    CALL  madvise(0x802c71000,0x10000,MADV_FREE)
>    939 zpool    RET   madvise 0
>    939 zpool    CALL  madvise(0x802ca5000,0x1000,MADV_FREE)
>    939 zpool    RET   madvise 0
>    939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
>    939 zpool    RET   ioctl 0
>    939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
>    939 zpool    RET   ioctl 0
>    939 zpool    CALL  stat(0x802c380e0,0x7fffffffbc58)
>    939 zpool    NAMI  "/tmp"
>    939 zpool    STRU  struct stat {dev=273, ino=2, mode=041777, nlink=8, uid=0, gid=0, rdev=96, atime=1412866648, stime=1412871393, ctime=1412871393, birthtime=1412866648, size=512, blksize=32768, blocks=8, flags=0x0 }
>    939 zpool    RET   stat 0
>    939 zpool    CALL  ioctl(0x3,0xc0185a02,0x7fffffffbc60)
>
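> (For reference, such a trace can be collected and decoded roughly as
> follows - the trace file name and the tail count are arbitrary, and
> kdump has to be run from a second terminal, since the hung import
> never returns:)
>
>    # ktrace -f /tmp/zpool-import.ktrace zpool import -f <mypool>
>    # kdump -f /tmp/zpool-import.ktrace | tail -n 40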
>
> Mark


