zpool import hanging on unexpectedly-rebooted machine

Colin Moller colin at lefty.tv
Mon Aug 18 11:42:11 UTC 2008

Hey all,

I've got an interestingly frustrating problem on my hands with our 
7.0-STABLE boxes running ZFS.  The machine is a Sun X4500 running amd64 
with 16GB of RAM and 46x1TB disks in RAIDZ1 (the other two disks hold 
the OS).

Uname for the box is:
FreeBSD sf-nas1-c160a.storefront.com 7.0-STABLE FreeBSD 7.0-STABLE #1: 
Sat May 31 14:54:22 PDT 2008     
root at sf-nas1-c160a.storefront.com:/usr/obj/usr/src/sys/X4500  amd64

The box has been running relatively reliably for some months now, but 
our hosting provider decided to reboot it on us without asking.  After 
the box came back, it had lost /boot/zfs/zpool.cache, so I needed to 
reimport the only zpool on the machine (named zfsdata).

Running zpool import gives me the output I'm expecting, showing a single 
zpool called zfsdata, status of ONLINE, and all the disks are showing up.

However, when I run zpool import -f <numerical_pool_id>, the zpool 
command simply hangs, with no disk or CPU activity.  I've run 
truss on the zpool import, and the last thing I see is:

open("/dev/ad96",O_RDONLY,030115000)             = 6 (0x6)
ioctl(6,DIOCGIDENT,0xffff9480)                   = 0 (0x0)
close(6)                                         = 0 (0x0)

After turning on vfs.zfs.debug, I also see this on the console:

zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0 
class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata 
pool_guid=316648131406719055 pool_context=2 
vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12 
vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394 
parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512 
zio_object=132 zio_level=0 zio_blkid=244
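
In case it helps anyone read that line, here is a minimal Python sketch 
that splits it into its key=value fields.  The function name 
parse_ereport is made up for illustration, and treating zio_offset as a 
byte offset is my assumption, not something the debug output states.

```python
# Console line as posted above, rejoined into a single string.
EREPORT = (
    "zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0 "
    "class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata "
    "pool_guid=316648131406719055 pool_context=2 "
    "vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12 "
    "vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394 "
    "parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512 "
    "zio_object=132 zio_level=0 zio_blkid=244"
)


def parse_ereport(line):
    """Split a debug line into a dict of its key=value fields (hypothetical helper)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # the "zfs_ereport_post:293[1]:" prefix has no "=" and is skipped
            fields[key] = value
    return fields


report = parse_ereport(EREPORT)
print(report["class"], report["vdev_path"], report["parent_type"])

# Assumption: zio_offset is in bytes, so convert it to GiB for readability.
offset_gib = int(report["zio_offset"]) / 2**30
print("offset %.2f GiB, size %s bytes" % (offset_gib, report["zio_size"]))
```

Read this way, the line reports a checksum event (class fs.zfs.checksum) 
on /dev/ad12, a disk inside a raidz vdev, for a 512-byte I/O.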

I also get a boatload of:
GEOM: ad46: GPT rejected -- may not be recoverable.
GEOM: ad48: corrupt or invalid GPT detected.
for each disk in the zpool.  This has been happening since we installed 
the machine, though, so I'm not quite sure it's related.  The other 
Thumper we have that's configured identically also complains about the 
GPT.  I'm assuming these messages appear because ZFS uses its own 
on-disk format that GEOM doesn't understand yet.

Help! I've got several terabytes of data sitting there that I'd really 
like to get my hands on. How can I get past this deadlock?  I'm willing 
to rebuild kernel and world on the box to get (potentially) newer ZFS 
code, but going to -CURRENT is really not an option at the moment.


Colin Moller
colin at lefty.tv

More information about the freebsd-current mailing list