zpool import hanging on unexpectedly-rebooted machine
colin at lefty.tv
Mon Aug 18 11:42:11 UTC 2008
I've got an interestingly frustrating problem on my hands with our
7.0-STABLE boxes running ZFS. The box in question is a Sun X4500 running
amd64 with 16GB of RAM and 46x1TB disks in RAIDZ1 (the other two disks
are for the OS).
Uname for the box is:
FreeBSD sf-nas1-c160a.storefront.com 7.0-STABLE FreeBSD 7.0-STABLE #1:
Sat May 31 14:54:22 PDT 2008
root at sf-nas1-c160a.storefront.com:/usr/obj/usr/src/sys/X4500 amd64
The box has been running relatively reliably for some months now, but
our hosting provider decided to reboot it on us without asking. After
the box came back, it had lost /boot/zfs/zpool.cache, so I needed to
reimport the only zpool on the machine (named zfsdata).
Running zpool import gives me the output I'm expecting, showing a single
zpool called zfsdata, status of ONLINE, and all the disks are showing up.
However, when I run zpool import -f <numerical_pool_id>, the zpool
command simply hangs, with no disk or CPU activity. I've run
truss on the zpool import, and the last thing I see happening is:
open("/dev/ad96",O_RDONLY,030115000) = 6 (0x6)
ioctl(6,DIOCGIDENT,0xffff9480) = 0 (0x0)
close(6) = 0 (0x0)
After turning on vfs.zfs.debug, I also see this on the console:
zfs_ereport_post:293: time=1219057172.795893475 ereport_version=0
class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata
vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12
parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512
zio_object=132 zio_level=0 zio_blkid=244
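For anyone picking that message apart, here is a minimal sh sketch that pulls individual name=value fields out of it. The string is just the console output above pasted in as-is, and ereport_field is a made-up helper name for illustration, not a ZFS or FreeBSD tool:

```sh
# Console message copied verbatim from above (line breaks kept as shown).
ereport='zfs_ereport_post:293: time=1219057172.795893475 ereport_version=0
class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata
vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12
parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512
zio_object=132 zio_level=0 zio_blkid=244'

# ereport_field NAME: print the value of the NAME=value pair, if present.
# Hypothetical helper: puts each token on its own line, then strips "NAME=".
ereport_field() {
    printf '%s\n' "$ereport" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

ereport_field class        # prints fs.zfs.checksum
ereport_field vdev_path    # prints /dev/ad12
ereport_field zio_offset   # prints 89290496000
```

Read that way, it is a checksum event (class=fs.zfs.checksum) reported against /dev/ad12 in the raidz vdev, for a 512-byte I/O at offset 89290496000.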
I also get a boatload of:
GEOM: ad46: GPT rejected -- may not be recoverable.
GEOM: ad48: corrupt or invalid GPT detected.
for each disk in the zpool. This has been happening since we installed
the machine, though, so I'm not quite sure it's related. The other
Thumper we have that's configured identically also complains about the
GPT. I'm assuming these messages appear because ZFS uses its own
on-disk format that GEOM doesn't understand yet.
Help! I've got several terabytes of data sitting there that I'd really
like to get my hands on. How can I get past this deadlock? I'm willing
to rebuild kernel and world on the box to get (potentially) newer ZFS
code, but going to -CURRENT is really not an option at the moment.
colin at lefty.tv
More information about the freebsd-current mailing list