Problem with zpool
Adam Stylinski
kungfujesus06 at gmail.com
Thu Aug 26 18:33:06 UTC 2010
Ok, so I was sending snapshots into the pool, and I may have sent one that was
bad, because I got a "bad magic number" error at some point. At the time I
figured it was my disk (ad6), which was going bad. No matter what I tried, it
would not allow me to offline this ad6 device; it claimed insufficient
replicas existed. So I removed the device and put a new disk on the same port
on the same controller to replace it. I then ran the replace command, only to
have it sit there pretending to replace while the zpool process sat in top
with state g_wait. I googled around and found someone on one of the mailing
lists with a similar bug; they said it was fixed in a revision to ZFS (can't
remember which MFC) and that upgrading to FreeBSD 8.1 would fix the problem.
This is a v13 pool, and now that I've upgraded to 8.1 I'm running a scrub,
which forced the resilver, and it still claims to be "replacing". Well, as it
turns out I had a cron job running freebsd-update cron, which happened to pull
in 8.0 updates some time during the upgrade. So the result was that I was on
an 8.1 kernel but an 8.0 userland, which did still allow me to scrub (I guess
this update didn't break the ABI between zpool and libzpool).
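
For reference, the commands I ran were basically the standard ones (pool name
is "share", as in the status output below); I'm typing these from memory, so
take them as approximate rather than exact:

    # tried to take the dying disk out of service; this is what kept
    # failing with "insufficient replicas"
    zpool offline share ad6

    # after physically swapping the disk (the new one shows up as ada2)
    zpool replace share ad6 ada2

    # after the 8.1 upgrade
    zpool scrub share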
It finished the scrub, but it still claims this:
              8991447011275450347  UNAVAIL      0 10.0K     0  was /dev/ad6/old
              ada2                 ONLINE       0     0     0  14.1G resilvered
Meaning it's still trying to write to the old device (which it won't let me
offline) while resilvering to ada2. It did tell me that it resilvered
successfully, with no checksum, read, or write errors, but the old entry is
still there. I am now scrubbing with 8.1's userland and kernel; do you guys
think it will finally allow me to remove the device it's replacing?
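
Once this scrub finishes, my assumption is that getting rid of the stale half
of the "replacing" vdev is a detach of the old GUID, something like:

    zpool detach share 8991447011275450347

(that long number being what zpool now shows in place of the old ad6). If
that's the wrong approach, please tell me before I make things worse.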
Also, it claims that there is corruption in the pool, but the files that are
"affected" are 100% fine, as I've md5'd them against remote copies. I realize
the metadata could be bad, and I'm not sure how to go about fixing that.
Anyway, please don't tl;dr this, and let me know whatever advice you can give.
I have moderate backups of some of the data (the entire pool has 2.7 TB
occupied), but I'd like to avoid a destroy-and-recreate. So far this is what
zpool status reports:
[adam at nasbox ~]$ zpool status
  pool: share
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h9m, 2.16% done, 6h53m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        share                      DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            ada1                   ONLINE       0     0     0  60.9M resilvered
            ada3                   ONLINE       0     0     0  60.9M resilvered
            replacing              DEGRADED     0     0     0
              8991447011275450347  UNAVAIL      0 10.0K     0  was /dev/ad6/old
              ada2                 ONLINE       0     0     0  14.1G resilvered
            ada4                   ONLINE       0     0     0  56.7M resilvered
          raidz1                   ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            da2                    ONLINE       0     0     0
            da3                    ONLINE       0     0     0
            da1                    ONLINE       0     0     0
          raidz1                   ONLINE       0     0     0
            aacd0                  ONLINE       0     0     0
            aacd1                  ONLINE       0     0     0
            aacd2                  ONLINE       0     0     0
            aacd3                  ONLINE       0     0     0
        logs                       DEGRADED     0     0     0
          ada0                     ONLINE       0     0     0
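
Assuming the detach eventually goes through, my plan was to then list the
"affected" files and clear the error counters with the usual:

    zpool status -v share
    zpool clear share

and re-md5 the listed files against my remote copies. If clearing the errors
is a bad idea given possible metadata damage, please say so.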