Unexpected "resilver" after reboot (after scrub found CKSUM
joe at skyrush.com
Sun Jan 27 11:53:41 PST 2008
Hi Pawel (or anyone else who might know),
I had a strange thing happen on ZFS the other day, and I cannot find any
info about it on the web - thought you might have some ideas. I am
using 7.0-RC1 at the moment.
I found a checksum error in ZFS during a scrub. This is strange in
itself, since I believe the disk is OK (see below):
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
However, this is how it appears after a recent reboot. After a scrub, I
see a varying number of non-zero counts under CKSUM. I am not sure why
the count is zero after a reboot (maybe that's normal).
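
One way to compare the counters across a scrub and a reboot is to save the
`zpool status` text each time and extract the numbers mechanically. Below is
a minimal sketch in Python, assuming the plain column layout shown above
with integer counters. `parse_error_counts` is a hypothetical helper written
for illustration, not part of any ZFS tooling, and the sample text is a
trimmed copy of the output in this message.

```python
def parse_error_counts(status_text):
    """Return {device: (read, write, cksum)} from `zpool status` text.

    Assumes the header row "NAME STATE READ WRITE CKSUM" and plain
    integer counters, as in the output quoted in this message.
    """
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        # A table row looks like: name, state, then three integers.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[2:]):
            if fields:  # a non-blank, non-row line ends the table
                in_table = False
            continue
        counts[fields[0]] = tuple(int(f) for f in fields[2:])
    return counts


# Trimmed sample based on the status output quoted above.
sample = """\
 scrub: none requested
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
"""

if __name__ == "__main__":
    print(parse_error_counts(sample))
```

Running this on a capture taken right after a scrub and on another taken
after the reboot would show which devices had their CKSUM counts change.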
However, the strange thing is that after my first reboot after the scrub
found the issue, zpool status told me that "resilver completed with 0
errors", and there were no known errors. Only trying to read the file
and/or rescrubbing returned the status to the error state and made the
CKSUM column non-zero. Since I do not have a mirror or raid config, I'm
not sure why it would resilver at all, and I did nothing explicit to
cause a resilver (as far as I know)...
As an aside, I, along with some others on freebsd-stable at freebsd.org,
have been seeing what "look" like disk errors in the system logs. I
have a suspicion that there could be some other cause (lots of
discussion on that list, if you are interested).
Strangely, this disk checks out fine on both short and long tests in
Seatools, and smartctl shows it as OK. Also, using Linux to do lots of
reads from it does not show any issue or error logs. At this point, I
am not sure if the CKSUM issue is a real HW flaw or something else...