Unexpected "resilver" after reboot (after scrub found CKSUM problems)

Joe Peterson joe at skyrush.com
Wed Jan 30 09:18:41 PST 2008


[...reposting to freebsd-stable - no response on freebsd-fs]

I had a strange thing happen on ZFS the other day, and I cannot find any
info about it on the web - thought you might have some ideas.  I am
using 7.0-RC1 at the moment.

I found a checksum error in ZFS during a scrub.  This is strange in
itself, since I believe the disk is OK (see below):



  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:


/home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3



This is how it appears after a recent reboot, however.  After a scrub, I
see a varying number of non-zero counts under CKSUM.  Not sure why it is
zero after reboot (maybe that's normal).
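
To compare the counters before and after a reboot, the per-device
READ/WRITE/CKSUM values could be pulled out of saved `zpool status` text.
The following is only a minimal sketch: it assumes the column layout shown
in the output above and plain integer counts, and `parse_error_counts` is a
hypothetical helper name, not part of ZFS or any FreeBSD tool.

```python
def parse_error_counts(status_text):
    """Return {device: (read, write, cksum)} parsed from `zpool status` text.

    Assumes the config table layout NAME STATE READ WRITE CKSUM and that
    the three counters are plain integers.
    """
    counts = {}
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            break  # the device table ends before the errors section
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        if fields[0] == "NAME":
            continue  # skip the header row
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counts[fields[0]] = tuple(int(f) for f in fields[2:5])
    return counts


# Sample text modeled on the status output above.
SAMPLE = """\
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for device, (read, write, cksum) in parse_error_counts(SAMPLE).items():
        print(f"{device}: read={read} write={write} cksum={cksum}")
```

Saving the parsed result after each scrub and again after each reboot would
show exactly when the CKSUM column goes back to zero.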

However, the strange thing is that after my first reboot after the scrub
found the issue, zpool status told me that "resilver completed with 0
errors", and there were no known errors.  Only trying to read the file
and/or rescrubbing returned the status to the error state and made the
CKSUM column non-zero.  Since I do not have a mirror or raid config, I'm
not sure why it would resilver at all, and I did nothing explicit to
cause a resilver (as far as I know)...

Any ideas?

As an aside, I, along with some others on freebsd-stable at freebsd.org,
have been seeing what "look" like disk errors in the system logs.  I
have a suspicion that there could be some other cause (lots of
discussion on that list, if you are interested).

Strangely, this disk checks out fine on both short and long tests in
Seatools, and smartctl shows it as OK.  Also, using Linux to do lots of
reads from it does not show any issue or error logs.  At this point, I
am not sure if the CKSUM issue is a real HW flaw or something else...

					Thanks, Joe



