ZFS corruption on 8-CURRENT

Jason Edwards sub.mesa at gmail.com
Sun Aug 9 11:47:11 UTC 2009


Hi guys,

I'm investigating some weird corruption issue. After filling up my 8-disk
RAID-Z pool with data and using it for a few weeks, it started to show me
this:


# zpool status sub
  pool: sub
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to
continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        sub         UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ad14a   FAULTED      0     0     0  corrupted data
            ad8a    ONLINE       0     0     0
            ad10a   ONLINE       0     0     0
            ad10a   FAULTED      0     0     0  corrupted data
            ad18a   FAULTED      0     0     0  corrupted data
            ad12a   FAULTED      0     0     0  corrupted data
            ad16a   FAULTED      0     0     0  corrupted data
            ad8a    FAULTED      0     0     0  corrupted data


Oops. What happened here? Besides the "corrupted data", note that ad10a and ad8a each appear twice, once ONLINE and once FAULTED.
After a reboot the output looks a little cleaner, but now it reports a problem with the
ZIL:


# zpool status sub
  pool: sub
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        sub          FAULTED      0     0     0  bad intent log
          raidz1    ONLINE       0     0     0
            ad14a   ONLINE       0     0     0
            ad4a    ONLINE       0     0     0
            ad6a    ONLINE       0     0     0
            ad10a   ONLINE       0     0     0
            ad18a   ONLINE       6     0     0
            ad12a   ONLINE       0     0     0
            ad16a   ONLINE       0     0     0
            ad8a    ONLINE       0     0     0


Additionally, I got some read errors on ad18. But since this is a RAID-Z, I
assume one disk alone should not be able to corrupt/fail the entire array.
Before I do anything potentially destructive: does anybody have a clue what
happened here, and how I can prevent it in the future?
The box is a quad-core X4 9350e with 6GB RAM, and it's running 8-CURRENT as of
July 21st, 2009 (after 8.0-BETA2). It worked correctly before upgrading CURRENT
to a newer date. Maybe some bug slipped in?
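For what it's worth, a few read-only checks can be run before touching anything; this is only a sketch based on the output above (the device names are taken from the listings, and smartctl assumes the sysutils/smartmontools port is installed):

```shell
# Verbose pool status, listing any files with permanent errors
zpool status -v sub

# Dump the ZFS labels on a member disk; all four label copies
# should be present and agree on the pool and vdev GUIDs
zdb -l /dev/ad18a

# Check the disk that showed READ errors for pending or
# reallocated sectors (requires sysutils/smartmontools)
smartctl -a /dev/ad18
```

None of these commands write to the pool, so they should be safe to run on the faulted pool as-is.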

Kind regards,
sub

