Analysis of disk file block with ZFS checksum error

Joe Peterson joe at skyrush.com
Fri Feb 8 22:29:44 UTC 2008


In my experimentation with the ZFS filesystem, I encountered one case of
a file block with a checksum mismatch.  Doing a "zpool scrub" revealed
it, and trying to read the file yielded an error - only the part of the
file before the bad block was read (ZFS aborts reading at this point,
which makes sense), resulting in a short file.  The reason the CKSUM
error is not fixable is because my ZFS pool contains only one device (no
mirror or RAIDZ), but I do have the original/good version of the file
affected.  Here's the output of zpool status (new scrub in process):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     2
          hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3


I was curious about what actually happened: was this a ZFS bug, trouble
with its metadata, or truly a bad block?  In order to determine this, I
modified ZFS's source code temporarily to ignore the checksum mismatch
and let the file read fully.  What I then got was the full-length file
and no errors, showing that there were no disk read errors associated
with the read (I already had assumed this from the fact that zpool
status showed only a non-zero CKSUM count), however, I may have seen
other error counts previously (ZFS resets them to zero on, e.g.,
reboot).  I received no errors when originally copying this file *to*
the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk
I am using, but I have had nothing but successful SeaTools tests
(surface scans) of the drive.  Jeremy Chadwick had similar issues, as
did others, so I think it is worth investigating if there is some
OS/software cause rather than real HW issues.  This is one reason I
wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two
copies of the file: one good, and one with a bad block.  The file is
3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
about 27 blocks long.  Curiously, the bad section of the file is exactly
65536 bytes long (1/2 a block).  The bad block starts at exactly the 5th
128K block (byte 65536 or hex a0000).

I wanted to see the characteristics of the bad data.  Was just one bit
flipped randomly?  No.  It is just one bit or set of bits in the bytes
that are affected?  It doesn't seem so.  Were there any other stange
patterns here?  Well, yes, and maybe someout out there with more
knowledge/experience in disk modes of failure will recognize something
(I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's
exactly this many, with a few "good" bytes thrown in, but not far from
what matches random chance would produce.  Also, all bad bytes have a
zero in the high bit - interesting?  Also, near the end of the block,
the bad bytes all go to zero, strangely coincident with the first "good"
zero in that bad block - not sure if that's coincidence or not.  Also, I
calculated the number of "Bits same" (matching bits) in the good vs. bad
bytes, and it appears fairly random, so it appears that the bad bytes
are very random in nature and not correlated much at all with the good
bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS block
are good, the bad block seems to consist of random data, except for the
string of zero bytes near the end and the zero high-bit.  It's not as if
one bit on the disk flipped - it affects the whole (1/2) block.  Does
this seem like a disk error, controller error/bug, cable problem (I
recently put a new cable on, so I doubt this).  It seems to me something
more systemic rather than a random bit error - opinions are more than
welcome.

Here is some info from a python program I wrote to look at the data
(I've left out spans of essentially uninteresting portions showing
similar stuff, but I can get you the whole thing if interested):

File pos        Good    Bad     Match   Good (bin)  Bad (bin)  Bits same
0009fff0        d9      d9      Yes     11011001    11011001    8
0009fff1        05      05      Yes     00000101    00000101    8
0009fff2        c1      c1      Yes     11000001    11000001    8
0009fff3        81      81      Yes     10000001    10000001    8
0009fff4        5f      5f      Yes     01011111    01011111    8
0009fff5        66      66      Yes     01100110    01100110    8
0009fff6        5e      5e      Yes     01011110    01011110    8
0009fff7        a1      a1      Yes     10100001    10100001    8
0009fff8        ca      ca      Yes     11001010    11001010    8
0009fff9        9d      9d      Yes     10011101    10011101    8
0009fffa        00      00      Yes     00000000    00000000    8
0009fffb        90      90      Yes     10010000    10010000    8
0009fffc        32      32      Yes     00110010    00110010    8
0009fffd        62      62      Yes     01100010    01100010    8
0009fffe        a8      a8      Yes     10101000    10101000    8
0009ffff        b2      b2      Yes     10110010    10110010    8
--- Start of bad block ---
000a0000        d1      24      No      11010001    00100100    2
000a0001        6b      7b      No      01101011    01111011    7
000a0002        d1      31      No      11010001    00110001    5
000a0003        56      33      No      01010110    00110011    4
000a0004        44      38      No      01000100    00111000    3
000a0005        c3      41      No      11000011    01000001    6
000a0006        df      46      No      11011111    01000110    4
000a0007        07      45      No      00000111    01000101    6
000a0008        4c      7b      No      01001100    01111011    3
000a0009        a0      40      No      10100000    01000000    5
000a000a        54      0a      No      01010100    00001010    3
000a000b        35      40      No      00110101    01000000    3
000a000c        88      24      No      10001000    00100100    4
000a000d        38      24      No      00111000    00100100    5
000a000e        f5      7d      No      11110101    01111101    6
000a000f        28      31      No      00101000    00110001    5
.
.
.
000af6c1        d3      33      No      11010011    00110011    5
000af6c2        97      39      No      10010111    00111001    3
000af6c3        a5      32      No      10100101    00110010    3
000af6c4        6a      41      No      01101010    01000001    4
000af6c5        16      39      No      00010110    00111001    3
000af6c6        f2      7d      No      11110010    01111101    3
000af6c7        21      40      No      00100001    01000000    5
000af6c8        52      0a      No      01010010    00001010    5
000af6c9        00      00      Yes     00000000    00000000    8
000af6ca        2c      00      No      00101100    00000000    5
000af6cb        42      00      No      01000010    00000000    6
000af6cc        31      00      No      00110001    00000000    5
000af6cd        a1      00      No      10100001    00000000    5
000af6ce        d1      00      No      11010001    00000000    4
000af6cf        90      00      No      10010000    00000000    6
000af6d0        9c      00      No      10011100    00000000    4
.
.
.
000afff8        26      00      No      00100110    00000000    5
000afff9        8c      00      No      10001100    00000000    5
000afffa        a8      00      No      10101000    00000000    5
000afffb        0c      00      No      00001100    00000000    6
000afffc        f1      00      No      11110001    00000000    3
000afffd        93      00      No      10010011    00000000    4
000afffe        2c      00      No      00101100    00000000    5
000affff        2e      00      No      00101110    00000000    4
--- End of bad block ---
000b0000        62      62      Yes     01100010    01100010    8
000b0001        56      56      Yes     01010110    01010110    8
000b0002        91      91      Yes     10010001    10010001    8
000b0003        04      04      Yes     00000100    00000100    8
000b0004        01      01      Yes     00000001    00000001    8
000b0005        2d      2d      Yes     00101101    00101101    8
000b0006        0e      0e      Yes     00001110    00001110    8
000b0007        89      89      Yes     10001001    10001001    8
000b0008        8a      8a      Yes     10001010    10001010    8
000b0009        ad      ad      Yes     10101101    10101101    8
000b000a        4e      4e      Yes     01001110    01001110    8
000b000b        a3      a3      Yes     10100011    10100011    8
000b000c        13      13      Yes     00010011    00010011    8
000b000d        4d      4d      Yes     01001101    01001101    8
000b000e        07      07      Yes     00000111    00000111    8
000b000f        66      66      Yes     01100110    01100110    8
.
.
.
					-Joe


More information about the freebsd-stable mailing list