Analysis of disk file block with ZFS checksum error

Fri Feb 8 23:15:57 UTC 2008

Joe Peterson wrote:
> In my experimentation with the ZFS filesystem, I encountered one case of
> a file block with a checksum mismatch.  Doing a "zpool scrub" revealed
> it, and trying to read the file yielded an error - only the part of the
> file before the bad block was read (ZFS aborts reading at this point,
> which makes sense), resulting in a short file.  The CKSUM error is not
> fixable because my ZFS pool contains only one device (no mirror or
> RAIDZ), but I do have the original/good version of the affected file.
> Here's the output of zpool status (new scrub in progress):
> 
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub in progress, 64.36% done, 0h18m to go
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     2
>           hda6      ONLINE       0     0     2
> 
> errors: Permanent errors have been detected in the following files:
> 
>         /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3
> 
> 
> I was curious about what actually happened: was this a ZFS bug, trouble
> with its metadata, or truly a bad block?  In order to determine this, I
> modified ZFS's source code temporarily to ignore the checksum mismatch
> and let the file read fully.  What I then got was the full-length file
> and no errors, showing that there were no disk read errors associated
> with the read.  (I had already assumed this from the fact that zpool
> status showed only a non-zero CKSUM count; however, I may have seen
> other error counts previously, since ZFS resets them to zero on, e.g.,
> reboot.)  I received no errors when originally copying this file *to*
> the ZFS pool - only on subsequent reads/scrubs.
> 
> (Note that I have posted before about DMA errors in my log for the disk
> I am using, but I have had nothing but successful SeaTools tests
> (surface scans) of the drive.  Jeremy Chadwick had similar issues, as
> did others, so I think it is worth investigating if there is some
> OS/software cause rather than real HW issues.  This is one reason I
> wanted to investigate my ZFS checksum issue more deeply.)
> 
> I also have a good backup of the file in question, so I now have two
> copies of the file: one good, and one with a bad block.  The file is
> 3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
> about 27 blocks long.  Curiously, the bad section of the file is exactly
> 65536 bytes long (1/2 a block).  The bad section starts exactly on a
> 128K block boundary, 5 blocks into the file (byte 655360 or hex a0000).
> 
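
The block arithmetic above is easy to check.  A minimal sketch in Python,
assuming only the numbers quoted in the post (file length, 128K recordsize,
and the a0000-affff range from the listing); the names are illustrative and
not taken from Joe's program:

```python
RECORDSIZE = 128 * 1024   # ZFS recordsize from the post (128K)
FILE_SIZE = 3575936       # file length in bytes, from the post
BAD_START = 0xA0000       # first differing byte, from the listing
BAD_END = 0xAFFFF         # last differing byte (inclusive), from the listing

def locate(offset, recordsize=RECORDSIZE):
    """Return (zero-based record index, byte offset within that record)."""
    return divmod(offset, recordsize)

bad_len = BAD_END - BAD_START + 1
full_records, tail = divmod(FILE_SIZE, RECORDSIZE)

print(locate(BAD_START))   # (5, 0): starts exactly on a record boundary
print(locate(BAD_END))     # (5, 65535): ends at the midpoint of that record
print(bad_len)             # 65536, i.e. half of one 128K record
print(full_records, tail)  # 27 full records plus a 36992-byte tail
```

So the damage covers the first half of the record at zero-based index 5,
and nothing else.
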
> I wanted to see the characteristics of the bad data.  Was just one bit
> flipped randomly?  No.  Is it just one bit or set of bits in the bytes
> that are affected?  It doesn't seem so.  Were there any other strange
> patterns here?  Well, yes, and maybe someone out there with more
> knowledge/experience in disk modes of failure will recognize something
> (I have included some data below).
> 
> For one thing (as I mentioned), only 65536 bytes are bad (and it's
> exactly this many, with a few "good" bytes thrown in, but not far from
> what random chance would produce).  Also, all bad bytes have a
> zero in the high bit - interesting?  Also, near the end of the block,
> the bad bytes all go to zero, strangely coincident with the first "good"
> zero in that bad block - not sure if that's coincidence or not.  Also, I
> calculated the number of "Bits same" (matching bits) in the good vs. bad
> bytes, and it looks fairly random, so the bad bytes appear to be
> essentially random and not correlated much at all with the good
> bytes.
> 
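
The two per-byte checks described above (high bit and "Bits same") can be
reproduced on the 16 bytes listed at 000a0000-000a000f further down in the
post.  This is only a rough sketch: `bits_same` is a hypothetical helper
name, and 16 bytes are far too few to say anything about the whole 65536:

```python
# Good and bad bytes at 000a0000-000a000f, copied from the listing.
GOOD = bytes.fromhex("d1 6b d1 56 44 c3 df 07 4c a0 54 35 88 38 f5 28")
BAD = bytes.fromhex("24 7b 31 33 38 41 46 45 7b 40 0a 40 24 24 7d 31")

def bits_same(g, b):
    """Number of bit positions (0-8) at which two byte values agree."""
    return 8 - bin(g ^ b).count("1")

same = [bits_same(g, b) for g, b in zip(GOOD, BAD)]
mean_same = sum(same) / len(same)
high_bit_clear = all((b & 0x80) == 0 for b in BAD)

print(same)            # matches the "Bits same" column for these rows
print(mean_same)       # 4.4375 for this sample
print(high_bit_clear)  # True: every bad byte here is below 0x80
```

For two independent, uniformly random bytes the expected number of matching
bits is 4, so a sample mean near 4 is what uncorrelated data would give.
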
> So except for the fact that the 2nd half (65536 bytes) of the ZFS block
> is good, the bad block seems to consist of random data, except for the
> string of zero bytes near the end and the zero high-bit.  It's not as if
> one bit on the disk flipped - it affects the whole (1/2) block.  Does
> this seem like a disk error, a controller error/bug, or a cable problem
> (I recently put a new cable on, so I doubt this)?  It seems to me
> something more systemic rather than a random bit error - opinions are
> more than welcome.
> 
> Here is some output from a Python program I wrote to look at the data
> (I've left out long spans that just show more of the same, but I can
> get you the whole thing if interested):
> 
> File pos        Good    Bad     Match   Good (bin)  Bad (bin)   Bits same
> 0009fff0        d9      d9      Yes     11011001    11011001    8
> 0009fff1        05      05      Yes     00000101    00000101    8
> 0009fff2        c1      c1      Yes     11000001    11000001    8
> 0009fff3        81      81      Yes     10000001    10000001    8
> 0009fff4        5f      5f      Yes     01011111    01011111    8
> 0009fff5        66      66      Yes     01100110    01100110    8
> 0009fff6        5e      5e      Yes     01011110    01011110    8
> 0009fff7        a1      a1      Yes     10100001    10100001    8
> 0009fff8        ca      ca      Yes     11001010    11001010    8
> 0009fff9        9d      9d      Yes     10011101    10011101    8
> 0009fffa        00      00      Yes     00000000    00000000    8
> 0009fffb        90      90      Yes     10010000    10010000    8
> 0009fffc        32      32      Yes     00110010    00110010    8
> 0009fffd        62      62      Yes     01100010    01100010    8
> 0009fffe        a8      a8      Yes     10101000    10101000    8
> 0009ffff        b2      b2      Yes     10110010    10110010    8
> --- Start of bad block ---
> 000a0000        d1      24      No      11010001    00100100    2
> 000a0001        6b      7b      No      01101011    01111011    7
> 000a0002        d1      31      No      11010001    00110001    5
> 000a0003        56      33      No      01010110    00110011    4
> 000a0004        44      38      No      01000100    00111000    3
> 000a0005        c3      41      No      11000011    01000001    6
> 000a0006        df      46      No      11011111    01000110    4
> 000a0007        07      45      No      00000111    01000101    6
> 000a0008        4c      7b      No      01001100    01111011    3
> 000a0009        a0      40      No      10100000    01000000    5
> 000a000a        54      0a      No      01010100    00001010    3
> 000a000b        35      40      No      00110101    01000000    3
> 000a000c        88      24      No      10001000    00100100    4
> 000a000d        38      24      No      00111000    00100100    5
> 000a000e        f5      7d      No      11110101    01111101    6
> 000a000f        28      31      No      00101000    00110001    5
> .
> .
> .
> 000af6c1        d3      33      No      11010011    00110011    5
> 000af6c2        97      39      No      10010111    00111001    3
> 000af6c3        a5      32      No      10100101    00110010    3
> 000af6c4        6a      41      No      01101010    01000001    4
> 000af6c5        16      39      No      00010110    00111001    3
> 000af6c6        f2      7d      No      11110010    01111101    3
> 000af6c7        21      40      No      00100001    01000000    5
> 000af6c8        52      0a      No      01010010    00001010    5
> 000af6c9        00      00      Yes     00000000    00000000    8
> 000af6ca        2c      00      No      00101100    00000000    5
> 000af6cb        42      00      No      01000010    00000000    6
> 000af6cc        31      00      No      00110001    00000000    5
> 000af6cd        a1      00      No      10100001    00000000    5
> 000af6ce        d1      00      No      11010001    00000000    4
> 000af6cf        90      00      No      10010000    00000000    6
> 000af6d0        9c      00      No      10011100    00000000    4
> .
> .
> .
> 000afff8        26      00      No      00100110    00000000    5
> 000afff9        8c      00      No      10001100    00000000    5
> 000afffa        a8      00      No      10101000    00000000    5
> 000afffb        0c      00      No      00001100    00000000    6
> 000afffc        f1      00      No      11110001    00000000    3
> 000afffd        93      00      No      10010011    00000000    4
> 000afffe        2c      00      No      00101100    00000000    5
> 000affff        2e      00      No      00101110    00000000    4
> --- End of bad block ---
> 000b0000        62      62      Yes     01100010    01100010    8
> 000b0001        56      56      Yes     01010110    01010110    8
> 000b0002        91      91      Yes     10010001    10010001    8
> 000b0003        04      04      Yes     00000100    00000100    8
> 000b0004        01      01      Yes     00000001    00000001    8
> 000b0005        2d      2d      Yes     00101101    00101101    8
> 000b0006        0e      0e      Yes     00001110    00001110    8
> 000b0007        89      89      Yes     10001001    10001001    8
> 000b0008        8a      8a      Yes     10001010    10001010    8
> 000b0009        ad      ad      Yes     10101101    10101101    8
> 000b000a        4e      4e      Yes     01001110    01001110    8
> 000b000b        a3      a3      Yes     10100011    10100011    8
> 000b000c        13      13      Yes     00010011    00010011    8
> 000b000d        4d      4d      Yes     01001101    01001101    8
> 000b000e        07      07      Yes     00000111    00000111    8
> 000b000f        66      66      Yes     01100110    01100110    8
> .
> .
> .
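
For anyone who wants to produce a listing like the one above from their own
pair of files, here is a minimal sketch.  It is not Joe's program (which
was not posted); the function names, the tab-separated layout, and the
file names in the usage note are all assumptions:

```python
def bits_same(g, b):
    """Number of bit positions (0-8) at which two byte values agree."""
    return 8 - bin(g ^ b).count("1")

def format_row(pos, g, b):
    """One tab-separated row: offset, hex bytes, match, binary, bits same."""
    match = "Yes" if g == b else "No"
    return (f"{pos:08x}\t{g:02x}\t{b:02x}\t{match}\t"
            f"{g:08b}\t{b:08b}\t{bits_same(g, b)}")

def diff_span(good, bad):
    """(first, last) offsets where the buffers differ, or None if equal.

    Only the common prefix is compared if the lengths differ.
    """
    diffs = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
    return (diffs[0], diffs[-1]) if diffs else None

def compare_files(good_path, bad_path, context=16):
    """Print rows for the differing region plus some context on each side."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    span = diff_span(good, bad)
    if span is None:
        print("no differences found")
        return
    first, last = span
    start = max(first - context, 0)
    stop = min(last + 1 + context, len(good), len(bad))
    for pos in range(start, stop):
        print(format_row(pos, good[pos], bad[pos]))
```

Usage would be something like compare_files("good.mp3", "bad.mp3"), with
both names hypothetical.  It reads both files fully into memory, which is
fine for a file of a few megabytes.
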

A wild guess: bytes all with the top bit zero, plus I see occasional
0x0a bytes in there - it looks like that 1/2 block might have gotten
overwritten with some ASCII text.  Don't ask me how though...
(but if it did, I am afraid I would suspect a bug in ZFS :-(  )
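
The ASCII guess is easy to test against the bytes quoted above.  A small
sketch using only the "Bad" column from the two excerpts in the listing
(000a0000-000a000f and 000af6c1-000af6c8); it shows the bytes decode
cleanly as 7-bit ASCII, not what the text is or where it came from:

```python
# "Bad" column bytes copied from the listing.
HEAD = bytes.fromhex("24 7b 31 33 38 41 46 45 7b 40 0a 40 24 24 7d 31")
TAIL = bytes.fromhex("33 39 32 41 39 7d 40 0a")  # just before the zeros

def as_ascii(data):
    """Decode as strict 7-bit ASCII; raises UnicodeDecodeError otherwise."""
    return data.decode("ascii")

head_text = as_ascii(HEAD)
tail_text = as_ascii(TAIL)
printable = sum(1 for ch in head_text if ch.isprintable())

print(repr(head_text))  # '${138AFE{@\n@$$}1'
print(repr(tail_text))  # '392A9}@\n'
print(printable, "of", len(head_text), "characters are printable")
```

Everything in these samples except the 0x0a newlines is a printable
character, which fits the overwritten-with-text idea.
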

Gary