Analysis of disk file block with ZFS checksum error
Gary Corcoran
gcorcoran at rcn.com
Fri Feb 8 23:15:57 UTC 2008
Joe Peterson wrote:
> In my experimentation with the ZFS filesystem, I encountered one case of
> a file block with a checksum mismatch. Doing a "zpool scrub" revealed
> it, and trying to read the file yielded an error - only the part of the
> file before the bad block was read (ZFS aborts reading at this point,
> which makes sense), resulting in a short file. The CKSUM error is not
> fixable because my ZFS pool contains only one device (no mirror or
> RAIDZ), but I do have the original/good version of the affected file.
> Here's the output of zpool status (new scrub in progress):
>
> pool: tank
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: scrub in progress, 64.36% done, 0h18m to go
> config:
>
> NAME STATE READ WRITE CKSUM
> tank ONLINE 0 0 2
> hda6 ONLINE 0 0 2
>
> errors: Permanent errors have been detected in the following files:
>
> /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
> Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3
>
>
> I was curious about what actually happened: was this a ZFS bug, trouble
> with its metadata, or truly a bad block? In order to determine this, I
> modified ZFS's source code temporarily to ignore the checksum mismatch
> and let the file read fully. What I then got was the full-length file
> and no errors, showing that there were no disk read errors associated
> with the read. I had already assumed this from the fact that zpool
> status showed only a non-zero CKSUM count, although I may have seen
> other error counts previously (ZFS resets them to zero on, e.g.,
> reboot). I received no errors when originally copying this file *to*
> the ZFS pool - only on subsequent reads/scrubs.
>
> (Note that I have posted before about DMA errors in my log for the disk
> I am using, but I have had nothing but successful SeaTools tests
> (surface scans) of the drive. Jeremy Chadwick had similar issues, as
> did others, so I think it is worth investigating if there is some
> OS/software cause rather than real HW issues. This is one reason I
> wanted to investigate my ZFS checksum issue more deeply.)
>
> I also have a good backup of the file in question, so I now have two
> copies of the file: one good, and one with a bad block. The file is
> 3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
> about 27 blocks long. Curiously, the bad section of the file is exactly
> 65536 bytes long (1/2 a block). The bad block starts at exactly the 5th
> 128K block boundary (byte 655360 or hex a0000).
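The block arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch using the figures quoted in the post (file size, 128K recordsize) and the offsets visible in the dump further down; the constant names are illustrative and not taken from the original program.

```python
RECORDSIZE = 128 * 1024   # ZFS recordsize quoted in the post (128K)
FILE_SIZE = 3575936       # file length in bytes, from the post
BAD_START = 0xA0000       # first mismatching offset in the dump
BAD_END = 0xB0000         # first matching offset after the bad run

# How many whole records the file spans, and what is left over.
full_blocks, tail = divmod(FILE_SIZE, RECORDSIZE)

# Which record the bad run starts in, and where inside that record.
block_index, offset_in_block = divmod(BAD_START, RECORDSIZE)

bad_length = BAD_END - BAD_START

print(f"{full_blocks} full blocks plus {tail} trailing bytes")
print(f"bad run starts at byte {BAD_START}: "
      f"block index {block_index}, offset {offset_in_block}")
print(f"bad run is {bad_length} bytes "
      f"({bad_length / RECORDSIZE:.1f} of a block)")
```

Hex a0000 is 655360 decimal, which is exactly 5 x 128K, so the bad run begins on a record boundary and covers the first half of that record.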
>
> I wanted to see the characteristics of the bad data. Was just one bit
> flipped randomly? No. Is it just one bit or set of bits in the bytes
> that are affected? It doesn't seem so. Were there any other strange
> patterns here? Well, yes, and maybe someone out there with more
> knowledge/experience in disk modes of failure will recognize something
> (I have included some data below).
>
> For one thing (as I mentioned), only 65536 bytes are bad. The bad span
> is exactly this long, with a few "good" bytes thrown in, but not far
> from the number of matches that random chance would produce. Also, all
> bad bytes have a zero in the high bit - interesting? Also, near the end
> of the block,
> the bad bytes all go to zero, strangely coincident with the first "good"
> zero in that bad block - not sure if that's coincidence or not. Also, I
> calculated the number of "Bits same" (matching bits) in the good vs. bad
> bytes, and it appears fairly random, so it appears that the bad bytes
> are very random in nature and not correlated much at all with the good
> bytes.
>
> So except for the fact that the 2nd half (65536 bytes) of the ZFS block
> is good, the bad block seems to consist of random data, except for the
> string of zero bytes near the end and the zero high-bit. It's not as if
> one bit on the disk flipped - it affects the whole (1/2) block. Does
> this seem like a disk error, a controller error/bug, or a cable problem
> (I recently put a new cable on, so I doubt the latter)? It seems to me
> something more systemic rather than a random bit error - opinions are
> more than welcome.
>
> Here is some info from a Python program I wrote to look at the data
> (I've left out spans of essentially uninteresting portions showing
> similar stuff, but I can get you the whole thing if interested):
>
> File pos Good Bad Match Good (bin) Bad (bin) Bits same
> 0009fff0 d9 d9 Yes 11011001 11011001 8
> 0009fff1 05 05 Yes 00000101 00000101 8
> 0009fff2 c1 c1 Yes 11000001 11000001 8
> 0009fff3 81 81 Yes 10000001 10000001 8
> 0009fff4 5f 5f Yes 01011111 01011111 8
> 0009fff5 66 66 Yes 01100110 01100110 8
> 0009fff6 5e 5e Yes 01011110 01011110 8
> 0009fff7 a1 a1 Yes 10100001 10100001 8
> 0009fff8 ca ca Yes 11001010 11001010 8
> 0009fff9 9d 9d Yes 10011101 10011101 8
> 0009fffa 00 00 Yes 00000000 00000000 8
> 0009fffb 90 90 Yes 10010000 10010000 8
> 0009fffc 32 32 Yes 00110010 00110010 8
> 0009fffd 62 62 Yes 01100010 01100010 8
> 0009fffe a8 a8 Yes 10101000 10101000 8
> 0009ffff b2 b2 Yes 10110010 10110010 8
> --- Start of bad block ---
> 000a0000 d1 24 No 11010001 00100100 2
> 000a0001 6b 7b No 01101011 01111011 7
> 000a0002 d1 31 No 11010001 00110001 5
> 000a0003 56 33 No 01010110 00110011 4
> 000a0004 44 38 No 01000100 00111000 3
> 000a0005 c3 41 No 11000011 01000001 6
> 000a0006 df 46 No 11011111 01000110 4
> 000a0007 07 45 No 00000111 01000101 6
> 000a0008 4c 7b No 01001100 01111011 3
> 000a0009 a0 40 No 10100000 01000000 5
> 000a000a 54 0a No 01010100 00001010 3
> 000a000b 35 40 No 00110101 01000000 3
> 000a000c 88 24 No 10001000 00100100 4
> 000a000d 38 24 No 00111000 00100100 5
> 000a000e f5 7d No 11110101 01111101 6
> 000a000f 28 31 No 00101000 00110001 5
> .
> .
> .
> 000af6c1 d3 33 No 11010011 00110011 5
> 000af6c2 97 39 No 10010111 00111001 3
> 000af6c3 a5 32 No 10100101 00110010 3
> 000af6c4 6a 41 No 01101010 01000001 4
> 000af6c5 16 39 No 00010110 00111001 3
> 000af6c6 f2 7d No 11110010 01111101 3
> 000af6c7 21 40 No 00100001 01000000 5
> 000af6c8 52 0a No 01010010 00001010 5
> 000af6c9 00 00 Yes 00000000 00000000 8
> 000af6ca 2c 00 No 00101100 00000000 5
> 000af6cb 42 00 No 01000010 00000000 6
> 000af6cc 31 00 No 00110001 00000000 5
> 000af6cd a1 00 No 10100001 00000000 5
> 000af6ce d1 00 No 11010001 00000000 4
> 000af6cf 90 00 No 10010000 00000000 6
> 000af6d0 9c 00 No 10011100 00000000 4
> .
> .
> .
> 000afff8 26 00 No 00100110 00000000 5
> 000afff9 8c 00 No 10001100 00000000 5
> 000afffa a8 00 No 10101000 00000000 5
> 000afffb 0c 00 No 00001100 00000000 6
> 000afffc f1 00 No 11110001 00000000 3
> 000afffd 93 00 No 10010011 00000000 4
> 000afffe 2c 00 No 00101100 00000000 5
> 000affff 2e 00 No 00101110 00000000 4
> --- End of bad block ---
> 000b0000 62 62 Yes 01100010 01100010 8
> 000b0001 56 56 Yes 01010110 01010110 8
> 000b0002 91 91 Yes 10010001 10010001 8
> 000b0003 04 04 Yes 00000100 00000100 8
> 000b0004 01 01 Yes 00000001 00000001 8
> 000b0005 2d 2d Yes 00101101 00101101 8
> 000b0006 0e 0e Yes 00001110 00001110 8
> 000b0007 89 89 Yes 10001001 10001001 8
> 000b0008 8a 8a Yes 10001010 10001010 8
> 000b0009 ad ad Yes 10101101 10101101 8
> 000b000a 4e 4e Yes 01001110 01001110 8
> 000b000b a3 a3 Yes 10100011 10100011 8
> 000b000c 13 13 Yes 00010011 00010011 8
> 000b000d 4d 4d Yes 01001101 01001101 8
> 000b000e 07 07 Yes 00000111 00000111 8
> 000b000f 66 66 Yes 01100110 01100110 8
> .
> .
> .
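For readers who want to reproduce a listing like the one quoted above, here is a minimal sketch of a byte-by-byte comparison. It is not the original program, which was not posted; the function names are made up for illustration, and the sample bytes are the first four rows of the bad block.

```python
def bits_same(good: int, bad: int) -> int:
    """Count the bit positions (out of 8) where two byte values agree."""
    return 8 - bin(good ^ bad).count("1")


def format_row(pos: int, good: int, bad: int) -> str:
    """Render one row in the same column layout as the listing above."""
    match = "Yes" if good == bad else "No"
    return (f"{pos:08x} {good:02x} {bad:02x} {match} "
            f"{good:08b} {bad:08b} {bits_same(good, bad)}")


def compare(good_data: bytes, bad_data: bytes, start: int = 0):
    """Yield one formatted row per byte position common to both inputs."""
    for i, (g, b) in enumerate(zip(good_data, bad_data)):
        yield format_row(start + i, g, b)


# Sample data: the Good and Bad bytes at 000a0000..000a0003 in the listing.
good_sample = bytes.fromhex("d16bd156")
bad_sample = bytes.fromhex("247b3133")

for row in compare(good_sample, bad_sample, start=0xA0000):
    print(row)
```

To run it on real files, read both copies in binary mode (for example `open(path, "rb").read()`) and pass the two byte strings to `compare`.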
A wild guess: bytes all with the top bit zero, plus I see occasional
0x0a bytes in there - it looks like that 1/2 block might have gotten
overwritten with some ASCII text. Don't ask me how though...
(but if it did, I am afraid I would suspect a bug in ZFS :-( )
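The ASCII guess is easy to probe on the bytes that were actually posted. The sketch below uses only the 16 "Bad" values listed at 000a0000 through 000a000f, so it is suggestive at best; the full 65536-byte span would be needed to say anything firm.

```python
# The 16 "Bad" bytes listed at 000a0000..000a000f in the quoted dump.
bad_sample = bytes.fromhex("247b3133384146457b400a4024247d31")

# True when no byte has its top bit set (i.e. all values are 7-bit).
high_bit_clear = all(b < 0x80 for b in bad_sample)

# Number of 0x0a (line feed) bytes in the sample.
newlines = bad_sample.count(0x0A)

# Number of bytes in the printable ASCII range 0x20..0x7e.
printable = sum(0x20 <= b <= 0x7E for b in bad_sample)

print("high bit clear in every byte:", high_bit_clear)
print("0x0a (newline) bytes:", newlines)
print("printable ASCII bytes:", printable, "of", len(bad_sample))
print("decoded:", repr(bad_sample.decode("ascii")))
```

Every byte in this sample is either printable ASCII or a line feed, which fits the overwritten-with-text idea, though the decoded string is not obviously human-readable.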
Gary