8.1-RELEASE: ZFS data errors

Mike Carlson carlson39 at llnl.gov
Mon Nov 8 19:11:31 UTC 2010


On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
> On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
>> I'm having a problem with striping 7 18TB RAID6 (hardware SAN)
>> volumes together.
>>
>> Here is a quick rundown of the hardware:
>> * HP DL180 G6 w/12GB RAM
>> * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>> * Winchester hardware SAN:
>>
>>     da2 at isp0 bus 0 scbus2 target 0 lun 0
>>     da2:<WINSYS SX2318R 373O>  Fixed Direct Access SCSI-5 device
>>     da2: 800.000MB/s transfers
>>     da2: Command Queueing enabled
>>     da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>>
>>
>> As soon as I create the volume and write data to it, it is reported
>> as being corrupted:
>>
>>     write# zpool create filevol001 da2 da3 da4 da5 da6 da7 da8
>>     write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>     1000+0 records in
>>     1000+0 records out
>>     1048576000 bytes transferred in 16.472807 secs (63654968 bytes/sec)
>>     write# cd /filevol001/
>>     write# ls
>>     random.dat.1
>>     write# md5 *
>>     MD5 (random.dat.1) = 629f8883d6394189a1658d24a5698bb3
>>     write# cp random.dat.1 random.dat.2
>>     cp: random.dat.1: Input/output error
>>     write# zpool status
>>        pool: filevol001
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          filevol001  ONLINE       0     0     0
>>            da2       ONLINE       0     0     0
>>            da3       ONLINE       0     0     0
>>            da4       ONLINE       0     0     0
>>            da5       ONLINE       0     0     0
>>            da6       ONLINE       0     0     0
>>            da7       ONLINE       0     0     0
>>            da8       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>     write# zpool scrub filevol001
>>     write# zpool status
>>        pool: filevol001
>>       state: ONLINE
>>     status: One or more devices has experienced an error resulting in data
>>          corruption.  Applications may be affected.
>>     action: Restore the file in question if possible.  Otherwise restore the
>>          entire pool from backup.
>>         see: http://www.sun.com/msg/ZFS-8000-8A
>>       scrub: scrub completed after 0h0m with 2437 errors on Mon Nov  8
>>     10:14:20 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          filevol001  ONLINE       0     0 2.38K
>>            da2       ONLINE       0     0 1.24K  12K repaired
>>            da3       ONLINE       0     0 1.12K
>>            da4       ONLINE       0     0 1.13K
>>            da5       ONLINE       0     0 1.27K
>>            da6       ONLINE       0     0     0
>>            da7       ONLINE       0     0     0
>>            da8       ONLINE       0     0     0
>>
>>     errors: 2437 data errors, use '-v' for a list
>>
>> However, if I create a 'raidz' volume, no errors occur:
>>
>>     write# zpool destroy filevol001
>>     write# zpool create filevol001 raidz da2 da3 da4 da5 da6 da7 da8
>>     write# zpool status
>>        pool: filevol001
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          filevol001  ONLINE       0     0     0
>>            raidz1    ONLINE       0     0     0
>>              da2     ONLINE       0     0     0
>>              da3     ONLINE       0     0     0
>>              da4     ONLINE       0     0     0
>>              da5     ONLINE       0     0     0
>>              da6     ONLINE       0     0     0
>>              da7     ONLINE       0     0     0
>>              da8     ONLINE       0     0     0
>>
>>     errors: No known data errors
>>     write# dd if=/dev/random of=/filevol001/random.dat.1 bs=1m count=1000
>>     1000+0 records in
>>     1000+0 records out
>>     1048576000 bytes transferred in 17.135045 secs (61194821 bytes/sec)
>>     write# zpool scrub filevol001
>>
>>     write# zpool status
>>        pool: filevol001
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     09:54:51 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          filevol001  ONLINE       0     0     0
>>            raidz1    ONLINE       0     0     0
>>              da2     ONLINE       0     0     0
>>              da3     ONLINE       0     0     0
>>              da4     ONLINE       0     0     0
>>              da5     ONLINE       0     0     0
>>              da6     ONLINE       0     0     0
>>              da7     ONLINE       0     0     0
>>              da8     ONLINE       0     0     0
>>
>>     errors: No known data errors
>>     write# ls
>>     random.dat.1
>>     write# cp random.dat.1 random.dat.2
>>     write# cp random.dat.1 random.dat.3
>>     write# cp random.dat.1 random.dat.4
>>     write# cp random.dat.1 random.dat.5
>>     write# cp random.dat.1 random.dat.6
>>     write# cp random.dat.1 random.dat.7
>>     write# md5 *
>>     MD5 (random.dat.1) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.2) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.3) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.4) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.5) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.6) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>     MD5 (random.dat.7) = f5e3467f61a954bc2e0bcc35d49ac8b2
>>
>> What is also odd is that if I create 7 separate single-device ZFS
>> pools, they do not report any data corruption:
>>
>>     write# zpool destroy filevol001
>>     write# zpool create test01 da2
>>     write# zpool create test02 da3
>>     write# zpool create test03 da4
>>     write# zpool create test04 da5
>>     write# zpool create test05 da6
>>     write# zpool create test06 da7
>>     write# zpool create test07 da8
>>     write# zpool status
>>        pool: test01
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test01      ONLINE       0     0     0
>>            da2       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test02
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test02      ONLINE       0     0     0
>>            da3       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test03
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test03      ONLINE       0     0     0
>>            da4       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test04
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test04      ONLINE       0     0     0
>>            da5       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test05
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test05      ONLINE       0     0     0
>>            da6       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test06
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test06      ONLINE       0     0     0
>>            da7       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test07
>>       state: ONLINE
>>       scrub: none requested
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test07      ONLINE       0     0     0
>>            da8       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>     write# dd if=/dev/random of=/tmp/random.dat.1 bs=1m count=1000
>>     1000+0 records in
>>     1000+0 records out
>>     1048576000 bytes transferred in 19.286735 secs (54367730 bytes/sec)
>>     write# cd /tmp/
>>     write# md5 /tmp/random.dat.1
>>     MD5 (/tmp/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     write# cp random.dat.1 /test01 ; cp random.dat.1 /test02 ;cp
>>     random.dat.1 /test03 ; cp random.dat.1 /test04 ; cp random.dat.1
>>     /test05 ; cp random.dat.1 /test06 ; cp random.dat.1 /test07
>>     write# md5 /test*/*
>>     MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test02/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test03/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test04/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test05/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test06/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     MD5 (/test07/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
>>     write# zpool scrub test01 ; zpool scrub test02 ;zpool scrub test03
>>     ;zpool scrub test04 ; zpool scrub test05 ; zpool scrub test06 ;
>>     zpool scrub test07
>>     write# zpool status
>>        pool: test01
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:27:49 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test01      ONLINE       0     0     0
>>            da2       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test02
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:27:52 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test02      ONLINE       0     0     0
>>            da3       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test03
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:27:54 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test03      ONLINE       0     0     0
>>            da4       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test04
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:27:57 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test04      ONLINE       0     0     0
>>            da5       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test05
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:28:00 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test05      ONLINE       0     0     0
>>            da6       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test06
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:28:02 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test06      ONLINE       0     0     0
>>            da7       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>>        pool: test07
>>       state: ONLINE
>>       scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8
>>     10:28:05 2010
>>     config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          test07      ONLINE       0     0     0
>>            da8       ONLINE       0     0     0
>>
>>     errors: No known data errors
>>
>> Based on these results, I've drawn the following conclusions:
>> * ZFS single pool per device = OKAY
>> * ZFS raidz of all devices = OKAY
>> * ZFS stripe of all devices = NOT OKAY
>>
>> The results are immediate, and I know ZFS will self-heal, so is that
>> what it is doing behind my back and just not reporting it? Is this a
>> ZFS bug with striping vs. raidz?
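>>
>> (A quick way to pull the per-file error list, which I haven't pasted
>> above, would be:
>>
>>     write# zpool status -v filevol001
>>
>> per the "use '-v' for a list" hint in the status output.)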
> Can you reproduce this problem using RELENG_8?  Please try one of the
> below snapshots.
>
> ftp://ftp4.freebsd.org/pub/FreeBSD/snapshots/201011/
>
The server is in a data center with limited access, so do I have the 
option of using a particular CVS tag (checking out via csup) and then 
performing a make world/kernel?
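
Something like the following is what I had in mind (a rough sketch; the
cvsup host, supfile path, and KERNCONF are just placeholders, following
the usual source-upgrade order):

A supfile, say /root/releng8-supfile:

    *default host=cvsup.FreeBSD.org
    *default base=/var/db
    *default prefix=/usr
    *default release=cvs tag=RELENG_8
    *default delete use-rel-suffix
    *default compress
    src-all

and then:

    csup /root/releng8-supfile
    cd /usr/src
    make buildworld
    make buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now
    # after rebooting on the new kernel:
    cd /usr/src && mergemaster -p
    make installworld
    mergemaster
    shutdown -r now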

If so, I can report back later today; otherwise it might take longer :(

Mike C

