8.1-RELEASE: ZFS data errors

Mike Carlson carlson39 at llnl.gov
Tue Nov 9 01:05:15 UTC 2010


On 11/08/2010 11:29 AM, Jeremy Chadwick wrote:
> On Mon, Nov 08, 2010 at 11:11:31AM -0800, Mike Carlson wrote:
>> On 11/08/2010 11:06 AM, Jeremy Chadwick wrote:
>>> On Mon, Nov 08, 2010 at 10:32:56AM -0800, Mike Carlson wrote:
>>>> I'm having a problem with striping seven 18TB RAID6 (hardware SAN)
>>>> volumes together.
>>>>
>>>> Here is a quick rundown of the hardware:
>>>> * HP DL180 G6 w/12GB ram
>>>> * QLogic FC HBA (Qlogic ISP 2532 PCI FC-AL Adapter)
>>>> * Winchester Hardware SAN,
>>>>
>>>>     da2 at isp0 bus 0 scbus2 target 0 lun 0
>>>>     da2:<WINSYS SX2318R 373O>   Fixed Direct Access SCSI-5 device
>>>>     da2: 800.000MB/s transfers
>>>>     da2: Command Queueing enabled
>>>>     da2: 19074680MB (39064944640 512 byte sectors: 255H 63S/T 2431680C)
>>>>
>>>
>> The server is in a data center with limited access control; do I
>> have the option of using a particular CVS tag (checking out via csup)
>> and then performing a make world/kernel?
> Doing this is more painful than, say, downloading a livefs image and
> seeing if you can reproduce the problem (e.g. you won't be modifying
> your existing OS installation), especially since I can't guarantee that
> the problem you're seeing is fixed in RELENG_8 (hence my request to
> begin with).  But if you can't boot livefs, then here you go:
>
> You'll need some form of console access (either serial or VGA) to do the
> upgrade reliably.  "Rolling back" may also not be an option since
> RELENG_8 is newer than RELENG_8_1 and may have introduced some new
> binaries or executables into the fray.  If you don't have console access
> to this machine, if things go awry you may be SOL.  The vagueness of my
> statement is intentional; I can't cover every situation that might come
> to light.
>
> Please be sure to back up your kernel configuration file before doing
> the following, and make sure that the supfile shown below has
> tag=RELENG_8 in it (it should).  And yes, the rm commands below are
> recommended; failure to use them could result in some oddities given
> that your /usr/src tree refers to RELENG_8_1 version numbers which
> differ from RELENG_8.  You *do not* have to do this for ports (since for
> ports, tag=. is used by default).
>
> rm -fr /var/db/sup/src-all
> rm -fr /usr/src/*
> rm -fr /usr/obj/*
> csup -h cvsupserver -L 2 /usr/share/examples/cvsup/stable-supfile
>
> At this point you can restore your kernel configuration file to the
> appropriate place (/sys/i386/conf, /sys/amd64/conf, etc.) and build
> world/kernel as per the instructions in /usr/src/Makefile (see lines
> ~51-62).  ***Please do not skip any of the steps***.  Good luck.
>
> --
> | Jeremy Chadwick                                   jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>
>
>
I wasn't able to make it to the data center to boot from a USB stick or
CD, but I did follow your steps to upgrade to RELENG_8. So far, things
are stable:

    write# uname -a
    FreeBSD write.llnl.gov 8.1-STABLE FreeBSD 8.1-STABLE #0: Mon Nov  8 16:38:06 PST 2010     root at write.llnl.gov:/usr/obj/usr/src/sys/GENERIC  amd64
    write# kldstat
    Id Refs Address            Size     Name
      1   15 0xffffffff80100000 d86d18   kernel
      2    1 0xffffffff80e87000 f058     aio.ko
      3    1 0xffffffff80e97000 16ea40   ispfw.ko
      4    1 0xffffffff81006000 5568     geom_multipath.ko
      5    1 0xffffffff81222000 104ac5   zfs.ko
      6    1 0xffffffff81327000 1a15     opensolaris.ko
    write# zpool create test01 da2 da3 da4 da5 da6 da7 da8
    write# zpool status
    write# cd /tmp
    write# clear
    write# cp random.dat.1 /test01/
    write# cp random.dat.1 /test01/random.dat.2
    write# cp random.dat.1 /test01/random.dat.3
    write# cp random.dat.1 /test01/random.dat.4
    write# cp random.dat.1 /test01/random.dat.5
    write# cp random.dat.1 /test01/random.dat.6
    write# md5 random.dat.1
    MD5 (random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
    write# md5 /test01/random.dat.*
    MD5 (/test01/random.dat.1) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.2) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.3) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.4) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.5) = f795fa09e1b0975c0da0ec6e49544a36
    MD5 (/test01/random.dat.6) = f795fa09e1b0975c0da0ec6e49544a36
    write# zpool status
       pool: test01
      state: ONLINE
      scrub: none requested
    config:

         NAME        STATE     READ WRITE CKSUM
         test01      ONLINE       0     0     0
           da2       ONLINE       0     0     0
           da3       ONLINE       0     0     0
           da4       ONLINE       0     0     0
           da5       ONLINE       0     0     0
           da6       ONLINE       0     0     0
           da7       ONLINE       0     0     0
           da8       ONLINE       0     0     0

    errors: No known data errors
    write# zpool scrub test01
    write# zpool status
       pool: test01
      state: ONLINE
      scrub: scrub completed after 0h0m with 0 errors on Mon Nov  8 17:00:01 2010
    config:

         NAME        STATE     READ WRITE CKSUM
         test01      ONLINE       0     0     0
           da2       ONLINE       0     0     0
           da3       ONLINE       0     0     0
           da4       ONLINE       0     0     0
           da5       ONLINE       0     0     0
           da6       ONLINE       0     0     0
           da7       ONLINE       0     0     0
           da8       ONLINE       0     0     0

    errors: No known data errors
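
The copy-and-checksum steps above could be scripted along these lines.
This is only a sketch, not what was run here: the function names
(digest, verify_copies) and the ".copyN" file naming are made up for
illustration, and the md5sum fallback is an assumption for systems
without md5(1).

```sh
#!/bin/sh
# Sketch: copy one source file N times and compare each copy's MD5
# digest with the source. All names here are illustrative.

# Print only the MD5 digest of a file. FreeBSD ships md5(1); the
# md5sum fallback is an assumption for systems that lack it.
digest() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | cut -d ' ' -f 1
    fi
}

# verify_copies SRC DESTDIR COUNT
# Returns 0 if all COUNT copies match SRC, 1 otherwise.
verify_copies() {
    src=$1
    dest=$2
    count=$3
    want=$(digest "$src") || return 1
    bad=0
    i=1
    while [ "$i" -le "$count" ]; do
        copy="$dest/$(basename "$src").copy$i"
        cp "$src" "$copy" || return 1
        got=$(digest "$copy")
        if [ "$got" != "$want" ]; then
            echo "MISMATCH: $copy ($got != $want)"
            bad=1
        fi
        i=$((i + 1))
    done
    return "$bad"
}
```

With the pool from this transcript, a call would look like
"verify_copies /tmp/random.dat.1 /test01 6". A file read back right
after it was written may be served from cache, so matching digests do
not replace the scrub.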


Any ideas for further testing to narrow down the culprit? One other
thing I changed was /boot/loader.conf: I had previously limited
vfs.zfs.arc_max to 1024M, and I commented that setting out as well.
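
For reference, the loader.conf change amounts to something like the
fragment below. The line is reconstructed from the description, not
copied from the machine, so the exact spelling in the real file is an
assumption.

```sh
# /boot/loader.conf (reconstructed from the description, not copied)
# ARC cap that was in place before the upgrade, now commented out:
#vfs.zfs.arc_max="1024M"
```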

Thanks again. I'm going to keep writing files and scrubbing the
array until I am confident in the file system.
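
To make the repeated scrubs easier to check, the result could be tested
from the "zpool status" text, roughly as follows. This is a sketch: the
function name scrub_clean is made up, and the patterns assume the status
wording shown in the transcript above, which other ZFS versions may
phrase differently.

```sh
#!/bin/sh
# Sketch: decide from `zpool status` text whether the last scrub was
# clean. The patterns assume the wording shown in this transcript.

# scrub_clean: reads `zpool status POOL` output on stdin.
# Returns 0 only if a scrub completed with 0 errors and the pool
# reports no known data errors.
scrub_clean() {
    status=$(cat)
    printf '%s\n' "$status" |
        grep -q 'scrub completed after .* with 0 errors' || return 1
    printf '%s\n' "$status" |
        grep -q '^ *errors: No known data errors' || return 1
    return 0
}
```

Once a scrub started with "zpool scrub test01" has finished, the check
would be "zpool status test01 | scrub_clean && echo clean".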

Mike C


