ZFS - Unable to offline drive in raidz1 based pool

Kurt Touet ktouet at gmail.com
Mon Sep 21 17:44:29 UTC 2009


Apparently you were right, Aaron:

monolith# zpool scrub storage
monolith# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.46M resilvered
            ad6     ONLINE       0     0     0  2K resilvered
            ad12    ONLINE       0     0     0  3K resilvered
            ad4     ONLINE       0     0     0  3K resilvered

errors: No known data errors
monolith# zpool offline storage ad6
monolith# zpool online storage ad6
monolith# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  67.5K resilvered
            ad6     ONLINE       0     0     0  671K resilvered
            ad12    ONLINE       0     0     0  67.5K resilvered
            ad4     ONLINE       0     0     0  53K resilvered

errors: No known data errors


I wonder, then: with the storage array reporting itself as healthy, how
did the scrub know that one drive had desynced data, and why didn't
that show up as an error state like DEGRADED?
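For anyone hitting the same "no valid replicas" message in the archives, the sequence that worked here can be sketched as follows. The pool and device names (storage, ad6) are from this thread; ad8 is a hypothetical replacement device, so adapt the names to your own layout:

```shell
#!/bin/sh
# Sketch of the workaround from this thread: a scrub resilvered the
# out-of-sync data, after which the offline succeeded.  Needs real
# hardware and root; not runnable as-is.

zpool scrub storage        # resync any stale data across the vdev
zpool status storage       # wait until the scrub/resilver completes

zpool offline storage ad6  # should now succeed with valid replicas
# ...swap the physical drive, then attach the replacement (ad8 is
# a hypothetical device name):
zpool replace storage ad6 ad8
zpool status storage       # confirm the resilver finishes with 0 errors
```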

Cheers,
-kurt


On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet <ktouet at gmail.com> wrote:
> I thought about that possibility as well... but I had scrubbed the
> array within 10 days. I'll give it a shot again today and see if that
> brings up any other errors (or allows me to offline the drive
> afterwards).
>
> Cheers,
> -kurt
>
> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt <aaron at goflexitllc.com> wrote:
>> Kurt Touet wrote:
>>>
>>> I am using ZFS pool based on a 4-drive raidz1 setup for storage.  I
>>> believe that one of the drives is failing, and I'd like to
>>> remove/replace it.  The drive has been causing some issues (such as
>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>> like to offline it, and then run in degraded mode until I can grab a
>>> new drive (tomorrow).  However, when I disconnected the drive (pulled
>>> the plug, not using a zpool offline command), the following occurred:
>>>
>>>        NAME        STATE     READ WRITE CKSUM
>>>        storage     FAULTED       0     0     1
>>>          raidz1    DEGRADED     0     0     0
>>>            ad14    ONLINE       0     0     0
>>>            ad6     UNAVAIL      0     0     0
>>>            ad12    ONLINE       0     0     0
>>>            ad4     ONLINE       0     0     0
>>>
>>> Note: That's my recreation of the output... not the actual text.
>>>
>>> At this point, I was unable to do anything with the pool... and all
>>> data was inaccessible.  Fortunately, after sitting unplugged for a
>>> while, I tried putting the failing drive back into the array, and the
>>> pool came back up properly.  Of course, I still want to replace it, but this is
>>> what happens when I try to take it offline:
>>>
>>> monolith# zpool status storage
>>>  pool: storage
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>>
>>>        NAME        STATE     READ WRITE CKSUM
>>>        storage     ONLINE       0     0     0
>>>          raidz1    ONLINE       0     0     0
>>>            ad14    ONLINE       0     0     0
>>>            ad6     ONLINE       0     0     0
>>>            ad12    ONLINE       0     0     0
>>>            ad4     ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>> monolith# zpool offline storage ad6
>>> cannot offline ad6: no valid replicas
>>> monolith# uname -a
>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>> 15:32:08 CST 2009     k at monolith:/usr/obj/usr/src/sys/MONOLITH  amd64
>>>
>>> If the array is online and healthy, why can't I simply offline a drive
>>> and then replace it afterwards?  Any thoughts?  Also, how does a
>>> degraded raidz1 array end up faulting the entire pool?
>>>
>>> Thanks,
>>> -kurt
>>> _______________________________________________
>>> freebsd-fs at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>>
>>>
>>>
>>
>> I'm not sure why it would be giving you that message.  In a raidz1 you
>> should be able to sustain one drive failure.  The only thing that comes to
>> mind this early in the morning is that somehow the data replication across
>> your disks isn't totally in sync.  I would suggest you try a scrub and then
>> see if you can remove the drive afterwards.
>>
>> Aaron Hurt
>> Managing Partner
>> Flex I.T., LLC
>> 611 Commerce Street
>> Suite 3117
>> Nashville, TN  37203
>> Phone: 615.438.7101
>> E-mail: aaron at goflexitllc.com
>>
>>
>

