zfs on FreeBSD 8.2 64bit stuck in "One or more devices is currently being resilvered"

Mehmet Erol Sanliturk m.e.sanliturk at gmail.com
Sat Mar 21 16:57:52 UTC 2015


On Sat, Mar 21, 2015 at 9:01 AM, motty cruz <motty.cruz at gmail.com> wrote:

> Hi Mehmet, are you thinking a bad HDD bay? If I ran the gstat command I
> see is writing to disk :
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0    0.0| acd0
>     0      9      0      0    0.0      9    144   22.1    3.1| mfid0
>     0      9      0      0    0.0      9    144   22.6    3.1| mfid0s1
>     0      9      0      0    0.0      9    144   22.9    3.2| mfid0s1a
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1b
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1d
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1e
>     0      0      0      0    0.0      0      0    0.0    0.0| mfid0s1f
>     2   4631   4631  13270    0.4      0      0    0.0   73.0| da0
>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>     3   3979   3979  13345    0.7      0      0    0.0   78.0| da2
>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>     5   4503   4503  13263    0.5      0      0    0.0   76.0| da4
>     5   4245   4245  13254    0.6      0      0    0.0   77.5| da5
>     4   4741      0      0    0.0   4741  11626    1.2   86.7| da6
>
> disk being replace is da6, as you can see w/s11626? unless I am not
> reading this right? so I don't think is the cable or port. I really don't
> know what is causing this issue:
>
> today is the 3rd day resilvering:
> # zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 47h47m, 100.00% done, 0h0m to go
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         tank            ONLINE       0     0     0
>           raidz2        ONLINE       0     0     0
>             label/019   ONLINE       0     0     0
>             label/001b  ONLINE       0     0     0
>             label/003   ONLINE       0     0     0
>             label/007b  ONLINE       0     0     0  1.79T resilvered
>             label/005   ONLINE       0     0     0
>             label/006   ONLINE       0     0     0
>             label/0171  ONLINE       0     0     0
> any suggestion on what should be my next step?
>
> Thanks in advance!
> -Motty
>
>


Yes, it may be.
If you can, attach the HDD to a bay that is known to work and see whether
the problem is in the HDD or in the HDD bay.

Another step may be to remove the HDD from the troublesome bay and use only
a group of HDD bays that is known to work correctly.

Then put an HDD that you know is working correctly into the suspected HDD
bay and see whether it causes trouble or not.

Continue in that way until you have identified the status of the bays and
their related components.
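
While testing a bay this way, gstat shows whether the disk in it is
actually doing I/O. Going by the header row in the output quoted above,
the da6 row reads as 4741 writes per second and 11626 kBps written, so
11626 is the write throughput, not w/s. A minimal Python sketch that names
the columns follows; the field names and the helper parse_gstat_row are my
own labels for illustration, not part of gstat:

```python
def parse_gstat_row(line):
    """Split one gstat data row into named fields.

    Column order is taken from the header quoted above:
    L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    The key names below are hypothetical labels, not gstat identifiers.
    """
    stats, name = line.split("|")
    keys = ["queue", "ops_s", "r_s", "r_kBps", "ms_r",
            "w_s", "w_kBps", "ms_w", "busy_pct"]
    row = {key: float(value) for key, value in zip(keys, stats.split())}
    row["name"] = name.strip()
    return row


# The da6 row exactly as it appears in the quoted gstat output.
da6 = parse_gstat_row(
    "    4   4741      0      0    0.0   4741  11626    1.2   86.7| da6"
)

# Average size of one write, in kB, from throughput / operations.
avg_write_kb = da6["w_kBps"] / da6["w_s"]
print(da6["name"], da6["w_s"], da6["w_kBps"], round(avg_write_kb, 2))
```

Read this way, da6 is receiving about 11.6 MB/s in small writes of roughly
2.5 kB each, which only says the disk is being written to; it does not by
itself rule the bay or the cable in or out.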


One important concern is corruption of your data. My suggestion is to back
up your data and, until this issue is resolved, not to use this computer
for your production work.


Sometimes a part fails gradually, step by step, and in the end may fail
completely.
I am saying this to emphasize the importance of saving your data as soon
as possible.


If you have the means, another step may be to replace the HDD bay
controller with a new, good-quality controller.


Version 8.2 is very old.

Switching to a newer version, either 9.3 or 10.1, may be useful. You could
install it on a spare system and transfer your data to the newly installed
system.

I think you know very well how to migrate to a new system when ZFS is used.
I am not using ZFS, so my knowledge of it is very weak.


I have encountered a possibly similar problem in an NFS server/client
group. On the server, program sources were corrupted: lines were truncated,
invalid characters were injected into lines, or characters were randomly
changed to invalid ones.

I replaced the server, the switch and the cables and, in the suspected
client computer (suspected because of "Access Violation" messages), the
memory chips. In the end it turned out that the motherboard chips or other
parts of the suspected client computer were faulty, not the memory chips.

When no sufficiently capable testing equipment is available, the only
action that can be taken is to replace suspected parts with others (known
to be working, as far as possible).
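
Between such part swaps, one way to tell whether anything has changed is
to compare the "scrub:" line of zpool status before and after. The sketch
below parses that line in the format shown in the quoted output, which
is the older zpool status layout; newer ZFS versions print the progress
differently, so the pattern is an assumption tied to this output. The
names SCRUB_RE, parse_resilver and looks_stuck are hypothetical helpers:

```python
import re

# Matches the scrub line format quoted in this thread, for example:
#   scrub: resilver in progress for 47h47m, 100.00% done, 0h0m to go
SCRUB_RE = re.compile(
    r"resilver in progress for (\d+)h(\d+)m, "
    r"([\d.]+)% done, (\d+)h(\d+)m to go"
)


def parse_resilver(status_text):
    """Return elapsed minutes, percent done and remaining minutes,
    or None when no matching resilver line is found."""
    match = SCRUB_RE.search(status_text)
    if match is None:
        return None
    el_h, el_m, pct, rem_h, rem_m = match.groups()
    return {
        "elapsed_min": int(el_h) * 60 + int(el_m),
        "percent": float(pct),
        "remaining_min": int(rem_h) * 60 + int(rem_m),
    }


def looks_stuck(info):
    """True when the resilver is still reported as in progress although
    it claims to be 100% done with no time remaining."""
    return (info is not None
            and info["percent"] >= 100.0
            and info["remaining_min"] == 0)


sample = " scrub: resilver in progress for 47h47m, 100.00% done, 0h0m to go"
info = parse_resilver(sample)
print(info, looks_stuck(info))
```

If the elapsed time keeps growing across checks while the line still says
100.00% done, the swap has not changed the behaviour.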





> On Fri, Mar 20, 2015 at 10:23 PM, Mehmet Erol Sanliturk <
> m.e.sanliturk at gmail.com> wrote:
>
>>
>>
>> On Fri, Mar 20, 2015 at 3:44 PM, Motty Cruz <motty.cruz at gmail.com> wrote:
>>
>>> Can you describe what you did to replace the disk?
>>>
>>> I sure can. I had spare hdd in the pool.
>>> #zpool replace tank label/004 label/007b
>>>
>>>             label/003             ONLINE       0     0     0
>>>             replacing             DEGRADED     0     0     0
>>>               433419809408607751  UNAVAIL      0     0     0  was /dev/label/007
>>>               label/004           ONLINE       0     0     0  2.47T resilvered
>>>             label/005             ONLINE       0     0     0
>>>
>>> after two days of resilvering, the server became unresponsive. I reboot
>>> the server started to resilver again. after that I also
>>> detached bad disk.
>>> #zpool detach tank 433419809408607751
>>>
>>>


Since the newly attached HDD is also causing trouble, this may show that
the problem is not in the HDD but in the HDD bay or its related parts.
My suggestion is: "Do not discard your disk before verifying that it is
really defective."




>>> I have tried zpool clear tank but no success,
>>>
>>> Thanks,
>>> Motty
>>> On 03/20/2015 03:32 PM, Rainer Duffner wrote:
>>>
>>>> Am 20.03.2015 um 23:25 schrieb Motty Cruz <motty.cruz at gmail.com>:
>>>>>
>>>>> Hello Rainer,
>>>>>
>>>>> a disk went bad, I had to replace it, soon after replacing the bad HDD
>>>>> it started the "resilver" process. Process went on and on for hours,
>>>>> unfortunately server stop responding, I was force to reboot. after
>>>>> rebooting started "resilver" process again, from zero. I put the HDD
>>>>> offline replace it "thinking it was a factory bad HHD" started the
>>>>> "resilver" process again.
>>>>>
>>>>>
>>>> I would assume that the ZFS still thinks it’s the old disk somehow.
>>>> This is what usually happens then.
>>>>
>>>>
>>>> I’m not sure if an upgraded FreeBSD will help you with your
>>>> resilver-problem.
>>>>
>>>> Can you describe what you did to replace the disk?
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>> Is there a possibility that the parts involved in the resilver (port,
>> cable, etc.) have hardware failures, so that the OS is not able to
>> complete resilvering or keeps seeing that part as needing a resilver?
>>
>>
>>
>> Mehmet Erol Sanliturk
>>
>>
>>
>>
>>
>>
>
>
> --
> Thanks for your support,
> Motty
>

