zfs mirror recognizing disk failures

Michael Boers michaelscotttech at gmail.com
Tue Nov 16 13:32:39 UTC 2010


On Nov 16, 2010, at 5:24 AM, Olivier Smedts wrote:

> 2010/11/15 Michael Boers <michaelscotttech at gmail.com>:
>> Is there anything I can do to make a zfs mirror quicker to give up on a
>> flaky disk?
>>
>> I recently had a crash on a 100% zfs system when it started to have some
>> disk errors.  I had hoped that by having a mirror, the system would survive
>> this type of error.  Instead it just hung.
>
> You can offline the faulty drive.
> Also, I think you're interested in a feature like TLER:
> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
> But typical (cheap) drives don't implement it.
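Offlining a mirror member, as suggested above, is done with `zpool offline`. The following is a minimal sketch that only prints the command unless `dry_run` is false. The pool name `tank` is hypothetical, and `da2` is simply the device named in the quoted kernel log. The `-t` flag marks the offline as temporary, so the device returns to its previous state at the next reboot.

```python
import subprocess


def build_offline_cmd(pool, device, temporary=False):
    """Return the argv for `zpool offline [-t] POOL DEVICE`."""
    cmd = ["zpool", "offline"]
    if temporary:
        # -t: the offline state is reverted at the next reboot
        cmd.append("-t")
    cmd += [pool, device]
    return cmd


def offline_device(pool, device, temporary=False, dry_run=True):
    """Print the command (dry run) or execute it and return its exit status."""
    cmd = build_offline_cmd(pool, device, temporary)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # check=False: report a failure through the return code instead of raising
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "tank" is a hypothetical pool name; da2 comes from the quoted log.
    offline_device("tank", "da2", dry_run=True)
```

This only helps while the system can still run commands, which was not the case here.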


Unfortunately, I was not able to offline the drive, because I could not
gain access to the machine.  It responded to pings, and since it is a
CARP master it was still broadcasting its "masterness", but every
attempt to ssh into the machine failed.  My guess is that anything
disk-related was blocked behind the problem.

To answer Jeremy's question of "what happened next?":

The machine was not serving web requests
The machine was not responsive via ssh
The machine was pingable

After waiting about 15 minutes, I used IPMI to power down the machine.
When it came back up, I found the errors quoted below in the log.

If I am following your comments correctly, the fault lies with the mpt
controller not giving up on the disk, which could be either a driver or
a firmware issue.  Is that correct?

How do I protect against that?
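One mitigation related to the TLER feature mentioned in the quoted reply is to cap the drive's own error-recovery time through SCT Error Recovery Control, which smartmontools exposes as `smartctl -l scterc,READ,WRITE`. This is a sketch under several assumptions: smartmontools is installed, the drive actually supports SCT ERC (as noted above, many cheap drives do not), and smartctl can reach the disk through the mpt controller (it may need a `-d` device-type option, not shown). On many drives the setting is lost at power-off, so it would have to be reapplied at boot. It only shortens the drive's internal retries; it would not by itself fix a driver or controller that keeps waiting.

```python
import subprocess


def build_scterc_cmd(device, read_ds=70, write_ds=70):
    """Return argv for `smartctl -l scterc,READ,WRITE DEVICE`.

    READ and WRITE are in units of 100 ms (deciseconds), so 70 asks the
    drive to give up on an unreadable sector after about 7 seconds.
    A value of 0 disables the limit.
    """
    for value in (read_ds, write_ds):
        if not isinstance(value, int) or value < 0:
            raise ValueError("ERC limits must be non-negative integers (deciseconds)")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def set_erc(device, read_ds=70, write_ds=70, dry_run=True):
    """Print the command (dry run) or run it and return smartctl's exit status."""
    cmd = build_scterc_cmd(device, read_ds, write_ds)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # /dev/da2 is the disk from the quoted log; adjust for your system.
    set_erc("/dev/da2")
```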


>
>>
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
>> Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
>>
>> Is this a type of error zfs can survive, or do I need a hardware mirror to
>> handle this type of problem?
>>
>> Thanks,
>>
>> _______________________________________________
>> freebsd-fs at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>
>
>
>
> -- 
> Olivier Smedts                                                 _
>                                         ASCII ribbon campaign ( )
> e-mail: olivier at gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
>
>   "There are only 10 kinds of people in the world:
>   those who understand binary,
>   and those who don't."
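To pick the suspect disk out of a log like the one quoted in this message, a small parser can tally CAM errors per device and mpt timeouts per controller. This is a sketch that assumes the lines follow exactly the formats shown here; the regular expressions were written against these examples only. The "timed out" lines name only the controller (`mpt0`), not the target, so from the text alone they cannot be attributed to `da2`.

```python
import re
from collections import Counter

# CAM lines look like "(da2:mpt0:0:3:0): Retries Exhausted"
CAM_RE = re.compile(r"\((?P<dev>[a-z]+\d+):(?P<ctrl>[a-z]+\d+):\d+:\d+:\d+\): (?P<msg>.*)")
# mpt timeouts look like "mpt0: request 0x...:2838 timed out for ccb ..."
TIMEOUT_RE = re.compile(r"(?P<ctrl>mpt\d+): request 0x[0-9a-f]+:\d+ timed out")


def summarize(lines):
    """Return (CAM messages per device, timeouts per controller,
    devices that reported 'Retries Exhausted')."""
    cam_errors = Counter()
    timeouts = Counter()
    exhausted = set()
    for line in lines:
        m = CAM_RE.search(line)
        if m:
            cam_errors[m["dev"]] += 1
            if "Retries Exhausted" in m["msg"]:
                exhausted.add(m["dev"])
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m["ctrl"]] += 1
    return cam_errors, timeouts, exhausted


SAMPLE = [
    "Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0",
    "Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted",
    "Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)",
    "Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0",
]

if __name__ == "__main__":
    cam, timeouts, exhausted = summarize(SAMPLE)
    print("CAM messages per device:", dict(cam))
    print("timeouts per controller:", dict(timeouts))
    print("retries exhausted on:", sorted(exhausted))
```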


