gmirror or ata problem

Eric Anderson anderson at freebsd.org
Thu Feb 1 13:55:14 UTC 2007


On 01/31/07 17:12, Fluffles wrote:
> Pawel Jakub Dawidek wrote:
>> On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
>>   
>>> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>>>
>>>     
>>>> This is strange.  gmirror just detached one of its disks
>>>> for no apparent reason.  I've built a mirror consisting of
>>>> the components ad0 and ad1 (both SATA drives).  It has
>>>> been running fine.  This is RELENG_6 from 2006-12-20.
>>>>
>>>> Yesterday evening ad1 was detached.  There is no other
>>>> error message logged on console or in the logs (i.e. no
>>>> I/O error such as a bad sector or anything).  There was
>>>> no particularly high load at that time.  In fact, the
>>>> machine had been under much higher load before, without
>>>> anything bad happening.
>>>>
>>>> This is from the logs:
>>>>
>>>> Jan 29 19:10:13 pluto -- MARK --
>>>> Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
>>>> Jan 29 19:20:26 pluto kernel: subdisk1: detached
>>>> Jan 29 19:20:26 pluto kernel: ad1: detached
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
>>>> Jan 29 19:50:13 pluto -- MARK --
>>>>       
>>> I have seen similar problems on my graid3.  I think it's simply the
>>> disk which stops responding to commands, or at least ata(4) can't talk
>>> to the disk anymore...
>>>
>>> I see it on:
>>>
>>> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
>>> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
>>> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
>>>
>>> After a reboot everything seems fine again and my RAID is rebuilt.
>>>
>>> I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
>>> BTW.
>>>     
>> It seems that when gmirror/graid3 writes to more than one disk at a
>> time, this puts too much load on the ata channel or something, and ata
>> disconnects the disk. I don't really know how it works exactly, but
>> maybe some timeout should be increased in the ata code?
>>   
> 
> In my experience even a single disk will time out; 5 seconds is
> just not enough for the disk to spin up. Most disks need at least
> 10 seconds.
> In ata-disk.c the timeout is set to 5 seconds. When it is set to 15
> seconds, the ataidle sleep mode works perfectly. I think this should be
> patched. Right now I would say ataidle is broken on FreeBSD, at least
> without patching the source code.
> 
> For those who cannot wait for an official patch, try this:
> - edit /usr/src/sys/dev/ata/ata-disk.c
> - search for "timeout" (case-insensitive)
> - you will find:     request->timeout = 5;
> - change the value 5 to 15
> - save and run: cd /usr/src; make kernel KERNCONF=GENERIC
> - after a reboot you can test ataidle, and it should work perfectly with
> any geom raid layer or as a 'single disk'
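
The manual steps quoted above could be scripted roughly as follows. This
is only a sketch: the function name bump_timeout is hypothetical, and it
assumes the line in ata-disk.c reads exactly "request->timeout = 5;" as
quoted in the steps; the spacing in a given source tree may differ, so
check the grep output before rebuilding.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): raise the request timeout
# in the given source file from 5 to 15 seconds.
# Assumes the line reads exactly "request->timeout = 5;" as quoted above.
bump_timeout() {
    src="$1"
    # List every line mentioning "timeout" (case-insensitive), with line
    # numbers, so the change can be reviewed; stop if there is none.
    grep -n -i 'timeout' "$src" || return 1
    # Keep an untouched copy next to the file before editing it.
    cp "$src" "$src.orig" || return 1
    # Write the edited version back; reading from the backup avoids
    # sed -i, whose syntax differs between BSD and GNU sed.
    sed 's/request->timeout = 5;/request->timeout = 15;/' "$src.orig" > "$src"
}

# Usage, following the steps in the thread (run as root):
#   bump_timeout /usr/src/sys/dev/ata/ata-disk.c
#   cd /usr/src && make kernel KERNCONF=GENERIC
```

The backup copy (ata-disk.c.orig) makes it easy to diff or revert the
change if an official patch lands later.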

Is there any reason the sleep and idle pieces of ataidle could not be 
added to atacontrol?


Eric


