gmirror or ata problem

Pawel Jakub Dawidek pjd at FreeBSD.org
Wed Jan 31 22:01:02 UTC 2007


On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
> 
> > This is strange.  gmirror just detached one of its disks
> > for no apparent reason.  I've built a mirror consisting of
> > the components ad0 and ad1 (both SATA drives).  It has
> > been running fine.  This is RELENG_6 from 2006-12-20.
> > 
> > Yesterday evening ad1 was detached.  There is no other
> > error message logged on console or in the logs (i.e. no
> > I/O error such as a bad sector or anything).  There was
> > no particularly high load at that time.  In fact, the
> > machine had been under much higher load before, without
> > anything bad happening.
> > 
> > This is from the logs:
> > 
> > Jan 29 19:10:13 pluto -- MARK --
> > Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
> > Jan 29 19:20:26 pluto kernel: subdisk1: detached
> > Jan 29 19:20:26 pluto kernel: ad1: detached
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
> > Jan 29 19:50:13 pluto -- MARK --
> 
> I have seen similar problems on my graid3.  I think it's simply the
> disk which stops responding to commands, or at least ata(4) can't talk
> to the disk anymore...
> 
> I see it on:
> 
> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
> 
> After a reboot everything seems fine again and my RAID is rebuilt.
> 
> I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
> BTW.

It seems that when gmirror/graid3 writes to more than one disk at a
time, this puts too much load on the ata channel and ata(4) then
disconnects the disk. I don't know exactly how that part works, but
maybe some timeout in the ata code should be increased?
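
For what it's worth, error=6 in the GEOM_MIRROR lines is ENXIO ("Device
not configured" on FreeBSD), which fits a device that went away rather
than a media error. Below is a minimal sketch, not anything from the tree,
for pulling those errors out of a log. The sample text is copied from the
log quoted above; the regular expression and the geom_errors() helper are
only illustrative assumptions.

```python
import errno
import re

# Sample lines copied from the log quoted above.
LOG = """\
Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
"""

# Hypothetical pattern: a GEOM_MIRROR line naming an adN disk and an error=N code.
ERR_RE = re.compile(r"GEOM_MIRROR: .*?\b(ad\d+)\b.*?error=(\d+)")


def geom_errors(text):
    """Return (disk, errno number, symbolic name) for each GEOM_MIRROR error line."""
    found = []
    for line in text.splitlines():
        m = ERR_RE.search(line)
        if m:
            code = int(m.group(2))
            # errno.errorcode maps the number to its symbolic name.
            found.append((m.group(1), code, errno.errorcode.get(code, "unknown")))
    return found


for disk, code, name in geom_errors(LOG):
    print(disk, code, name)
```

Lines without an error= field (the "FAILURE - device detached" and
"provider ad1 disconnected" messages) are skipped on purpose, so only the
failed metadata writes are reported.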

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!


More information about the freebsd-geom mailing list