Is there a "disconnected" state for geom_mirror providers?

Paul Mather paul at gromit.dlib.vt.edu
Sun Apr 24 10:49:58 PDT 2005


On Sun, 2005-04-24 at 19:04 +0200, Pawel Jakub Dawidek wrote:
> On Sun, Apr 24, 2005 at 12:31:53PM -0400, Paul Mather wrote:
> +> > If gmirror gets an error for a READ or WRITE operation, it assumes the
> +> > provider is broken. This is very important - if it were marked only as
> +> > stale, it would be reconnected and resynchronization would start, but
> +> > because there was an error on the provider, it would probably be
> +> > disconnected again and we'd have an endless loop.
> +> 
> +> I guess it depends on what caused the disconnection in the first place.
> +> If it was a READ of a bad sector, it could be that subsequent
> +> resynchronisation will force a block reallocation of the bad block and
> +> the drive will no longer be "broken."
> 
> So you want me to count the number of failures of every sector and mark
> a component as broken if I get 2 failures on the same sector, or
> something like that? :)

No, I was just pointing out that the "endless loop" scenario you gave
might well not hold for certain common classes of read-induced failures.
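To make the two scenarios concrete, here is a toy Python model (entirely hypothetical, not gmirror code) of a component that is only marked stale on error and then resynchronised, optionally with the per-sector failure counter Pawel jokingly suggests. The class and function names, the `threshold` parameter, and the assumption that a resync write makes a healthy drive remap its bad sector are all illustrative.

```python
from collections import Counter


class Component:
    """Toy mirror component (hypothetical model, not gmirror code)."""

    def __init__(self, bad_sectors, remaps_on_write, threshold=None):
        self.bad_sectors = set(bad_sectors)
        self.remaps_on_write = remaps_on_write  # drive reallocates on write?
        self.threshold = threshold  # None: only ever mark stale, never broken
        self.failures = Counter()  # per-sector failure counts
        self.broken = False

    def read(self, sector):
        """Return True on success; count the failure otherwise."""
        if sector not in self.bad_sectors:
            return True
        self.failures[sector] += 1
        if self.threshold is not None and self.failures[sector] >= self.threshold:
            self.broken = True
        return False

    def resync_write(self, sector):
        """Resynchronisation rewrites the sector; a healthy drive remaps it."""
        if self.remaps_on_write:
            self.bad_sectors.discard(sector)


def simulate(component, sector, max_rounds=10):
    """Disconnect, reconnect and resync until a read succeeds or we give up."""
    for rounds in range(1, max_rounds + 1):
        if component.read(sector):
            return rounds, "stable"
        if component.broken:
            return rounds, "broken"
        component.resync_write(sector)  # marked stale: reconnect and resync
    return max_rounds, "still looping"


# Bad sector that the drive reallocates on write (Paul's case).
print(simulate(Component({42}, remaps_on_write=True), 42))  # (2, 'stable')
# Persistent error, stale-only policy (Pawel's endless loop).
print(simulate(Component({42}, remaps_on_write=False), 42))  # (10, 'still looping')
# Persistent error with a per-sector threshold of 2.
print(simulate(Component({42}, remaps_on_write=False, threshold=2), 42))  # (2, 'broken')
```

The model only shows why both positions hold: a remappable bad sector recovers after one resync, while a persistent error loops forever unless something eventually declares the component broken.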

> +> Thanks for the clarification.  That makes sense.  I just need to
> +> remember "gmirror forget" before I attempt to add back in the disk in my
> +> "TIMEOUT - WRITE_DMA" not-really-broken broken disk case. :-)
> 
> If reallocation happens here, there should be no I/O error visible to
> gmirror.

The "TIMEOUT - WRITE_DMA" failure does not involve any bad blocks; it's
a persistent problem with the ATA code (hopefully fixed recently:), not
the hardware.

> +> The shame about it being deleted from the mirror as opposed to marked as
> +> "broken" is you lose info (shown in "gmirror list") about the broken
> +> component priority, etc., which is useful for when you add a replacement
> +> device (or re-add the same one, as in my case).
> 
> You can use 'gmirror dump /dev/<your_component>'.

Thanks!  I guess I missed that in the man page.

> +> If you marked a component as "broken" (but still listed as part of the
> +> mirror), you could add a "-f" option to "gmirror rebuild" to force
> +> rebuilding onto it a la RAIDframe. :-)
> 
> This is not so simple. I don't store any info on a broken component
> saying that it is broken, because e.g. the bad sector could be the sector
> with metadata. Other components are informed that something wrong is
> going on. How can one remove such a broken component for good? Let's say
> you were able to read metadata from the component, but you can no longer
> write there. How can you easily replace this component?

If "gmirror rebuild -f" were used, it would imply autosynchronisation was
turned off.  So, if you had real hardware problems with the provider, it
would remain broken because the rebuild would fail, too, with some
hardware error.  Eventually, as an operator, you'd get the hint that the
provider really had lasting problems, and replace it with something that
really worked. ;-)
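A rough sketch of the forced-rebuild idea, again as a hypothetical Python model rather than anything gmirror actually does: a BROKEN component stays listed with its priority, autosynchronisation is assumed off, and a forced rebuild either succeeds or leaves the component BROKEN for the operator to notice. The device names (ad0, ad2, ad3) and every identifier are made up, and the BROKEN flag lives only in memory, since where it could safely be persisted is exactly Pawel's objection above.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Toy disk: write() fails on every attempt if the hardware is bad."""
    name: str
    hardware_ok: bool
    blocks: dict = field(default_factory=dict)

    def write(self, lba, data):
        if not self.hardware_ok:
            raise OSError(f"{self.name}: WRITE error at block {lba}")
        self.blocks[lba] = data


@dataclass
class Component:
    provider: Provider
    priority: int  # kept while BROKEN, unlike a forgotten component
    state: str = "ACTIVE"  # "ACTIVE" or "BROKEN"
    failed_rebuilds: int = 0  # hint for the operator


def forced_rebuild(source, target):
    """Hypothetical 'rebuild -f': copy every block onto a BROKEN component.

    Autosynchronisation is assumed off, so nothing retries automatically;
    a failed copy simply leaves the target BROKEN.
    """
    try:
        for lba, data in source.provider.blocks.items():
            target.provider.write(lba, data)
    except OSError:
        target.failed_rebuilds += 1
        return False
    target.state = "ACTIVE"
    return True


good = Component(Provider("ad0", True, {0: b"boot", 1: b"data"}), priority=0)
flaky = Component(Provider("ad2", True), priority=1, state="BROKEN")  # WRITE_DMA timeout case
dead = Component(Provider("ad3", False), priority=1, state="BROKEN")  # real hardware fault

print(forced_rebuild(good, flaky), flaky.state)  # True ACTIVE
print(forced_rebuild(good, dead), dead.state)  # False BROKEN
```

The "not-really-broken" disk comes back with its old priority intact, while the genuinely bad one keeps failing and accumulating `failed_rebuilds`, which is the hint to replace it.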

> This complicates things a lot and I don't need more complications if I
> want gmirror to stay reliable (which I hope it is now).

I agree. As there are ways around the issue via the gmirror command
set you've provided to us, it's probably wisest to keep things the way
they are.

Thanks for geom_mirror (and geom_stripe et al.)!

Cheers,

Paul.
-- 
e-mail: paul at gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa


More information about the freebsd-geom mailing list