gmirror robustness

perryh at pluto.rain.com
Sat Jun 25 21:13:25 UTC 2011


How would I go about making gmirror more robust with respect to
transient errors?

Once in a while I get a sequence like this (reformatted):

Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA
                ICRC error (retrying request) LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48
                status=51<READY,DSC,ERROR> error=4<ABORTED>
                LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5).
                ad8s2a[WRITE(offset=315265765888, length=78336)]
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider
                ad8s2a disconnected.

It's always the same 4 messages:  a retried WRITE_DMA48 UDMA ICRC
error, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and
error=4, a gmirror "Request failed (error=5)", and a disconnect.
The LBA, offset, and length vary from one instance to another.
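
To tally how often the sequence occurs and which LBAs are involved, the
pattern can be matched mechanically.  What follows is only a minimal
sketch in Python.  It assumes each kernel message occupies a single line
in the actual log (the excerpt above was wrapped by hand).  The regular
expressions and the find_incidents name are illustrative, derived solely
from the one sample shown here, and may need adjusting for other variants:

```python
import re

# One pattern per message in the four-message sequence. These are
# assumptions based on the single sample in this post.
RETRY = re.compile(
    r"(ad\d+): WARNING - WRITE_DMA48 UDMA ICRC error "
    r"\(retrying request\) LBA=(\d+)")
FAIL = re.compile(
    r"(ad\d+): FAILURE - WRITE_DMA48 status=51<[^>]*> "
    r"error=4<[^>]*> LBA=(\d+)")
REQ = re.compile(
    r"GEOM_MIRROR: Request failed \(error=5\)\. "
    r"(\S+)\[WRITE\(offset=(\d+), length=(\d+)\)\]")
DISC = re.compile(r"GEOM_MIRROR: Device (\S+): provider (\S+) disconnected\.")


def find_incidents(lines):
    """Return one dict per complete retry/failure/request/disconnect run."""
    incidents = []
    stage, cur = 0, {}
    for line in lines:
        m = RETRY.search(line)
        if m:
            # A retry warning always starts a new candidate sequence.
            stage, cur = 1, {"disk": m.group(1), "lba": int(m.group(2))}
            continue
        m = FAIL.search(line)
        if (m and stage == 1 and m.group(1) == cur["disk"]
                and int(m.group(2)) == cur["lba"]):
            stage = 2
            continue
        m = REQ.search(line)
        if m and stage == 2:
            cur.update(provider=m.group(1), offset=int(m.group(2)),
                       length=int(m.group(3)))
            stage = 3
            continue
        m = DISC.search(line)
        if m and stage == 3 and m.group(2) == cur["provider"]:
            cur["mirror"] = m.group(1)
            incidents.append(cur)
            stage, cur = 0, {}
    return incidents


# The sample from this post, joined back into single lines.
SAMPLE = [
    "Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA "
    "ICRC error (retrying request) LBA=615769530",
    "Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 "
    "status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=615769530",
    "Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5). "
    "ad8s2a[WRITE(offset=315265765888, length=78336)]",
    "Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider "
    "ad8s2a disconnected.",
]

print(len(find_incidents(SAMPLE)))  # 1
```

Feeding it the lines of /var/log/messages instead of SAMPLE would list
every complete instance with its LBA, offset, and length.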

It's unclear why the ad8 driver is returning an error indication
after a single retryable error -- I'll be asking about that on
drivers@ -- but the question here is how to improve gmirror's
handling of the situation.  I'd prefer to have gmirror retry before
giving up and disconnecting, or at least deactivate instead of
disconnecting (so that I can reactivate, and have it update the
mirror, rather than having to re-insert the disconnected provider
and have gmirror spend the next couple of hours recopying everything).
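
To make the requested behavior concrete, here is a toy model of the
policy in Python.  It is not gmirror code and does not reflect how
gmirror is structured internally; handle_write, do_write, and
max_retries are hypothetical names used only to illustrate "retry a few
times, then deactivate rather than disconnect":

```python
def handle_write(do_write, max_retries=3):
    """Toy model of the requested policy; not gmirror code.

    do_write is a hypothetical callable that returns 0 on success or an
    errno-style value (for example 5, EIO) on failure.
    """
    for _ in range(1 + max_retries):
        if do_write() == 0:
            return "ok"
    # Retries exhausted: mark the component inactive so that it can be
    # reactivated later, rather than disconnecting it outright.
    return "deactivate"


# A transient error: the first attempt fails with EIO, the retry succeeds.
results = iter([5, 0])
print(handle_write(lambda: next(results)))  # ok
```

With a policy like this, a single transient failure such as the one in
the log above would never reach the disconnect path.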

Are there any configuration settings that would affect this behavior,
or would I have to hack the code?


More information about the freebsd-geom mailing list