svn commit: r216984 - projects/graid/head/sys/geom/raid

Wed Jan 5 12:41:45 UTC 2011

On 05.01.2011 10:39, Pawel Jakub Dawidek wrote:
> On Wed, Jan 05, 2011 at 12:19:40AM +0000, Warner Losh wrote:
>> Author: imp
>> Date: Wed Jan  5 00:19:40 2011
>> New Revision: 216984
>> URL: http://svn.freebsd.org/changeset/base/216984
>>
>> Log:
>>    First pass at error recovery: if the first disk that we get errors on
>>    has a problem, try from the second one.  Note info about possible bad
>>    sector remap attempt through write, and some ideas on when to eject
>>    the subdisk from the disk.
>
> My ideas on what to do on I/O error mostly match yours:
> - On read error, read from the other disk, write the data back to the
>    first disk.  Before you return the data up, you must wait for write to
>    complete.  If you won't wait, you can lose race with new write request
>    going into the same area and you will overwrite new data with the old
>    one.

In the design document we have planned a range locking mechanism for use
here and during synchronization/rebuild.
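
To make the idea concrete, here is a minimal userland sketch of such a
range lock, not the graid code. All names (range_lock_table,
range_trylock, range_unlock) and the fixed table size are assumptions for
illustration. The recovery path would hold the lock over the bad range
until the rewrite completes, so a new write to the same area cannot slip
in between and be overwritten with old data.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOCKS 16		/* arbitrary table size for this sketch */

struct range_lock {
	uint64_t start;		/* first byte offset covered */
	uint64_t end;		/* one past the last byte covered */
	bool	 used;
};

struct range_lock_table {
	struct range_lock locks[MAX_LOCKS];
};

/* Two half-open ranges [s1, e1) and [s2, e2) overlap. */
static bool
ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return (s1 < e2 && s2 < e1);
}

/*
 * Try to lock [start, start + len).  Returns a slot index on success,
 * or -1 if the range overlaps a held lock or the table is full.
 */
static int
range_trylock(struct range_lock_table *t, uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	int i, free_slot = -1;

	for (i = 0; i < MAX_LOCKS; i++) {
		if (!t->locks[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (ranges_overlap(start, end,
		    t->locks[i].start, t->locks[i].end))
			return (-1);
	}
	if (free_slot < 0)
		return (-1);
	t->locks[free_slot].start = start;
	t->locks[free_slot].end = end;
	t->locks[free_slot].used = true;
	return (free_slot);
}

/* Release a lock previously returned by range_trylock(). */
static void
range_unlock(struct range_lock_table *t, int slot)
{
	t->locks[slot].used = false;
}
```

A real implementation would queue the conflicting request and resume it
on unlock instead of just failing the trylock.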

> - Count read errors and mark disk as broken after some number of errors.
>    If you get I/O errors because your requests time out you really want
>    to disconnect the misbehaving disk or your entire array would suffer
>    (read from the first disk, wait for timeout, read from the second
>    disk).

It is planned.
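
A rough sketch of the counting part, under assumptions: the threshold
value, the state names and subdisk_read_error() are hypothetical, not
taken from the graid sources.

```c
#include <stdbool.h>

#define READ_ERROR_LIMIT 10	/* assumed threshold, not a graid value */

enum subdisk_state { SD_ACTIVE, SD_FAILED };

struct subdisk {
	enum subdisk_state state;
	unsigned int read_errors;	/* read errors seen so far */
};

/*
 * Account for one read error.  Returns true if this error pushed the
 * subdisk over the limit and it has just been marked as failed.
 */
static bool
subdisk_read_error(struct subdisk *sd)
{
	if (sd->state == SD_FAILED)
		return (false);
	if (++sd->read_errors >= READ_ERROR_LIMIT) {
		sd->state = SD_FAILED;
		return (true);
	}
	return (false);
}
```

Timeouts could be weighted more heavily than plain media errors, since
they stall the whole array; that refinement is left out here.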

> - On write error you want to mark disk as broken immediately, as from
>    now on it has stale data and can't be trusted.

Right. As a further step we have discussed the idea of keeping such disks
as part of the array, marking them as dirty and avoiding reads from them.
If the main disk suddenly fails, a partially broken disk is probably
better than nothing.
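
The read-side policy could look roughly like this. It is only a sketch;
the sd_state values and pick_read_subdisk() are made-up names, not
existing code.

```c
#include <stddef.h>

enum sd_state { SD_CLEAN, SD_DIRTY, SD_GONE };

/*
 * Choose a subdisk to read from.  Clean subdisks are always preferred;
 * a dirty one is used only when no clean subdisk is left.  Returns the
 * subdisk index, or -1 if nothing is readable.
 */
static int
pick_read_subdisk(const enum sd_state *states, size_t n)
{
	int fallback = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (states[i] == SD_CLEAN)
			return ((int)i);
		if (states[i] == SD_DIRTY && fallback < 0)
			fallback = (int)i;
	}
	return (fallback);
}
```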

> How do you plan to detect if there was unclean shutdown and you need to
> synchronize the disks?

It depends on the metadata format. Intel metadata, according to the Linux
sources, seems to have some flags related to this case. I have planned to
implement the logic used by gmirror (dirty on first write and clean on
close or after a timeout) using those flags and the metadata sequence
numbers.
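
As an illustration of that dirty/clean logic, a minimal sketch follows.
It is not the gmirror code; the vol_meta structure, the function names
and the 5 second idle timeout are assumptions. Each function returns
true when the on-disk metadata would have to be rewritten.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_TIMEOUT 5		/* seconds; assumed value */

struct vol_meta {
	bool	 dirty;		/* unclean-shutdown flag in metadata */
	uint32_t generation;	/* sequence number, bumped per update */
	uint64_t last_write;	/* time of the last write, in seconds */
};

/* Called for every write: only the first one after clean marks dirty. */
static bool
vol_on_write(struct vol_meta *m, uint64_t now)
{
	m->last_write = now;
	if (m->dirty)
		return (false);
	m->dirty = true;
	m->generation++;
	return (true);
}

/* Clear the dirty flag if it is set. */
static bool
vol_mark_clean(struct vol_meta *m)
{
	if (!m->dirty)
		return (false);
	m->dirty = false;
	m->generation++;
	return (true);
}

/* Called periodically: mark clean after IDLE_TIMEOUT without writes. */
static bool
vol_on_idle_tick(struct vol_meta *m, uint64_t now)
{
	if (m->dirty && now - m->last_write >= IDLE_TIMEOUT)
		return (vol_mark_clean(m));
	return (false);
}

/* Called on last close of the volume. */
static bool
vol_on_close(struct vol_meta *m)
{
	return (vol_mark_clean(m));
}
```

On startup, a volume whose metadata still carries the dirty flag was not
shut down cleanly and needs synchronization.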

> Do you plan to support some kind of dirty bitmap to be able to optimize
> synchronization time after unclean shutdown? If you do, you might want
> to look at HAST. I implemented dirty bitmap handling based on DRBD
> ideas, which gives the lowest overhead I can think of.

I've thought about it, but it depends on the metadata formats. At the
moment I don't know of any that provide one.
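
For reference, the in-memory side of such a bitmap is simple; the hard
part is where to persist it. A sketch, with an assumed 1 MiB extent per
bit and hypothetical names (this is not the HAST or DRBD code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXTENT_SIZE (1u << 20)	/* bytes covered by one bit; assumed */

struct dirty_map {
	uint8_t	*bits;		/* one bit per extent */
	size_t	 nextents;	/* number of extents tracked */
};

/* Mark every extent touched by a write of len bytes at offset off. */
static void
dirty_mark(struct dirty_map *dm, uint64_t off, uint64_t len)
{
	uint64_t e, first, last;

	if (len == 0)
		return;
	first = off / EXTENT_SIZE;
	last = (off + len - 1) / EXTENT_SIZE;
	for (e = first; e <= last && e < dm->nextents; e++)
		dm->bits[e / 8] |= (uint8_t)(1u << (e % 8));
}

/* Does the given extent need to be synchronized? */
static bool
dirty_test(const struct dirty_map *dm, uint64_t extent)
{
	return (((dm->bits[extent / 8] >> (extent % 8)) & 1u) != 0);
}
```

After an unclean shutdown only the extents with their bit set would have
to be copied, instead of the whole volume.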

-- 
Alexander Motin