SV: RAID1 synchronisation - howto OR not necessary?

Mon Nov 26 02:58:21 PST 2007

Gert Lynge wrote:
>> The disks themselves handle the checksumming to detect bad blocks.
>> With modern disks it is *very* rare that a block on the disk goes bad
>> without the disk being able to report it as such.
>> This means that if you have a functioning RAID1 setup and one of the
>> disks reports a bad block, then the controller can simply read the
>> corresponding block from the other disk, and rewrite it to the disk
>> with the bad block.  If a disk has problems writing a block it will
>> transparently re-map the block to another.
>> The problems can occur when one disk in a RAID-array has failed and you
>> try to rebuild it from the other disk(s). If you then encounter a bad block
>> on that disk you have a problem since you don't have a good copy of that
>> block.
>> This is what verification (which, btw, is not the same as synchronization)
>> tries to prevent by reading every block on each disk on a regular basis. 
>> Then the RAID controller can recover the data on any bad blocks from the
>> other disk(s) in the array.
> 
> I've been wondering how to do this with a BIOS assisted soft raid for some
> time.
> I have a server with ad4 ad6 in a mirror detected as ar0:
> ----
> ws# atacontrol status ar0
> ar0: ATA RAID1 subdisks: ad4 ad6 status: READY
> ----
> ws# cat /var/run/dmesg.boot
> [...]
> ar0: 76316MB <Intel MatrixRAID RAID1> status: READY
> ar0: disk0 READY (master) using ad4 at ata2-master
> ar0: disk1 READY (mirror) using ad6 at ata3-master
> [...]
> ----
> 
> ...and was wondering if dd could not do the job for me?
> ----
> ws# man dd
> [...]
> EXAMPLES
>      Check that a disk drive contains no bad blocks:
>            dd if=/dev/ad0 of=/dev/null bs=1m
> [...]
> ----
> 
> What if I run:
> dd if=/dev/ad4 of=/dev/null bs=1m
> dd if=/dev/ad6 of=/dev/null bs=1m
> 
> ...once a week - will that not verify that the two drives can read all
> blocks?
> 
> It would be nice to limit the load (the throughput of dd) though - anyone
> know if that is possible? Maybe by piping through a second command (I guess
> a throughput limiter could easily be programmed?).

Hi,

To achieve this, I use smartmontools and configure smartd to regularly issue an
"offline test" to the drive. I receive a mail if any bad sector is found.
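
As a rough sketch of such a setup, a smartd.conf entry could look like the
following. The device names are taken from the quoted message; the file path,
the schedule (Sunday, 03:00) and the mail recipient are only assumptions for
illustration, so check smartd.conf(5) for the exact directive syntax of your
smartmontools version.

```
# /usr/local/etc/smartd.conf  (path assumed for a FreeBSD port install)
# -a               enable the default set of health, attribute and log checks
# -s O/../../7/03  start an offline immediate test on Sundays at 03:00
#                  (assumed schedule)
# -m root          mail warnings to root (assumed recipient)
/dev/ad4 -a -s O/../../7/03 -m root
/dev/ad6 -a -s O/../../7/03 -m root
```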

The good thing is that this verification happens in the drive itself, and any
read or write to the drive automatically suspends the test. In practice, the
test seems to run without any performance penalty.

The bad thing is that this verification happens in the drive itself. If the
drive has faulty firmware[1], or if other errors occur (such as problems with
IDE cables), these won't be detected.
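
A read issued from the host, like the dd idea quoted above, does travel over
the cable and the controller, so it can complement the drive's own test. On the
question of limiting the load, here is a minimal sketch of a throttled read,
assuming plain POSIX sh and dd. The function name throttled_read and its
parameters are made up for this example; it reads in chunks and sleeps between
them, which keeps the average rate below roughly CHUNK_MB per PAUSE_SECONDS.

```shell
#!/bin/sh
# throttled_read DEVICE [CHUNK_MB] [PAUSE_SECONDS]   (hypothetical helper)
# Reads DEVICE sequentially in CHUNK_MB pieces of 1 MiB blocks, sleeping
# PAUSE_SECONDS between pieces. Prints the number of full 1 MiB blocks read.
# Returns non-zero as soon as dd reports an error.
throttled_read() {
    dev=$1
    chunk_mb=${2:-64}
    pause=${3:-1}
    offset=0    # current position, counted in 1 MiB blocks

    while :; do
        # dd writes its "N+M records in" statistics to stderr; capture them.
        stats=$(LC_ALL=C dd if="$dev" of=/dev/null bs=1048576 \
                    skip="$offset" count="$chunk_mb" 2>&1) || {
            echo "read error on $dev near block $offset: $stats" >&2
            return 1
        }
        # Number of complete 1 MiB blocks read in this pass.
        full=$(printf '%s\n' "$stats" |
               sed -n 's/^\([0-9][0-9]*\)+[0-9][0-9]* records in$/\1/p')
        offset=$((offset + ${full:-0}))
        # A short pass means the end of the device was reached.
        if [ "${full:-0}" -lt "$chunk_mb" ]; then
            break
        fi
        sleep "$pause"
    done
    echo "$offset"
}
```

Called as "throttled_read /dev/ad4 64 1" and again for /dev/ad6 (as root, for
example from a weekly cron job), this would read each disk in 64 MiB pieces
with a one-second pause in between. A non-zero exit status means dd hit an
error and the disk deserves a closer look.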

All in all, smartd + geom_mirror gives me more confidence that I won't lose data.