graid5 after-reboot problem

Pawel Jakub Dawidek pjd at FreeBSD.org
Sun May 6 13:01:21 UTC 2007


On Sun, May 06, 2007 at 11:54:45AM +0200, Dag-Erling Smørgrav wrote:
> Pawel Jakub Dawidek <pjd at FreeBSD.org> writes:
> > RAID3 is also write-hole safe, btw.
> 
> How?  Any write to a RAID3 requires writing the data to one of the
> data disks *and* updating the parity disk.

The "write hole" problem is so important in RAID5 because RAID5 has to
read the old parity block in order to update a data block. There are a
few stages of writing a block in RAID5:

1. Read old content of the block you want to write.
2. Read corresponding parity block.
3. XOR parity with old content.
4. XOR parity with new content.
5. Write new content.
6. Write parity.

(This could also be done without reading the old parity, by reading all
the other data blocks of the stripe instead, but that is way too
inefficient, so this shortcut is the most popular.)
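
To make the shortcut concrete, here is a minimal sketch in Python. It is
only an illustration, not how any real RAID5 implementation is written:
the stripe is a toy (three data blocks plus parity, two bytes each) and
the function names are made up for this example. It shows that the
read-modify-write shortcut and the "read all data blocks" method give
the same parity.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data, old_parity, new_data):
    """Steps 3 and 4: XOR the old content out of the parity,
    then XOR the new content in."""
    return xor(xor(old_parity, old_data), new_data)


def reconstruct_parity(data_blocks):
    """The inefficient alternative: XOR all data blocks of the stripe."""
    parity = bytes(len(data_blocks[0]))  # all-zero block
    for block in data_blocks:
        parity = xor(parity, block)
    return parity


# Toy stripe (hypothetical contents): three data blocks of two bytes.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = reconstruct_parity(stripe)

# Overwrite the second data block using the shortcut.
new_block = b"\xff\x00"
shortcut_parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# Both methods agree as long as the old parity was correct.
full_parity = reconstruct_parity(stripe)
print(shortcut_parity == full_parity)
```

Note that the shortcut is only correct when the old parity was correct
to begin with; it never looks at the other data blocks.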

When you lose power between 5 and 6, your parity will be corrupted
and will stay corrupted forever, because none of the further writes will
update it correctly (the only exception is a full stripe write: then
you don't read the old parity, you just calculate it, because you have
all the data blocks needed).
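
A small simulation of that scenario, again only as a sketch: the
Raid5Stripe class, its methods and the crash_before_parity flag are
hypothetical names invented for this example, and one stripe with
one-byte blocks stands in for a whole array.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(data):
    """Parity computed from scratch out of all data blocks."""
    parity = bytes(len(data[0]))
    for block in data:
        parity = xor(parity, block)
    return parity


class Raid5Stripe:
    """One toy RAID5 stripe (hypothetical model, not real driver code)."""

    def __init__(self, data):
        self.data = list(data)
        self.parity = stripe_parity(self.data)

    def small_write(self, index, new, crash_before_parity=False):
        old = self.data[index]                # 1. read old content
        parity = self.parity                  # 2. read parity
        parity = xor(parity, old)             # 3. XOR parity with old
        parity = xor(parity, new)             # 4. XOR parity with new
        self.data[index] = new                # 5. write new content
        if crash_before_parity:
            return                            # power lost between 5 and 6
        self.parity = parity                  # 6. write parity

    def full_write(self, data):
        # Full stripe write: old parity is never read.
        self.data = list(data)
        self.parity = stripe_parity(self.data)

    def consistent(self):
        return self.parity == stripe_parity(self.data)


s = Raid5Stripe([b"\x01", b"\x02", b"\x04"])

s.small_write(0, b"\x08", crash_before_parity=True)
after_crash = s.consistent()          # parity is now stale

s.small_write(1, b"\x10")             # later small writes build on
s.small_write(0, b"\x20")             # the stale parity
after_small_writes = s.consistent()

s.full_write([b"\x11", b"\x22", b"\x44"])
after_full_write = s.consistent()

print(after_crash, after_small_writes, after_full_write)
```

The error introduced by the interrupted write is carried along by every
later read-modify-write, including one to the very same block, and only
disappears when the parity is recalculated from the data.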

RAID3 is very different. In RAID3 you always do full stripe writes, so
it looks like this:

1. Write data to all data disks and parity disk at once.

Of course 1 is not atomic, but after a power failure graid3 will
synchronize the parity component, and even if you decide not to do that,
the next write to this block will fix the inconsistency, which is not
the case for RAID5. RAIDZ also does full stripe writes, just like RAID3,
but it is its COW model, not the full stripe writes, that gives it
always-consistent data.
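
The RAID3 case can be sketched the same way. This is a simplified model
under my own assumptions, not graid3 code: Raid3Stripe and the
components_reached parameter are hypothetical, and each component is
modelled as holding one small block per stripe.

```python
def stripe_parity(data):
    """XOR of all data blocks in the stripe."""
    parity = bytes(len(data[0]))
    for block in data:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return parity


class Raid3Stripe:
    """One toy RAID3 stripe: every write covers all components."""

    def __init__(self, ndata, size):
        self.data = [bytes(size) for _ in range(ndata)]
        self.parity = bytes(size)

    def write(self, blocks, components_reached=None):
        """Full stripe write. components_reached simulates a power
        failure: only the first N components (data disks first, parity
        last) receive the new content, the rest keep the old one."""
        new = list(blocks) + [stripe_parity(blocks)]
        old = self.data + [self.parity]
        n = len(new) if components_reached is None else components_reached
        merged = new[:n] + old[n:]
        self.data, self.parity = merged[:-1], merged[-1]

    def consistent(self):
        return self.parity == stripe_parity(self.data)


s = Raid3Stripe(ndata=2, size=1)

# Power fails after only the first data disk was written.
s.write([b"\x01", b"\x02"], components_reached=1)
after_crash = s.consistent()

# The next write to the same block rewrites data and parity together,
# so nothing from the torn write survives.
s.write([b"\x03", b"\x04"])
after_next_write = s.consistent()

print(after_crash, after_next_write)
```

The write never depends on what was on the disks before, which is why
the inconsistency cannot propagate the way it does with the RAID5
shortcut.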

Also note that using gjournal on top of graid3 will fix the
non-atomicity, but gjournal on top of RAID5 won't fix the RAID5
non-atomicity.

All in all, the write hole is not that dangerous if you remember to
synchronize parity after an unclean shutdown, and this is needed for
RAID5, RAID3, RAID1, RAID4, RAID6, etc. For RAID5 it is just most
visible, and you can't avoid resynchronization even when you use things
like gjournal.
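
For single-parity layouts, synchronizing parity boils down to
recalculating it from the data blocks of every stripe. A minimal sketch,
assuming a hypothetical in-memory representation where each stripe is a
(data_blocks, parity) pair; resync_parity is a made-up name, not a
function of any GEOM class:

```python
def xor_blocks(blocks):
    """XOR of all blocks in the list."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return parity


def resync_parity(stripes):
    """Recalculate parity for every stripe from its data blocks.
    Returns the repaired stripes and how many had stale parity."""
    repaired = []
    fixed = 0
    for data_blocks, parity in stripes:
        expected = xor_blocks(data_blocks)
        if parity != expected:
            fixed += 1
        repaired.append((data_blocks, expected))
    return repaired, fixed


# Toy array: the second stripe was hit by an interrupted write.
array = [
    ([b"\x01", b"\x02"], b"\x03"),   # consistent
    ([b"\x08", b"\x02"], b"\x03"),   # stale parity
]
array, fixed = resync_parity(array)
print(fixed, array[1][1])
```

This trusts whatever is in the data blocks and only brings the parity
back in line with them.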

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!


More information about the freebsd-geom mailing list