graid3

Ivan Voras ivoras at freebsd.org
Sun Jul 27 00:55:51 UTC 2008


Wojciech Puchar wrote:
> I read the graid3 manual and http://www.acnc.com/04_01_03.html to make
> sure I know what RAID3 is, and there are a few things I don't understand.
> 
> 1)
> 
> "The number of components must be equal to 3, 5, 9, 17, etc.
>                 (2^n + 1)."
> 
> why can't it be, say, 5 disks + parity?

The reason lies in the definition of RAID 3, which says that updates to 
the RAID device must be atomic. In some ideal universe, RAID 3 is 
implemented in hardware and operates on individual bytes, but here we 
cannot write to the drives in units other than the sector size, and the 
sector size is 512 bytes.

Parity needs to be calculated with regard to each sector, so at the 
sector level the minimum number of sectors is three: two for data and 
one for parity. This means the high-level atomic sector size is 
2*512 = 1024 bytes. If you inspect your RAID 3 devices, you'll see just 
that:

# diskinfo -v /dev/raid3/homes
/dev/raid3/homes
         1024            # sectorsize
         107374181376    # mediasize in bytes (100G)
         104857599       # mediasize in sectors

But each drive has a normal sectorsize of 512:

# diskinfo -v /dev/ad4
/dev/ad4
         512             # sectorsize
         80026361856     # mediasize in bytes (75G)
         156301488       # mediasize in sectors
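
A quick sanity check, replaying the arithmetic from the two diskinfo
outputs above (a throwaway sketch; all the figures are copied from that
output, and the assumption that the array has 3 components follows from
its 1024-byte sectorsize):

/* Replays the arithmetic from the diskinfo outputs above; the figures
 * are hard-coded from that output, nothing is queried from the kernel. */
#include <stdio.h>

int
main(void)
{
	/* /dev/raid3/homes */
	unsigned long long array_sector = 1024;
	unsigned long long array_sectors = 104857599ULL;

	/* /dev/ad4 */
	unsigned long long drive_sector = 512;
	unsigned long long drive_sectors = 156301488ULL;

	printf("array mediasize: %llu bytes\n", array_sector * array_sectors);
	printf("drive mediasize: %llu bytes\n", drive_sector * drive_sectors);
	/* 1024 / 512 = 2 data components, so the array has 2 + 1 = 3 drives. */
	printf("data components: %llu\n", array_sector / drive_sector);
	return (0);
}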

Sector sizes cannot be arbitrary, for various reasons mostly to do with 
how memory pages and virtual memory are managed; in short, they need to 
be powers of two. This restricts the high-level ("big") sector sizes to 
exactly one of the following values: 1024, 2048, 4096, 8192, etc. Since 
drive sectors are fixed at 512 bytes, the number of *data* drives must 
also be a power of two: 2, 4, 8, 16, etc. Add one more drive for parity 
and you get the starting sequence: 3, 5, 9, 17.
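
If it helps, here is a tiny sketch of that sequence (illustration only;
512 is the assumed per-drive sector size):

/* Enumerate the component counts graid3 accepts (2^n + 1) and the
 * resulting array sector size (2^n * 512). Illustration only. */
#include <stdio.h>

int
main(void)
{
	const unsigned drive_sector = 512;
	unsigned n;

	for (n = 1; n <= 4; n++) {
		unsigned data_drives = 1u << n;		/* 2, 4, 8, 16 */
		unsigned components = data_drives + 1;	/* 3, 5, 9, 17 */
		unsigned array_sector = data_drives * drive_sector;

		printf("%2u components = %2u data + 1 parity, "
		    "array sectorsize %u\n",
		    components, data_drives, array_sector);
	}
	return (0);
}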

In practice, this means that if you have 17 drives in RAID3, the 
sector size of the array itself will be 16*512 = 8192 bytes. Each write 
to the array will update all 17 drives before returning (one sector on 
each drive, which keeps the operation atomic). Note that a file system 
created on such an array will also have its characteristics adjusted to 
this sector size (the fragment size will equal the sector size).
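
To make the write path concrete, here is a minimal sketch of the parity
calculation (plain XOR, which is what RAID 3 uses), for the 17-drive
case described above: an 8192-byte logical sector split into sixteen
512-byte data sectors plus one parity sector. This only illustrates the
principle, it is not graid3's actual code:

/* Split one 8192-byte logical sector into sixteen 512-byte data
 * sectors and compute the parity sector as their XOR. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DRIVE_SECTOR	512
#define DATA_DRIVES	16	/* 17-component array: 16 data + 1 parity */

int
main(void)
{
	uint8_t logical[DATA_DRIVES * DRIVE_SECTOR];	/* one 8192-byte write */
	uint8_t parity[DRIVE_SECTOR];
	size_t d, i;

	/* Fill the logical sector with an arbitrary pattern. */
	for (i = 0; i < sizeof(logical); i++)
		logical[i] = (uint8_t)(i * 7 + 3);

	/* Parity sector = XOR of the sixteen data sectors. */
	memset(parity, 0, sizeof(parity));
	for (d = 0; d < DATA_DRIVES; d++)
		for (i = 0; i < DRIVE_SECTOR; i++)
			parity[i] ^= logical[d * DRIVE_SECTOR + i];

	/*
	 * One logical write therefore touches all 17 components:
	 * data sector d goes to drive d, the parity sector to drive 16.
	 */
	printf("would write %d data sectors + 1 parity sector\n", DATA_DRIVES);
	return (0);
}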

> 2) "-r  Use parity component for reading in round-robin fashion.
> "Without this option the parity component is not used at
> all for reading operations when the device is in a complete state.
>  With this option specified random I/O read operations are even 40% faster
> , but sequential reads are slower.  One cannot use this option if the -w 
> option is also specified."
> 
> 
> how could the parity disk speed up random I/O?

It will work well only when the number of drives is small (i.e. three 
drives): the parity drive is used as an additional valid source of 
data, which avoids some seeks on the data drives. I think that, 
theoretically, you can save at most 1/3 (about 0.33) of all seeks - I 
don't know where the 40% number comes from.
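
Here is how I arrive at the 1/3 figure (my own arithmetic, not
something taken from the graid3 sources): a read of one logical sector
needs n-1 component reads; without -r these always land on the n-1 data
drives, with -r they can be spread round-robin over all n drives, so
the per-drive load drops by 1/n - a third with three drives, but much
less as the array grows:

/* My own arithmetic for -r: each logical read needs (n - 1) component
 * reads; -r spreads them over n drives instead of the (n - 1) data
 * drives, cutting the per-drive load by 1/n. */
#include <stdio.h>

int
main(void)
{
	unsigned counts[] = { 3, 5, 9, 17 };
	size_t i;

	for (i = 0; i < sizeof(counts) / sizeof(counts[0]); i++) {
		unsigned n = counts[i];
		double without_r = 1.0;			/* reads per data drive */
		double with_r = (double)(n - 1) / n;	/* spread over n drives */

		printf("%2u drives: per-drive load %.2f -> %.2f (%.0f%% saved)\n",
		    n, without_r, with_r, (without_r - with_r) * 100.0);
	}
	return (0);
}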



