ZFS & GEOM with many odd drive sizes

Doug Rabson dfr at rabson.org
Thu Jul 26 07:29:43 UTC 2007


On Thu, 2007-07-26 at 07:59 +0100, Mark Powell wrote:
> On Wed, 25 Jul 2007, Doug Rabson wrote:
> > On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote:
> >> Does it really do this?  As I understood it, only one of the
> >> disks in a mirror will be read for a given block.  If the
> >> checksum fails, the same block from the other disk is read
> >> and checksummed.  If all the disks in a mirror are read for
> >> every block, ZFS read performance would get somewhat worse
> >> instead of linear scaling up with more disks in a mirror.  In
> >> order to monitor data on both disks one would need to
> >> periodically run "zpool scrub", no?  But that is not
> >> *continuous* monitoring of the two sides.
> >
> > This is of course correct. I should have said "continuously checks the
> > data which you are actually looking at on a regular basis". The
> > consistency check is via the block checksum (not comparing the data from
> > the two sides of the mirror).
> 
> According to this:
> 
> http://www.opensolaris.org/jive/thread.jspa?threadID=23093&tstart=0
> 
> RAID-Z has to read every drive to be able to checksum a block.
>    Isn't this the reason why RAID-Z random reads are so slow and also the 
> reason the pre-fetcher exists to speed up sequential reads?
>    Cheers.
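
To make the mirror read behaviour discussed above concrete, here's a
minimal sketch in Python (illustrative names only, not the actual ZFS
code): one side is read per block and verified against the checksum
stored in the block pointer; the other copy is consulted only when
that check fails.

import hashlib

def cksum(data):
    # Stand-in for ZFS's per-block checksum (fletcher4, sha256, ...).
    return hashlib.sha256(data).digest()

def mirror_read(sides, offset, size, expected):
    # Normal case: one read, one checksum comparison, done.
    for side in sides:
        data = side.read(offset, size)
        if cksum(data) == expected:
            # The real code would also repair any copy that failed
            # above ("self-healing") before returning.
            return data
    raise IOError("all mirror copies failed the block checksum")

Both copies only get read and verified together when a scrub ("zpool
scrub <pool>") walks the pool, which is the periodic rather than
continuous checking Bakul describes.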

When it's reading, RAID-Z only has to read the blocks which contain
data; the parity block is read only if the vdev is in degraded mode
after a drive failure, or if one (two for RAID-Z2) of the data block
reads fails. Roughly, the read path looks like the sketch below.
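
In the same illustrative style, a single-parity sketch of that read
path (the real logic lives in vdev_raidz.c and also covers RAID-Z2;
the column objects here are hypothetical):

def raidz_read(data_cols, parity_col, failed=None):
    # Healthy vdev: read only the data columns; parity stays idle.
    chunks = [col.read() if i != failed else None
              for i, col in enumerate(data_cols)]
    if failed is not None:
        # Degraded vdev or a failed read: fetch the parity and
        # rebuild the missing column by XOR with the survivors.
        missing = parity_col.read()
        for c in chunks:
            if c is not None:
                missing = bytes(a ^ b for a, b in zip(missing, c))
        chunks[failed] = missing
    return b"".join(chunks)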

For pools which contain a single RAID-Z or RAID-Z2 group this is
probably a real performance cost: every data disk in the group takes
part in each block read, so for small random reads the whole group
behaves roughly like one spindle. Larger pools containing multiple
RAID-Z groups can spread independent reads across the groups, as the
rough numbers below suggest.
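
Back-of-the-envelope illustration (assuming small random reads, where
every data disk in a group participates in each read):

# Same 12 disks, carved into raidz groups of different widths.
disks = 12
for width in (12, 6, 4, 3):
    groups = disks // width
    print("%d x %d-disk raidz: ~%d concurrent random reads"
          % (groups, width, groups))

The trade-off is that narrower groups spend a larger fraction of the
disks on parity.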



