vdev/pool math with combined raidzX vdevs...

Wed Jul 11 16:25:43 UTC 2012

On Jul 11, 2012, at 10:32, Jason Usher wrote:

> I think a lot of raidz3 adoption is due to folks desiring "some overkill" as they attempt to overcome the "disks got really big but didn't get any faster (for rebuilds)" problem[1] ... but they are losing some of that by combining vdevs in a single pool.
> 
> Not losing so much that they're back down to the failure rate of a single raidz*2* vdev, but they're not at the overkill level they thought they were at either.
> 
> I think that's important, or at least worth noting...
> 
> 
> [1] http://storagegaga.com/4tb-disks-the-end-of-raid/

	That, along with unrecoverable read errors (UREs) during reconstruction, is indeed the problem.  Gibson et al. have moved on to object storage to get around this: RAID is done over the individual stored objects rather than over the volume as a whole.  If you need to reconstruct, you can do so both on demand and lazily in the background (i.e., you start reconstructing the objects in a volume, and if a user attempts to access an object that has not yet been reconstructed, that object is moved to the head of the queue).
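	The lazy, on-demand reconstruction described above can be sketched as a work queue in which a client read promotes its object to the front.  This is a hypothetical Python illustration, not how PanFS or any particular object store actually implements it; `LazyRebuilder` and the `reconstruct` callback are made-up names, and the callback stands in for whatever really recomputes an object from its surviving parity.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List


class LazyRebuilder:
    """Hypothetical sketch of lazy, on-demand object reconstruction.

    Objects from a failed device are rebuilt one at a time in the background;
    a client read of a not-yet-rebuilt object moves it to the head of the queue.
    """

    def __init__(self, object_ids: Iterable[Hashable],
                 reconstruct: Callable[[Hashable], bytes]) -> None:
        self.pending = deque(object_ids)          # objects still to rebuild, in order
        self.rebuilt: Dict[Hashable, bytes] = {}  # object id -> reconstructed data
        self.order: List[Hashable] = []           # rebuild order, kept for illustration
        self.reconstruct = reconstruct            # stand-in for "recompute from parity"

    def background_step(self) -> bool:
        """Rebuild the object at the head of the queue; return False if none remain."""
        if not self.pending:
            return False
        obj = self.pending.popleft()
        self.rebuilt[obj] = self.reconstruct(obj)
        self.order.append(obj)
        return True

    def read(self, obj: Hashable) -> bytes:
        """Serve a client read, promoting the object if it is not rebuilt yet."""
        if obj not in self.rebuilt:
            # Assumes obj is one of the queued objects; deque.remove raises
            # ValueError otherwise. Removal is O(n), which is fine for a sketch.
            self.pending.remove(obj)
            self.pending.appendleft(obj)
            self.background_step()
        return self.rebuilt[obj]


if __name__ == "__main__":
    r = LazyRebuilder(["a", "b", "c", "d"], reconstruct=lambda o: f"data-{o}".encode())
    r.background_step()         # background work rebuilds "a"
    r.read("d")                 # a client asks for "d", so it jumps the queue
    while r.background_step():  # background work finishes "b" and "c"
        pass
    print(r.order)              # prints ['a', 'd', 'b', 'c']
```

	A real system would also have to cope with concurrent reads and writes while the rebuild is running; this sketch is single-threaded and only shows the queue-promotion idea.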

	There aren't, however, to my knowledge, any good-enough-to-use-at-work-without-hiring-a-pet-kernel-hacker object-based file systems available for free[1].  CMU PDL did RAIDframe, but that was a proof of concept that was never bulletproofed or optimized (though many of its concepts found their way into Panasas's PanFS).

	In the absence of a ready-to-go (or at least ready-to-assemble) object-based solution, ZFS is the next best thing.  You can at least get some warning from a scrub that objects are corrupted, and you can keep some redundant copies around to recover from.  That said, you will want to keep your failure domains fairly small if you can, owing to the time to reconstruct and the inevitability of UREs[2] once volumes get large enough.
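	To make the "inevitability of UREs" point concrete, here is a back-of-the-envelope Python calculation of the chance of hitting at least one URE while reading every surviving disk of a vdev during a rebuild.  The rate of one URE per 10^14 bits read is an assumed spec-sheet figure (typical of consumer SATA drives), `p_ure_during_rebuild` is a hypothetical helper name, and the model assumes full disks and independent errors, so treat the numbers as rough.

```python
import math

# Assumption: spec-sheet URE rate of 1 error per 1e14 bits read, typical of
# consumer SATA drives; enterprise drives are often rated at 1e15.
URE_PER_BIT = 1e-14


def p_ure_during_rebuild(disk_bytes: float, surviving_disks: int,
                         ure_per_bit: float = URE_PER_BIT) -> float:
    """Probability of at least one URE while reading every surviving disk in full.

    Simplifying assumptions: disks are full, every bit read is an independent
    trial, and errors occur at exactly the rated rate.
    """
    bits_read = disk_bytes * 8 * surviving_disks
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid losing precision
    # when p is tiny and n is huge.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    TB = 1e12
    for width in (4, 6, 8, 12):
        p = p_ure_during_rebuild(4 * TB, surviving_disks=width - 1)
        print(f"{width:2d} x 4 TB vdev, one disk lost: P(>=1 URE) = {p:.2f}")
```

	Under these assumptions a 6 x 4 TB vdev that has lost one disk has roughly an 80% chance of at least one URE during the rebuild, and the odds only rise with vdev width and disk size.  With raidz2 or raidz3 that single URE is still covered by the remaining parity; with raidz1, or with a vdev that has already lost as many disks as it has parity, the affected block cannot be reconstructed.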

-- 
Chris BeHanna
chris at behanna.org

[1] Because it's very, very hard.  Panasas has been at it, full time, for more than ten years.  Spinnaker was at it for a long time, too, prior to the NetApp acquisition.  There are also Storage Tank and GFS, and there were Zambeel and a few others.

[2] Garth Gibson talks about UREs on page 2:  http://gcn.com/articles/2008/07/25/garth-gibson--faster-storage-systems-through-parallelism.aspx