PERFORCE change 123662 for review

Fri Jul 20 12:35:27 UTC 2007

On Thu, Jul 19, 2007 at 11:01:20pm -0500, Eric Anderson wrote:
>  On 07/17/07 16:09, Ulf Lilleengen wrote:
> > http://perforce.freebsd.org/chv.cgi?CH=123662
> > Change 123662 by lulf at lulf_carrot on 2007/07/17 21:08:43
> > 	- Initial implementation of growing RAID-5 arrays. This is done by
> > 	  splitting the offset calculation into one for read and one for write
> > 	  operations. We make a distinction of subdisks that were added after
> > 	  the plex is not newborn any longer and subdisks that were added at
> > 	  creation/tasting time.  When a BIO write comes, the write will go to
> > 	  the whole plex, but read operations will only be done on subdisks that
> > 	  do not have the GV_SD_GROW flag set.  The bad thing with this is that
> > 	  we must ensure that new subdisks are added to a later plexoffset
> > 	  (which we should force, to make it easier for us, since there is not a
> > 	  good reason why the user should be able to set the plexoffset in this
> > 	  operation).  The implementation will probably change a bit.
> > 	- Add another state called RESIZING, and a flag called GV_PLEX_GROWING
> > 	  to indicate that a plex is in growing operation.
> > 	- Make sure obvious parts of the code respects this flag. Will need to
> > 	  look over this more though.
> 
> 
> 
>  Hi -
> 
>  So far, I'm very excited about your gvinum work - great work so far!
> 
>  I'm curious how you are growing a RAID5.  Can you describe this method a bit 
>  more?  Where did you see how to do this?
> 
> 
Hi,

Well, what I do is attach/create the new subdisk as usual, but since it's a
RAID-5 array that I know is operational, I give the subdisk a flag and set the
plex to a resize state. Then, in the RAID-5 code, I modify gv_raid5_offset
(which basically computes offsets within a subdisk based on the number of
subdisks and the stripe size). Instead of including all subdisks in the
calculation, I only include those that do not have the GROW flag when reading,
and I include all subdisks when it's a write.
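
As a rough userland illustration of that idea (not the code from the change):
the sketch below picks the number of subdisks that take part in the offset
calculation from the I/O direction, then does a rotating-parity mapping. The
names struct plex, struct sd, SD_GROW, active_sdcount and raid5_map are
hypothetical stand-ins, the parity rotation is only an assumed layout, and it
assumes the GROW subdisks sit at the end of the subdisk list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real gvinum structures. */
#define SD_GROW 0x01            /* subdisk was added to a running plex */

struct sd {
	int flags;
};

struct plex {
	int64_t stripesize;     /* bytes per stripe unit */
	int sdcount;            /* all subdisks, including GROW ones */
	struct sd sds[8];       /* assumption: GROW subdisks come last */
};

enum io_dir { IO_READ, IO_WRITE };

/* Reads skip GROW subdisks; writes count every subdisk. */
static int
active_sdcount(const struct plex *p, enum io_dir dir)
{
	int i, n = 0;

	for (i = 0; i < p->sdcount; i++)
		if (dir == IO_WRITE || !(p->sds[i].flags & SD_GROW))
			n++;
	return (n);
}

/*
 * Map a plex byte offset to a data subdisk, a parity subdisk and an
 * offset within the subdisk, using only the subdisks active for 'dir'.
 */
static void
raid5_map(const struct plex *p, enum io_dir dir, int64_t boff,
    int *sdno, int *psdno, int64_t *sdoff)
{
	int n = active_sdcount(p, dir);
	int64_t rowsize, stripeoff;
	int sd, psd;

	assert(n >= 3);
	rowsize = p->stripesize * (n - 1);      /* data bytes per row */
	psd = n - 1 - (int)((boff / rowsize) % n);
	stripeoff = boff % rowsize;             /* offset within the row */
	sd = (int)(stripeoff / p->stripesize);
	if (sd >= psd)                          /* step over the parity subdisk */
		sd++;
	*sdno = sd;
	*psdno = psd;
	*sdoff = (boff - stripeoff) / (n - 1) + stripeoff % p->stripesize;
}
```

With a 4-byte stripe, three original subdisks and one GROW subdisk, plex
offset 8 maps to subdisk 0 at offset 4 (parity on subdisk 1) for a read, but
to subdisk 2 at offset 0 (parity on subdisk 3) for a write.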

This means that if I create a gv_grow_plex function that reads (stripesize x
sdcount) bytes (from the subdisks that do not have the GROW flag) and writes
that data back to the plex (including all subdisks), I sort of overwrite the
old data, but the data ends up spread out over the new subdisks as well. I'm
sorry if this seems a bit complex; just ask more questions if anything is
unclear.
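
To make the copy loop a bit more concrete, here is a toy model with in-memory
"subdisks". It is only a sketch under assumptions: locate and grow_plex are
hypothetical names, the chunk size is one row of data in the old layout (my
choice for the example), parity blocks are skipped rather than computed, and
the parity rotation is an assumed layout. Each chunk is read completely through
the old layout before it is written through the new one.

```c
#include <assert.h>
#include <stdint.h>

#define NSD     4       /* subdisks after growing (toy value) */
#define SDSIZE  24      /* bytes per toy subdisk */
#define STRIPE  4       /* stripe unit in bytes */

static uint8_t disk[NSD][SDSIZE];       /* memory-backed toy subdisks */

/* Locate plex byte 'boff' when 'n' subdisks take part in the layout. */
static uint8_t *
locate(int n, int64_t boff)
{
	int64_t rowsize = (int64_t)STRIPE * (n - 1);
	int psd = n - 1 - (int)((boff / rowsize) % n);
	int64_t stripeoff = boff % rowsize;
	int sd = (int)(stripeoff / STRIPE);

	if (sd >= psd)                  /* step over the parity subdisk */
		sd++;
	return (&disk[sd][(boff - stripeoff) / (n - 1) + stripeoff % STRIPE]);
}

/*
 * Re-spread 'size' plex bytes from the old layout (old_n subdisks) to the
 * new layout (new_n subdisks), working from low to high offsets.
 */
static void
grow_plex(int old_n, int new_n, int64_t size)
{
	uint8_t buf[STRIPE * NSD];
	int64_t chunk = (int64_t)STRIPE * (old_n - 1);
	int64_t off, i, len;

	assert(old_n >= 3 && new_n > old_n && new_n <= NSD);
	for (off = 0; off < size; off += chunk) {
		len = size - off < chunk ? size - off : chunk;
		/* Read the whole chunk through the old layout first... */
		for (i = 0; i < len; i++)
			buf[i] = *locate(old_n, off + i);
		/* ...then write it through the new, wider layout. */
		for (i = 0; i < len; i++)
			*locate(new_n, off + i) = buf[i];
	}
}
```

In this model the in-place overwrite works out because a row in the new layout
holds more data than a row in the old one, so the write position on each
subdisk never runs ahead of the row currently being read.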

Actually, I didn't read this anywhere. I sort of thought this out myself :P

-- 
Ulf Lilleengen