lazy mirror / live backup

Mike Wolman mike at nux.co.uk
Fri Apr 20 13:22:17 UTC 2007


Hi List,

I'd like the ability to have gmirror do a more efficient re-silvering (or 
re-syncing) of the mirror members when a planned disconnect occurs. This 
would significantly reduce the mirror rebuild time for any component which 
had been deactivated. For network mirrors using ggated devices this would 
also reduce network usage, and it could be used for remote asynchronous 
mirrors, thus providing a live backup for laptops/workstations.


Main Points

When a normal mirror breaks this module must keep track of which blocks in 
the mirror have changed.

- This can be done by keeping a list/map of just the blocks which change.

- This list/map needs to be stored on a device not provided by the mirror 
or in memory. If the list is kept only in memory, then on rebooting the 
machine any detached drive would require a full resync, which would be a 
problem for large mirrors over slow links, so a way of dumping this list 
to permanent storage would be required.
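
As a rough sketch of the two points above (not gmirror code; the class 
DirtyMap, its methods and the on-disk layout are all hypothetical), a 
one-bit-per-block map can be kept in memory and dumped to a file that 
lives outside the mirror, so that it survives a reboot:

```python
import os


class DirtyMap:
    """One bit per block; a set bit means the block must be resynced."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def mark(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def is_dirty(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def dirty_blocks(self):
        return [b for b in range(self.nblocks) if self.is_dirty(b)]

    def save(self, path):
        # Write a temporary file first and rename it into place, so a
        # crash during the dump cannot leave a half-written map behind.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(self.nblocks.to_bytes(8, "little"))
            f.write(self.bits)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            nblocks = int.from_bytes(f.read(8), "little")
            dirty_map = cls(nblocks)
            dirty_map.bits = bytearray(f.read())
        return dirty_map
```

Marking the same block twice sets the same bit, so the map never grows 
with the number of writes.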


Example uses:

Usb/Firewire external drive nightly full backup.

If a mirror is constructed with 3 components: ad1, umass1 and umass2

umass1 and umass2 are backup devices taken home on alternate nights by 
different users, so that there is always a full backup, at most 1 day 
old, at a remote location.

This module should be able to use the change log for multiple devices, 
preserving the changes until all components are up to date.  Should one of 
the usb devices fail and be removed from the mirror, the change log should 
be cleared (provided all other components are up to date), allowing for 
drive failures and stopping the block change log growing indefinitely.  It 
should be possible to use the same change log for more than one device.
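
One possible way to share a single change log between several devices, 
sketched here under the assumption that every write gets an increasing 
generation number (the ChangeLog class and its method names are 
hypothetical, not part of gmirror): each detached component remembers the 
generation at which it left, and entries are dropped once no remaining 
component needs them.

```python
class ChangeLog:
    def __init__(self):
        self.generation = 0
        self.changed = {}   # block number -> generation of its latest write
        self.detached = {}  # component name -> generation when it was detached

    def detach(self, name):
        self.detached[name] = self.generation

    def record_write(self, block):
        self.generation += 1
        if self.detached:  # nothing to track while every component is attached
            self.changed[block] = self.generation

    def pending(self, name):
        """Blocks written since this component was detached."""
        since = self.detached[name]
        return sorted(b for b, g in self.changed.items() if g > since)

    def resynced(self, name):
        del self.detached[name]
        self._prune()

    def failed(self, name):
        # A failed component will never be resynced, so stop waiting for it.
        self.detached.pop(name, None)
        self._prune()

    def _prune(self):
        if not self.detached:
            self.changed.clear()
            return
        oldest = min(self.detached.values())
        self.changed = {b: g for b, g in self.changed.items() if g > oldest}


log = ChangeLog()
log.detach("umass1")
log.record_write(10)
log.record_write(11)
log.detach("umass2")
log.record_write(11)
log.record_write(12)
print(log.pending("umass1"))  # [10, 11, 12]
print(log.pending("umass2"))  # [11, 12]
```

After log.resynced("umass1") only the entries still needed by umass2 are 
kept, and log.failed("umass2") then empties the log.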

Normal full backups to usb devices can take many hours. This should reduce 
the time to only that needed for the data changed since the device was 
last attached to the mirror.


Example use for disaster recovery - slow links:

If the mirror consists of 2 components, ad1 and ggatec1, with component 
ggatec1 being on a slow link.

A flush period tuneable could be implemented by periodically deactivating 
the ggatec component and reactivating it, allowing for an asynchronous 
mirror - à la rsync but faster as there is no file list etc.

A tuneable may be required to only sync blocks which have not been changed 
in xx seconds/minutes to prevent the same blocks being transferred too 
often.
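
A minimal sketch of such a "quiet period" filter, assuming the change log 
records the time of the last write to each block (the function name and 
the dictionary layout are made up for illustration):

```python
def ready_to_sync(last_write, now, quiet_seconds):
    """Return blocks whose most recent write is at least quiet_seconds old.

    last_write maps block number -> timestamp (seconds) of its latest write.
    """
    return sorted(b for b, t in last_write.items() if now - t >= quiet_seconds)


last_write = {5: 100.0, 9: 128.0}
print(ready_to_sync(last_write, now=130.0, quiet_seconds=10))  # [5]
```

Block 9 was written 2 seconds ago, so it is held back until a later flush 
in case it changes again.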

A tuneable to specify the speed at which gmirror synchronises the out of 
sync component will be required - This would possibly be useful for normal 
gmirror use on a busy server when rebuilding a drive, as gmirror currently 
uses all available write speed to do so - limiting rebuild speed may 
therefore prevent drive failures.
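
One common way to cap a transfer rate is a token bucket. The sketch below 
is only an illustration of that idea, not how gmirror does or would do it; 
SyncThrottle and its parameters are hypothetical, and the clock is passed 
in explicitly to keep the example deterministic:

```python
class SyncThrottle:
    def __init__(self, bytes_per_second, burst_bytes):
        self.rate = bytes_per_second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def delay_for(self, nbytes, now):
        """Account for a write of nbytes at time now; return seconds to wait."""
        # Refill the bucket for the time elapsed since the previous call.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is overdrawn: wait until the deficit has been refilled.
        return -self.tokens / self.rate


throttle = SyncThrottle(bytes_per_second=1000, burst_bytes=1000)
print(throttle.delay_for(1000, now=0.0))  # 0.0
print(throttle.delay_for(500, now=0.0))   # 0.5
print(throttle.delay_for(500, now=1.0))   # 0.0
```

The sync loop would sleep for the returned delay before issuing the next 
request, leaving the remaining bandwidth for normal I/O.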


Live backup of laptop/workstation

If the mirror is created using a local disk ad0 and a ggated mounted 
device ggate0 with a balance algorithm preferring ad0.

When the network is unavailable gmirror starts to keep track of the 
changed blocks. On reconnection to the network and activating the ggatec 
component the list of changed blocks can be flushed.  Should the same 
block have changed more than once only the last change needs to be sent - 
reducing network usage.

The main problem with mirroring the whole system drive is that any swap 
changes will need to be ignored.
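
Both points can be sketched together, assuming the swap area is known as a 
range of block numbers (the Tracker class and the block ranges used here 
are purely illustrative): repeated writes to a block collapse into one 
entry, and writes inside an ignored range are never recorded.

```python
class Tracker:
    def __init__(self, ignored_ranges=()):
        self.ignored = list(ignored_ranges)  # (first_block, last_block), inclusive
        self.dirty = set()
        self.writes_seen = 0

    def record_write(self, block):
        self.writes_seen += 1
        if any(lo <= block <= hi for lo, hi in self.ignored):
            return  # e.g. swap: never needs to reach the remote copy
        self.dirty.add(block)  # a set stores each block at most once

    def flush(self):
        """Return the blocks to send on reconnection and reset the tracker."""
        blocks = sorted(self.dirty)
        self.dirty.clear()
        return blocks


tracker = Tracker(ignored_ranges=[(1000, 1999)])
for block in (7, 7, 7, 1500, 8):
    tracker.record_write(block)
print(tracker.writes_seen)  # 5
print(tracker.flush())      # [7, 8]
```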


Other Considerations/Suggestions:

- Gmirror will somehow need to be informed that a drive has actually 
failed and is not just temporarily disconnected.

- Data structure consideration: the difference between a list of block 
numbers that have changed and a block bitmap.  A block bitmap is perfect 
for this, and only requires 1 bit per block of storage, max.  No more.  A 
list of block numbers can get *HUGE* though, because the block numbers are 
probably all 64-bit numbers, so in the worst case it will need 64x the 
space of the bitmap, not to mention the issues of sorting and maintaining 
it.
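
To put numbers on this comparison, here is a small calculation assuming 
64 bits per list entry and 1 bit per block for the bitmap (the function 
names are just for illustration):

```python
def bitmap_bits(total_blocks):
    return total_blocks  # fixed cost: one bit per block


def list_bits(changed_blocks, bits_per_entry=64):
    return changed_blocks * bits_per_entry  # grows with every changed block


def break_even(total_blocks, bits_per_entry=64):
    """Number of changed blocks at which the list is as large as the bitmap."""
    return total_blocks // bits_per_entry


total = 200_000_000
print(break_even(total))                       # 3125000
print(list_bits(total) // bitmap_bits(total))  # 64
```

So the list is only smaller while fewer than about 1 in 64 blocks have 
changed; beyond that the bitmap wins, up to a factor of 64 when every 
block is dirty.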

- For determining the size of this 'block change map', you could use the 
ceiling of the max number of blocks.  So a 100GB storage mirror would have 
roughly 200000000 512-byte blocks.  200 million bits (using a bitmap to 
record whether a block needs resyncing: 0 no sync, 1 sync) is roughly 
24MB.  You could pretty easily keep that in memory, but if the size was 
1TB, you'd be at around 240MB, so that starts to get a little much.  Since 
this would be able to be enabled/disabled, it may not be an issue.
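
The arithmetic can be checked with a few lines, assuming decimal units 
(1GB = 10^9 bytes) and 512-byte blocks; bitmap_bytes is a made-up helper 
name:

```python
def bitmap_bytes(capacity_bytes, block_size=512):
    """Bytes needed for a bitmap with one bit per block, rounded up."""
    blocks = -(-capacity_bytes // block_size)  # ceiling division
    return -(-blocks // 8)


print(bitmap_bytes(100 * 10**9))  # 24414063   (about 24MB)
print(bitmap_bytes(10**12))       # 244140625  (about 244MB)
```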

- Possibly, you could cheat.  Instead of marking each storage block 
(512-byte sector) as needing sync or not, you could do it in 16KB chunks. 
So, if any sector inside that 16KB chunk was written, resync the whole 
chunk.  That reduces your memory footprint for a 100GB mirror down to 
something less than 1MB! That means a 1TB mirror would need only 7-10MB. 
You'll resync a little extra data, but since drives cache and the GEOM 
layer does requests efficiently in larger sizes anyhow, this might 
actually perform better anyway.
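
A sketch of how a write could be mapped onto such chunks, assuming writes 
arrive as a byte offset and a length (chunks_touched is a hypothetical 
helper, not a GEOM function):

```python
CHUNK_SIZE = 16 * 1024


def chunks_touched(offset, length, chunk_size=CHUNK_SIZE):
    """Return the indices of every chunk overlapped by a write."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


print(list(chunks_touched(512, 512)))            # [0]
print(list(chunks_touched(16384 - 512, 1024)))   # [0, 1]
print(list(chunks_touched(32768, 65536)))        # [2, 3, 4, 5]
```

Each returned index is one bit in the chunk map, so a single-sector write 
still costs only one bit but causes a 16KB resync.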


If anyone has further suggestions for this idea please let me know, and if 
there are any developers interested in this I may be able to 
provide/donate some hardware - sorry not new - a laptop, desktop and some 
hard drives - and can set up a machine for any network related testing.



More information about the freebsd-geom mailing list