lazy mirror / live backup
Mike Wolman
mike at nux.co.uk
Fri Apr 20 12:44:06 UTC 2007
Hi List,
I'd like the ability to have gmirror do a more efficient re-silvering (or
re-syncing) of the mirror members when a planned disconnect occurs. This
would significantly reduce the mirror rebuild time for any component which
had been deactivated. For network mirrors using ggated devices it would
also reduce network usage, and it could be used for remote asynchronous
mirrors, thus providing a live backup for laptops/workstations.
Main Points
When a normal mirror breaks this module must keep track of which blocks in
the mirror have changed.
- This can be done by keeping a list/map of just the blocks which change.
- This list/map needs to be stored on a device not provided by the mirror
or in memory. If the list is stored only in memory, rebooting the machine
would force a full resync of any detached drive, which would be a problem
for large mirrors over slow links, so a way of dumping the list to
permanent storage would be required.
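To make the "dump the list to permanent storage" point concrete, here is a
minimal userland sketch in Python (not gmirror code). The function names
and the on-disk format (one little-endian 64-bit block number per entry)
are assumptions made up for illustration; the path given would have to
live on a device that is not provided by the mirror.

```python
import os
import struct

def save_changed(blocks, path):
    """Dump the set of changed block numbers to permanent storage."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        for b in sorted(blocks):
            f.write(struct.pack("<Q", b))   # one 64-bit number per block
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old file so a crash leaves either the old list or
    # the new one, never a half-written list.
    os.replace(tmp, path)

def load_changed(path):
    """Read the list back after a reboot."""
    with open(path, "rb") as f:
        data = f.read()
    return {b for (b,) in struct.iter_unpack("<Q", data)}
```

Keeping the in-memory copy as a set means a block written many times is
still recorded only once.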
Example uses:
USB/Firewire external drive nightly full backup.
Suppose a mirror is constructed with 3 components: ad1, umass1 and umass2.
umass1 and umass2 are backup devices taken home on alternate nights by
different users, so that a full backup at most one day old is always kept
at a remote location.
This module should be able to use the change log for multiple devices,
preserving the changes until all components are up to date. Should one of
the USB devices fail and be removed from the mirror, the change log should
be cleared (provided all other components are up to date), allowing for
drive failures and stopping the block change log growing indefinitely. It
should be possible to use the same change log for more than one device.
Normal full backups to USB devices can take many hours. This should reduce
the time to only what is needed for the data changed since the device was
last attached to the mirror.
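One way a single change log could serve several detached devices is
sketched below in Python. This is only an illustration under my own
assumptions, not how gmirror works: each write gets a sequence number, each
detached component remembers the sequence number at which it left, and the
class and method names are invented.

```python
class ChangeLog:
    def __init__(self):
        self.seq = 0
        self.last_write = {}   # block number -> sequence of its latest write
        self.detached = {}     # component name -> sequence when it detached

    def detach(self, name):
        self.detached[name] = self.seq

    def record_write(self, block):
        if not self.detached:
            return             # every component receives the write directly
        self.seq += 1
        self.last_write[block] = self.seq

    def stale_blocks(self, name):
        """Blocks written since this component was detached."""
        mark = self.detached[name]
        return sorted(b for b, s in self.last_write.items() if s > mark)

    def resynced(self, name):
        """The component is attached again and up to date."""
        del self.detached[name]
        self._trim()

    def forget(self, name):
        """The component has really failed and left the mirror for good."""
        self.detached.pop(name, None)
        self._trim()

    def _trim(self):
        # Drop entries that no detached component still needs.
        if not self.detached:
            self.last_write.clear()
            return
        oldest = min(self.detached.values())
        self.last_write = {b: s for b, s in self.last_write.items()
                           if s > oldest}
```

With umass1 and umass2 both detached, the log keeps growing only as far
back as the older of the two needs; once the last detached device is
resynced or forgotten, the log empties.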
Example use for disaster recovery - slow links:
Suppose the mirror consists of 2 components, ad1 and ggatec1, with
ggatec1 being on a slow link.
A flush period tuneable could be used to periodically deactivate and
reactivate the ggatec component, giving an asynchronous mirror, à la rsync
but faster as there is no file list etc.
A tuneable may be required to only sync blocks which have not been changed
in xx seconds/minutes to prevent the same blocks being transferred too
often.
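As a rough sketch of that "not changed in xx seconds" tuneable (Python,
with an invented function name and a hypothetical quiet_seconds parameter),
the flush could simply skip blocks that were written too recently:

```python
def blocks_ready_to_sync(last_write_time, now, quiet_seconds):
    """Return the changed blocks that have been quiet long enough.

    last_write_time maps block number -> time of its most recent write.
    """
    return sorted(b for b, t in last_write_time.items()
                  if now - t >= quiet_seconds)
```

Blocks that are skipped stay in the change list and are picked up by a
later flush once they have stopped changing.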
A tuneable to specify the speed at which gmirror synchronises the out of
sync component will be required - This would possibly be useful for normal
gmirror use on a busy server when rebuilding a drive, as gmirror currently
uses all available write speed to do so - limiting rebuild speed may
therefore prevent drive failures.
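A speed limit of this kind can be sketched as simple pacing: after each
chunk is copied, wait until the total sent so far is allowed by the
configured rate. The Python below is only an illustration; the function
and parameter names are assumptions, and copy_chunk stands in for whatever
actually writes a chunk to the out of sync component.

```python
import time

def throttled_sync(chunks, copy_chunk, chunk_bytes, max_bytes_per_sec,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy each chunk, never exceeding max_bytes_per_sec on average."""
    start = clock()
    sent = 0
    for chunk in chunks:
        copy_chunk(chunk)
        sent += chunk_bytes
        # Earliest moment at which `sent` bytes may have gone out.
        due = start + sent / max_bytes_per_sec
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

The clock and sleep arguments exist only so the pacing can be tested
without really waiting.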
Live backup of laptop/workstation
Suppose the mirror is created using a local disk ad0 and a ggated mounted
device ggate0, with a balance algorithm preferring ad0.
When the network is unavailable gmirror starts to keep track of the
changed blocks. On reconnection to the network and activating the ggatec
component the list of changed blocks can be flushed. Should the same
block have changed more than once only the last change needs to be sent -
reducing network usage.
The main problem with mirroring the whole system drive is that any swap
changes will need to be ignored.
Other Considerations/Suggestions:
- gmirror will somehow need to be informed that a drive has actually
failed and is not just temporarily disconnected.
- Data structure consideration: the difference between a list of block
numbers that have changed and a block bitmap. A block bitmap is perfect for
this, and only requires 1 bit per block of storage, max. No more. A list
of block numbers can get *HUGE* though, because the block numbers are
probably all 64bit numbers, so each listed block takes 64x the space of a
bitmap entry, not to mention the issues of sorting and maintaining it.
- For determining the size of this 'block change map', you could use the
ceiling of the max number of blocks. So, a 100GB storage mirror would
have roughly 200000000 512-byte blocks. So, 200 million bits (using a
bitmap to store whether a block needs resyncing or not: 0 no sync, 1 sync)
is roughly 24MB. You could pretty easily keep that in memory, but if the
size was 1TB, you'd be at around 240MB, so that starts to get a little
much. Since this would be able to be enabled/disabled, it may not be an
issue.
- Possibly, you could cheat. Instead of marking each storage block
(512byte sector) as needing sync or not, you could do it in 16KB chunks.
So, if any sector inside that 16KB chunk was written, resync the whole
chunk. That reduces your memory footprint for a 100GB mirror down to
something less than 1MB! That means a 1TB mirror would need only 7-10MB.
You'll resync a little extra data, but since drives cache and the GEOM
layer does requests efficiently in larger sizes anyhow, this might
actually perform better anyway.
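The bitmap sizes and the 16KB chunk idea in the list above can be checked
with a small Python sketch. The class and function names are invented for
illustration, sizes are taken as decimal (100GB = 100 * 10^9 bytes), and
this is a userland model rather than anything from gmirror itself.

```python
def bitmap_bytes(mirror_bytes, granularity):
    """Bytes needed for a bitmap with 1 bit per `granularity` bytes."""
    units = -(-mirror_bytes // granularity)   # ceiling division
    return -(-units // 8)

class ChangeBitmap:
    def __init__(self, mirror_bytes, granularity=512):
        self.mirror_bytes = mirror_bytes
        self.granularity = granularity
        self.nunits = -(-mirror_bytes // granularity)
        self.bits = bytearray(bitmap_bytes(mirror_bytes, granularity))

    def record_write(self, offset, length):
        """Mark every unit touched by a write of `length` bytes."""
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for u in range(first, last + 1):
            self.bits[u // 8] |= 1 << (u % 8)

    def dirty_extents(self):
        """(offset, length) of every unit that needs resyncing."""
        out = []
        for u in range(self.nunits):
            if self.bits[u // 8] & (1 << (u % 8)):
                start = u * self.granularity
                out.append((start,
                            min(self.granularity, self.mirror_bytes - start)))
        return out

GB = 10**9
print(bitmap_bytes(100 * GB, 512))         # 24414063  (about 24MB)
print(bitmap_bytes(1000 * GB, 512))        # 244140625 (about 244MB)
print(bitmap_bytes(100 * GB, 16 * 1024))   # 762940    (under 1MB)
print(bitmap_bytes(1000 * GB, 16 * 1024))  # 7629395   (about 7.6MB)
```

A single 512-byte write marks one whole 16KB chunk dirty, and a write that
straddles a chunk boundary marks both chunks, which is the "resync a
little extra data" trade-off.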
If anyone has further suggestions for this idea please let me know, and
if there are any developers interested in this I may be able to
provide/donate some hardware (sorry, not new): a laptop, a desktop and
some hard drives. I can also set up a machine for any network related
testing.
More information about the freebsd-fs
mailing list