Gmirror - how to do?

Karl Denninger karl at denninger.net
Sat Feb 5 20:24:00 PST 2005


On Sat, Feb 05, 2005 at 08:58:15PM -0600, Karl Denninger wrote:
> I think you're missing what I'm trying to accomplish, and where (I think)
> it won't work.
> 
> Given the following configuration:
> 
> 1. Two internal SATA drives, each of size "X".
> 2. An external SATA drive bay, with two disk carriers each of size "X" (or
>    larger)
> 3. An initial RAID-1 boot volume that is comprised of the two internal disks.
> 4. FreeBSD 5-Stable
> 
> The goal is that:
> 
> 1. I can <leave a carrier in the external enclosure> and it will NOT be
>    used, by default, by the mirror, even if the system should reboot for
>    some odd reason.  This is important, because if I screw myself in some
>    fashion on the running machine, having the backup disk scribbled on
>    at the same time defeats the purpose of a backup.
> 
> 2. I can cause the carrier disk to be recognized, and if it is, the system
>    will bring it current.  A physical command to initiate this is
>    acceptable (indeed, required, since I do not want that process to start
>    unexpectedly.)
> 
> 3. Once the disk in the carrier IS current, I can detach it from the
>    running configuration and return to State #1 above.  I now have a
>    "near-line" backup that can be mounted (SEPARATELY!) at any time to 
>    recover data if for some reason there is a problem (e.g. I 'fat-finger'
>    an "rm -rf" from a directory that I really wanted).  I can also rotate
>    these disks in the carriers so that I have an offsite copy, plus a
>    near-line copy.
> 
> 4. If the original machine takes a dump entirely (e.g. the controller 
>    or OS goes insane and scribbles on both disks, there is a physical
>    catastrophe with the system, etc) I can take the disk in either the
>    carrier or the safe deposit box, plug it into an arbitrary machine
>    and boot it without drama.  I could also (if I had TWO disks) boot the
>    most current one and then bring the other into the mirror, effectively
>    recovering both the system and the redundancy all at once.  An FSCK in 
>    that instance is acceptable since I don't want to quiesce the system 
>    (which I understand is necessary to unmount/etc).  Critical applications 
>    can be stopped in (3) before the disk in the carrier is detached (e.g. 
>    dbms, etc) so I know THAT data is current.
> 
> The problem(s) I see with the possible ways to do this are:
> 
> 1. A "gmirror remove" removes the metadata on the disk and it will no
>    longer boot, since the root filesystem isn't on a mirror anymore (the
>    metadata is gone).
> 
> 2. A "fake out" with a gmirror label leaves me vulnerable to an unexpected
>    resync without being prompted if the system reboots for any reason, and
>    the possibility of the system activating the <WRONG> root volume.  The
>    latter could be catastrophically bad, especially if gmirror then rebuilt
>    the STALE disk back onto the other two, immediately destroying the
>    current data set!  It <ALSO> leaves me with the possibility that the
>    mirror created on that disk (if it's one of the "larger" ones) is too big,
>    and if that one is booted in recovery mode that the other backup disk
>    could not be made part of the mirror due to it being physically smaller
>    in block count on the same slice.  In other words, whether the system is
>    truly recoverable transparently then depends on which disk is the one
>    you boot singly.  That's not good.
> 
> 3. A "gmirror deactivate" followed by a "gmirror forget" is likely (not
>    sure yet, will test) to leave me with an unbootable disk as in (1),
>    because while the metadata is still there it is marked "do not use";
>    thus, the boot will fail (I think).
> 
> 4. An "atacontrol detach" leaves me with the situation where a reboot
>    finds the disk and can leave me in exactly the same situation as (2).
> 
> Something I have not tried is to:
> 
> 1. Mark the mirror as "no automatic rebuilds."
> 2. Bring the third disk current.
> 3. Stop the critical processes, and perform a couple of sync's.
> 4. Forcibly detach it using "atacontrol detach 2".  This appears to 
>    also spin the disk down.
> 5. Clean up the idea that there is another disk out there with
>    "gmirror forget".  I should now have a "complete" mirror set with
>    two components, and a detached disk.  Since the mirror is set "no
>    automatic rebuilds", if the system reboots it should NOT reattach the
>    backup volume.
> 6. The backup volume SHOULD be bootable on its own, without drama, as the
>    mirror, while it will be seen as degraded, should still be operational
>    off that volume.
> 
> The only problem is that there is no way to ensure any buffers for that
> disk have been flushed, so it's very similar to a "plug pull" for an
> unprotected machine.
> 
> I think I'll try that one next, although I don't care for the semantics.  
> Worst thing I can get is a panic for the unexpected detach, I think :->
> 
> I can always use the disk as if it was a great big tape drive, and just
> dump each filesystem to it, which avoids all of this tomfoolery (maybe 
> with a minimal system on it so I can restore the dumps onto a new set of
> mirrors) but while that will work, it kinda evades the purpose of what I'm
> trying to do - come up with a way to "hot mirror" the machine in such a
> fashion that recovery is painless and possible even for someone who is
> completely untrained, being told only to stick the backup disk into a 
> carrier on a new CPU and turn on the power.
> 
> If this sort of thing is a model that the gmirror authors didn't think of,
> it's something they might want to think about in the future.

As expected, a "gmirror deactivate" leaves you with an unbootable
GEOM-Mirror disk: when you attempt to boot from that disk, the system
complains that the volume was marked inactive and then skips it.  You
can still mount the 'raw' root partition on that disk (e.g.
/dev/ad4s1a).  I do not know whether doing so and then attempting to
activate the mirror would succeed or cause really bad things to happen
(since you already have the slice open with the root mount.)
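
A minimal dry-run sketch of the "deactivate" route described above.  It
only prints the commands; the mirror name gm0 and the /mnt mount point
are assumptions, ad4 is borrowed from the example partition name, and
whether "activate" behaves well afterwards is exactly the open question.

```shell
#!/bin/sh
# Dry-run sketch of the "deactivate" route. Nothing is executed unless
# DRY_RUN is changed to 0.
MIRROR=gm0      # assumed mirror name
BACKUP=ad4      # assumed backup provider (matches the /dev/ad4s1a example)
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mark the backup component inactive, then have the mirror forget it.
park_backup() {
    run gmirror deactivate "$MIRROR" "$BACKUP"
    run gmirror forget "$MIRROR"
}

# Look at the parked disk directly, read-only, bypassing the mirror.
inspect_backup() {
    run mount -o ro "/dev/${BACKUP}s1a" /mnt
}

# Bring the parked disk back; release the raw partition first.
resume_backup() {
    run umount /mnt
    run gmirror activate "$MIRROR" "$BACKUP"
}

park_backup
```

Unmounting before "activate" is a precaution in this sketch, not a
tested requirement.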

A forcible "detach" on the SATA bus works (you can boot the resulting 
volume), but is extremely messy.  It also leaves you with a disk that,
when reinserted, requires a manual "insert" command.
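
The detach-and-reinsert cycle could be sketched roughly as below.  This
is a dry run that only prints the commands.  The mirror name gm0 and
the provider ad4 are assumptions; channel 2 follows the "atacontrol
detach 2" form used earlier, and "gmirror configure -n" is assumed to
be the way to turn off automatic synchronization on this release.

```shell
#!/bin/sh
# Dry-run sketch of the forcible-detach route. Nothing is executed
# unless DRY_RUN is changed to 0.
MIRROR=gm0      # assumed mirror name
BACKUP=ad4      # assumed provider name of the carrier disk
CHANNEL=2       # ATA channel of the carrier, as in "atacontrol detach 2"
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take the (already synchronized) carrier disk out of service.
detach_backup() {
    run gmirror configure -n "$MIRROR"   # no automatic rebuilds
    run sync
    run sync
    run atacontrol detach "$CHANNEL"     # pulls the bus out from under the disk
    run gmirror forget "$MIRROR"         # drop the now-missing component
}

# Bring the carrier disk back; the resync starts only on this command.
reinsert_backup() {
    run atacontrol attach "$CHANNEL"
    run gmirror insert "$MIRROR" "$BACKUP"
}

detach_backup
```

Nothing here flushes buffers for the detached disk beyond the two
syncs, so the "plug pull" caveat still applies.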

However, this approach has one truly ugly possibility: if the disk is in
the carrier when the machine is booted, it appears that any of the
providers - including the "stale" one you detached - could get picked
up as the 'master'.  That could destroy the current data set.
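
One way to catch that case after a boot is to check whether the backup
provider shows up in the mirror at all.  The helper below is a
hypothetical sketch: it only searches text on stdin for the provider
name, and the sample text is illustrative, not captured from a real
system.

```shell
#!/bin/sh
# Succeeds if the named provider appears as a whole word in the text
# on stdin (intended for the output of "gmirror status").
provider_listed() {
    grep -qw -- "$1"
}

# Illustrative text only; real output may be laid out differently.
sample='mirror/gm0  COMPLETE  ad0
                      ad2'

if printf '%s\n' "$sample" | provider_listed ad4; then
    echo "ad4 is in the mirror: stop and investigate"
else
    echo "ad4 is not in the mirror"
fi
```

On a live system the input would come from something like
"gmirror status gm0 | provider_listed ad4", with the provider name
matching whatever the mirror was actually built on (disk or slice).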

If this is all I've got available I guess I have to live with it, but 
forcibly ripping a bus out from under a disk seems pretty nasty to me.

I suspect that for safety reasons "deactivate" will have to be the way I
go, and that I will have to live with the fact that a recovery will need
a manual boot, along with lots of care.  I will run a few more tests
with the 'deactivate' option once I have a clean third disk again - I
suspect that, with some care, I may be able to clean the disk up
manually from the "fixit disk" and then boot from it.

Not ideal, but probably workable.

-- 
Karl Denninger (karl at denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net		SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com	Musings Of A Sentient Mind



