good/best practices for gmirror and gjournal on a pair of disks?
Adam McDougall
mcdouga9 at egr.msu.edu
Tue May 13 20:36:25 UTC 2008
George Hartzell wrote:
> I've been running many of my systems for some time now using gmirror
> on a pair of identical disks, as described by Ralf at:
>
> http://people.freebsd.org/~rse/mirror/
>
> Each disk has single slice that covers almost all of the disk. These
> slices are combined into the gmirror device (gm0), which is then
> carved up by bsdlabel into gm0a (/), gm0b (swap), gm0d (/var), gm0e
> (/tmp), and gm0f (/usr).
>
> My latest machine is using Seagate 1TB disks so I thought I should add
> gjournal to the mix to avoid ugly fsck's if/when the machine doesn't
> shut down cleanly. I ended up just creating a gm0f.journal and using
> it for /usr, which basically seems to be working.
>
> I'm left with a couple of questions though:
>
> - I've read in the gjournal man page that when it is "... configured
> on top of gmirror(8) or graid3(8) providers, it also keeps them in
> a consistent state..." I've been trying to figure out if this
> simply falls out of how gjournal works or if there's explicit
> collusion with gmirror/graid3 but can't come up with a
> satisfactory explanation. Can someone walk me through it?
>
> Since I'm only gjournal'ing a portion of the underlying gmirror
> device I assume that I don't get this benefit?
>
> - I've also read in the gjournal man page "... that sync(2) and
> fsync(2) system calls do not work as expected anymore." Does this
> invalidate any of the assumptions made by various database
> packages such as postgresql, sqlite, berkeley db, etc.... about
> if/when/whether their data is safely on the disk?
>
> - What's the cleanest gjournal adaptation of rse's
> two-disk-mirror-everything setup that would be able to avoid
> tedious gmirror sync's. The best I've come up with is to do two
> slices per disk, combine the slices into a pair of gmirror
> devices, bsdlabel the first into gm0a (/), gm0b (swap), gm0d
> (/var) and gm0e (/tmp) and bsdlabel the second into a gm1f which
> gets a gjournal device.
>
> Alternatively, would it work and/or make sense to give each disk a
> single slice, combine them into a gmirror, put a gjournal on top
> of that, then use bsdlabel to slice it up into partitions?
>
> Is anyone using gjournal and gmirror for all of the system on a pair
> of disks in some other configuration?
>
> Thanks,
>
> g.
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>
>
I am pasting below the instructions I would use to convert a recently
installed system with only / (root) and swap to use gmirror+gjournal.
They are in MediaWiki markup so they can be pasted into a wiki if
desired. I based my gmirror steps on the instructions from
http://people.freebsd.org/~rse/mirror/ so that's why some of the words
sound familiar. I also have similar instructions for setting up a
gmirrored da0s1a and da0s1b alongside a zfs mirror containing the rest.

I decided to journal /usr, /var and /tmp and leave / as a standard UFS
partition because it is so small that fsck doesn't take long anyway, and
hopefully it doesn't get written to enough for an abrupt reboot to cause
damage. Because I'm not journaling the root partition, I chose to
ignore the possibility of gjournal marking the mirror clean. Sudden
reboots don't happen often enough on servers for me to care. And all my
servers got abruptly rebooted this Sunday and they all came up fine :)

I believe gjournal uses 1G for the journal (2x512) by default, which
seemed to be sufficient on all of the systems where I have used the
default, but I quickly found that using a smaller journal is a bad idea
and leads to panics that I was unable to avoid with tuning. Considering
1G was so close to a size that caused trouble, I chose to go several
times above the default journal size (disk is cheap and I want to be
sure), but I ran into problems with gjournal label -s (size) rejecting
my sizes or wrapping the value around to something too low. As a
workaround I chose to use a separate partition for each journal. I
quickly ran out of partitions in a bsd disklabel, so I decided to
partition each disk into two slices: the first for data and the second
for journals. This also made it easier to line up disk devices so they
made more sense as a pair, for example:
gm0s1d(data) + gm0s2d(journal) = /usr.

I will note that if you accidentally put a gjournal label in the 'wrong'
spot on your disk, you might have a tough time getting rid of it. I have
had plenty of times where I applied a gjournal label and discovered
something less than ideal about it, but every time I did 'gjournal stop
foo' the label would automatically get detected as a child of a
different part of the disk, because it could still be seen there, and I
could not unload it. That is part of why I use -h for gjournal label,
use slices+partitions, and start the first partition at offset 16,
though some of that may have been for gmirror's sake too.

==Software raid on 72G disks with gjournal==
5 min to set up, around 30 min to sync
===Prepare===
*Clear any old mirror config including old gmirror labels
sysctl kern.geom.debugflags=16
gmirror clear da0
gmirror clear da1
sysctl kern.geom.debugflags=0
dd if=/dev/zero of=/dev/da1 bs=512 count=79
*place a GEOM mirror label onto second disk
gmirror label -v -n -b round-robin gm0 /dev/da1
*activate GEOM mirror kernel layer
gmirror load
===Partition===
*place a PC MBR onto the second disk to make it bootable. Also
partition it with the majority of space as partition 1, and enough for
your journal partitions as partition 2.
'''You might get an error, such as "fdisk: Geom not found". If the next
steps work, ignore the error.'''
fdisk -v -B -I /dev/mirror/gm0
*Partition it into two slices. I think there is an easier way but I
cannot remember how. Maybe I used a different method of using fdisk and
ignored the end cyl values since they don't seem to make much sense
anyway. sysinstall or sade could be used as an alternative.
fdisk -i /dev/mirror/gm0
Do you want to change our idea of what BIOS thinks ? '''[n]'''
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 143363997 (70001 Meg), flag 80 (active)
'' ^^^^^^^^^
A = 143363997''
beg: cyl 0/ head 1/ sector 1;
end: cyl 731/ head 254/ sector 63
Do you want to change it? [n] '''y'''<br>
''We want to make partitions approx 60G(data) and 10G(journals).''
''So take variable A, divide by 7 and multiply by 6 to get var B.''
''B = 122883426''<br>
Supply a decimal value for "sysid (165=FreeBSD)" '''[165]'''
Supply a decimal value for "start" '''[63]'''
Supply a decimal value for "size" [143363997] '''122883426'''
''^^^^^^^^^''
''put B here''
fdisk: WARNING: partition does not end on a cylinder boundary
fdisk: WARNING: this may confuse the BIOS or some operating systems
Correct this automatically? [n] '''y'''
fdisk: WARNING: adjusting size of partition to 122881122
Explicitly specify beg/end address ? '''[n]'''
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 122881122 (60000 Meg), flag 80 (active)
''^^^^^^^^^''
''C = 122881122''
''D = C + 63 = 122881122 + 63 = 122881185''
''E = A - D = 143363997 - 122881185 = 20482812''<br>
beg: cyl 0/ head 1/ sector 1;
end: cyl 480/ head 254/ sector 63
Are we happy with this entry? [n] '''y'''
The data for partition 2 is:
<UNUSED>
Do you want to change it? [n] '''y'''
Supply a decimal value for "sysid (165=FreeBSD)" [0] '''165'''
Supply a decimal value for "start" [0] '''122881185'''
''^^^^^^^^^''
''put D here ''
Supply a decimal value for "size" [0] '''20482812'''
''^^^^^^^^''
''put E here''
Explicitly specify beg/end address ? '''[n]'''
Are we happy with this entry? [n] '''y'''
The data for partition 3 is:
<UNUSED>
Do you want to change it? '''[n]'''
The data for partition 4 is:
<UNUSED>
Do you want to change it? '''[n]'''
Partition 1 is marked active
Do you want to change the active partition? '''[n]'''
Should we write new partition table? [n] '''y'''
'''You might get an error, such as "fdisk: Geom not found". If the next
steps work, ignore the error.'''
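The values B, C, D and E used in the fdisk session above can be reproduced with plain shell arithmetic. This is only a sketch: it assumes a geometry of 255 heads and 63 sectors per track (consistent with the "head 254/ sector 63" lines fdisk printed) and that fdisk rounds the end of the slice down to a cylinder boundary, which matches the adjustment it made here. The variable names START and CYL are mine.

```shell
#!/bin/sh
# Sketch: reproduce the slice arithmetic from the fdisk session.
# Assumption: 255 heads x 63 sectors per track.

A=143363997          # size fdisk reported for the original single slice
START=63             # first slice starts one track into the disk
CYL=$((255 * 63))    # sectors per cylinder under the assumed geometry

B=$((A / 7 * 6))                      # roughly 6/7 of the space for data
END=$(( (START + B) / CYL * CYL ))    # round the end down to a cylinder boundary
C=$((END - START))                    # size fdisk settles on for slice 1
D=$((START + C))                      # start of slice 2
E=$((A - D))                          # size of slice 2

echo "B=$B C=$C D=$D E=$E"
# prints: B=122883426 C=122881122 D=122881185 E=20482812
```

For a disk of a different size, only A and the 6/7 split should need changing.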
===Disklabel===
*place a BSD disklabel onto the mirrors
bsdlabel -w -B /dev/mirror/gm0s1
bsdlabel -w /dev/mirror/gm0s2
NOTICE: Figure out what partitions you want by referring to the output
of bsdlabel /dev/da0s1 and/or by running bsdlabel /dev/mirror/gm0s1 on a
different server that has already been mirrored, then partition to your
liking. Size can be specified as ##M, ##G or * for the remainder, and
offset should be * so that bsdlabel calculates it. Paste the output into
the editor and make whatever changes you want, as long as the first
partition (for example "a") starts at offset 16 and the "c" partition
starts at offset 0.
*Partition 1:
bsdlabel -e /dev/mirror/gm0s1
Example:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:       1G       16    4.2BSD
  b:       4G        *      swap
  c:        *        0    unused   # "raw" part, don't edit
  d:      10G        *    4.2BSD
  e:        *        *    4.2BSD
  f:       4G        *    4.2BSD
*Partition 2:
bsdlabel -e /dev/mirror/gm0s2
Example:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c:        *        0    unused   # "raw" part, don't edit
  d:       4G       16    4.2BSD
  e:       4G        *    4.2BSD
  f:        *        *    4.2BSD
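To see roughly how much space the * entries end up with, the remainder can be estimated from the slice sizes chosen in the fdisk step. This is a rough sketch, assuming 512-byte sectors, that bsdlabel treats 1G as 1024^3 bytes, and that everything after the 16-sector offset is available; the helper name remainder_sectors is hypothetical, and the real label may differ by a few sectors.

```shell
#!/bin/sh
# Sketch: estimate the size of the "*" partition in each example label.
# Assumptions: 512-byte sectors, 1G = 1024^3 bytes, first partition at offset 16.

SECTORS_PER_GIB=$((1024 * 1024 * 1024 / 512))

# remainder_sectors SLICE_SECTORS FIXED_GIB
# Prints the sectors left after the offset and the fixed-size partitions.
remainder_sectors() {
    echo $(( $1 - 16 - $2 * SECTORS_PER_GIB ))
}

# Slice 1: a=1G + b=4G + d=10G + f=4G = 19G fixed, e takes the rest.
data_e=$(remainder_sectors 122881122 19)
# Slice 2: d=4G + e=4G = 8G fixed, f takes the rest.
journal_f=$(remainder_sectors 20482812 8)

echo "gm0s1e: $data_e sectors (about $((data_e / 2048)) MB)"
echo "gm0s2f: $journal_f sectors (about $((journal_f / 2048)) MB)"
```

With the numbers from this walkthrough, the journal left over for gm0s2f comes out at a bit under 2G, which is still above the 1G default mentioned earlier.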
===Gjournal label===
*Label the data and journals so the journaled partition is available.
gjournal label -f -h mirror/gm0s1d mirror/gm0s2d
gjournal label -f -h mirror/gm0s1e mirror/gm0s2e
gjournal label -f -h mirror/gm0s1f mirror/gm0s2f
*Load the kernel module so the journaled partitions are detected:
gjournal load
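The three label commands follow one pattern: partition X of slice 1 holds the data and partition X of slice 2 holds its journal. A small dry-run sketch that only prints the commands makes the pairing explicit; the function name print_journal_cmds is hypothetical, and the mount points in the comments are the ones used in the mount step of this walkthrough.

```shell
#!/bin/sh
# Dry run: print the gjournal label commands, do not execute them.
# Each entry is "partition letter:mount point".
print_journal_cmds() {
    for pair in d:/usr e:/var f:/tmp; do
        part=${pair%%:*}    # text before the colon, e.g. "d"
        mnt=${pair#*:}      # text after the colon, e.g. "/usr"
        echo "gjournal label -f -h mirror/gm0s1${part} mirror/gm0s2${part}   # ${mnt}"
    done
}

print_journal_cmds
```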
===Newfs===
*Format the devices with journaling support in UFS:
newfs /dev/mirror/gm0s1a
newfs -J /dev/mirror/gm0s1d.journal
newfs -J /dev/mirror/gm0s1e.journal
newfs -J /dev/mirror/gm0s1f.journal
===Mount===
*Mount them temporarily:
mount /dev/mirror/gm0s1a /mnt
mkdir -p /mnt/usr /mnt/var /mnt/tmp
mount -o async /dev/mirror/gm0s1d.journal /mnt/usr
mount -o async /dev/mirror/gm0s1e.journal /mnt/var
mount -o async /dev/mirror/gm0s1f.journal /mnt/tmp
===Copy Data===
*Install rsync, if not already:
pkg_add -r rsync
*Copy the original boot drive to the new device:
rehash
rsync -avHSx --progress / /mnt/
(This will take about 1 minute.)
===Prepare mirror for booting===
*Edit '''/mnt/etc/fstab''' replacing the following mountpoints:
vi /mnt/etc/fstab
Old:
# Device        Mountpoint  FStype  Options    Dump  Pass#
/dev/da0s1b     none        swap    sw         0     0
/dev/da0s1a     /           ufs     rw         1     1
/dev/cd0        /cdrom      cd9660  ro,noauto  0     0
/dev/acd0       /cdrom1     cd9660  ro,noauto  0     0
New:
# Device                    Mountpoint  FStype  Options    Dump  Pass#
/dev/mirror/gm0s1b          none        swap    sw         0     0
/dev/mirror/gm0s1a          /           ufs     rw         1     1
/dev/mirror/gm0s1d.journal  /usr        ufs     rw,async   2     2
/dev/mirror/gm0s1e.journal  /var        ufs     rw,async   2     2
/dev/mirror/gm0s1f.journal  /tmp        ufs     rw,async   2     2
/dev/cd0                    /cdrom      cd9660  ro,noauto  0     0
/dev/acd0                   /cdrom1     cd9660  ro,noauto  0     0
*Load necessary kernel modules at boot:
echo 'geom_journal_load="YES"' >> /mnt/boot/loader.conf
echo 'geom_mirror_load="YES"' >> /mnt/boot/loader.conf
*instruct boot stage 2 loader on first disk to boot with the boot stage
3 loader from the second disk (mainly because BIOS might not allow easy
booting from second ATA disk or at least requires manual intervention on
the console)
echo "1:da(1,a)/boot/loader" >/boot.config
*We're done with the first stage, reboot:
reboot
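One caveat with the loader.conf step in this section: the two echo commands append unconditionally, so repeating the step leaves duplicate lines in the file. A minimal sketch of an idempotent variant follows; the helper name add_line_once is hypothetical and not part of any FreeBSD tool.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
add_line_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the text literally, -q is quiet.
    grep -qxF -e "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Usage, with the path from the steps above:
#   add_line_once /mnt/boot/loader.conf 'geom_journal_load="YES"'
#   add_line_once /mnt/boot/loader.conf 'geom_mirror_load="YES"'
```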
===Check results===
*Login and run df. Should look like this:
Filesystem                  1K-blocks   Used    Avail Capacity  Mounted on
/dev/mirror/gm0s1a            1012974 201898   730040    22%    /
devfs                               1      1        0   100%    /dev
/dev/mirror/gm0s1d.journal   10154156 144920  9196904     2%    /usr
/dev/mirror/gm0s1e.journal   40209204    322 36992146     0%    /var
/dev/mirror/gm0s1f.journal    4058060     12  3733404     0%    /tmp
===Configure second disk into mirror===
*Add the original boot disk to the mirror. Make sure the first disk is
treated as a really fresh one
dd if=/dev/zero of=/dev/da0 bs=512 count=79
*switch GEOM mirror to auto-synchronization and add first disk (first
disk is now immediately synchronized with the second disk content)
gmirror configure -a gm0
gmirror insert gm0 /dev/da0
*Wait for the GEOM mirror synchronization to complete, or check it
manually with ''gmirror list''
sh -c 'while [ ".`gmirror list | grep SYNCHRONIZING`" != . ]; do sleep 1; done'
*Reboot into the final two-disk GEOM mirror setup (now actually boots
with the MBR and boot stages on first disk as it was synchronized from
second disk)
reboot
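The wait loop used in this section before the final reboot can also be written as a small function, which is easier to reuse in a script. This is a sketch that relies on the same thing the one-liner does, namely that ''gmirror list'' prints the word SYNCHRONIZING while a component is still syncing; the function name wait_for_sync and its optional interval argument are my own.

```shell
#!/bin/sh
# wait_for_sync [INTERVAL]
# Block until "gmirror list" no longer mentions SYNCHRONIZING.
# INTERVAL is the polling delay in seconds (default 1).
wait_for_sync() {
    interval=${1:-1}
    while gmirror list | grep -q SYNCHRONIZING; do
        sleep "$interval"
    done
}

# Usage, run before the final reboot:
#   wait_for_sync 10 && echo "mirror is in sync"
```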
===Mirror check script===
*Enable daily_status_gmirror_enable in /etc/periodic.conf or write your
own script to monitor gmirror status
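If you go the write-your-own route, a minimal sketch could look like the following. It assumes that ''gmirror status'' prints the word DEGRADED for a mirror that is missing a component; the function name check_mirrors is hypothetical, and how you deliver the message (cron mail, syslog, and so on) is up to you.

```shell
#!/bin/sh
# check_mirrors
# Return 0 quietly when no mirror is reported as degraded; otherwise
# print the full status and return 1 so that cron can mail the output.
# Assumption: "gmirror status" shows DEGRADED for an incomplete mirror.
check_mirrors() {
    status=$(gmirror status 2>&1)
    if echo "$status" | grep -q DEGRADED; then
        echo "gmirror reports a degraded mirror:"
        echo "$status"
        return 1
    fi
    return 0
}

# Usage, for example from a cron job:
#   check_mirrors || exit 1
```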