gjournal: journaled slices vs. journaled partitions
Gabriel Lavoie
glavoie at gmail.com
Tue Nov 4 09:55:57 PST 2008
Hello,
I built a similar setup last weekend on a new home server with two
500GB drives. I didn't want to use gmirror alone and have the full drives
rebuild after every power failure or reset. I was told that putting
bsdlabels on a gjournal provider wasn't a good idea, but I have yet to get
an answer as to why... I went with this setup anyway and ran some reset
tests to see what happens on reboot, and everything always went fine.
When building this setup I ran into one big problem. If the root filesystem
(/) was on a gjournal provider, an unclean shutdown while data was being
written to the disk rendered the system completely unbootable. I got this
message:
GEOM_MIRROR: Device mirror/gm launched (2/2)
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains data.
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains journal.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains data.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains journal.
GEOM_JOURNAL: Journal mirror/gmd consistent.
Trying to mount root from ufs:/dev/mirror/gm.journal
Manual root filesystem specification:
<fstype>:<device> Mount <device> using filesystem <fstype>
eg. ufs:da0s1a
? List valid disk boot devices
<empty line> Abort manual input
mountroot> ?
List of GEOM managed disk devices:
mirror/gmd.journal mirror/gmd mirror/gmc mirror/gma mirror/gm ad10s1c
ad10s1b ad8s1c ad8s1b ad10s2 ad10s1 ad8s1 ad10 ad8 acd0
As you can see, "mirror/gm.journala" is absent from the proposed list of
disk devices to boot from. As Ivan Voras (whom I contacted about this
problem) and I found, the GEOM_JOURNAL thread that is supposed to mark the
journal consistent takes too long to do so for the root filesystem's
provider, and the kernel tries to mount a device that doesn't exist yet. A
bug report has been opened about this problem. For my final setup I decided
to put the root filesystem on a separate mirrored slice of 1GB. Since this
slice isn't written to often, not many rebuilds should occur in case of
power failure. I ran my "power failure" test by hitting the reset button
while writing data to this filesystem, and the rebuild of 1GB doesn't take
too much time (at most 20-30 seconds).
Now I have a question. Why wasn't the "load" algorithm recommended? Is it
fixed in 7.0-RELEASE-p5?
Here is my complete setup, which booted correctly every time I ran my
reset tests while writing data to each filesystem. The 2GB gjournal
provider sits directly on the mirror provider for all mirrored filesystems
except the root one, and I put my bsdlabels on the gjournal provider
instead of creating a journal for every filesystem.
[root@headless ~]# cat /etc/fstab
# Device                Mountpoint      FStype  Options         Dump    Pass#
/dev/ad10s1b            none            swap    sw              0       0
/dev/ad8s1b             none            swap    sw              0       0
/dev/mirror/root        /               ufs     rw              1       1
/dev/ufs/usr            /usr            ufs     rw,async        2       2
/dev/ufs/var            /var            ufs     rw,async        2       2
/dev/ufs/tmp            /tmp            ufs     rw,async        2       2
/dev/ufs/home           /home           ufs     rw,async        2       2
/dev/ufs/data           /mnt/data       ufs     rw,async        2       2
/dev/acd0               /cdrom          cd9660  ro,noauto       0       0
[root@headless ~]# mount
/dev/mirror/root on / (ufs, local, soft-updates)
devfs on /dev (devfs, local)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/home on /home (ufs, asynchronous, local, acls, gjournal)
/dev/ufs/data on /mnt/data (ufs, asynchronous, local, acls, gjournal)
[root@headless ~]# glabel status
Name Status Components
ufs/usr N/A mirror/data.journald
ufs/var N/A mirror/data.journale
ufs/tmp N/A mirror/data.journalf
ufs/home N/A mirror/data.journalg
ufs/data N/A mirror/data.journalh
[root@headless ~]# gjournal list
Geom name: gjournal 372943514
ID: 372943514
Providers:
1. Name: mirror/data.journal
Mediasize: 495810966528 (462G)
Sectorsize: 512
Mode: r5w5e11
Consumers:
1. Name: mirror/data
Mediasize: 497958450688 (464G)
Sectorsize: 512
Mode: r1w1e1
Jend: 497958450176
Jstart: 495810966528
Role: Data,Journal
[root@headless ~]# gmirror list
Geom name: data
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 990032118
Providers:
1. Name: mirror/data
Mediasize: 497958450688 (464G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ad8s2
Mediasize: 497958451200 (464G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 235591066
2. Name: ad10s2
Mediasize: 497958451200 (464G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 2007880058
Geom name: root
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 4098555256
Providers:
1. Name: mirror/root
Mediasize: 1073022976 (1.0G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ad8s1a
Mediasize: 1073023488 (1.0G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 3394521634
2. Name: ad10s1a
Mediasize: 1073023488 (1.0G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 3774466459
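
As a quick sanity check on the "2GB" figure, the journal size can be
recomputed from the Jstart and Jend values in the gjournal list output
above. This is only a sketch: the variable names are made up for
illustration and the numbers are copied from that output.

```shell
#!/bin/sh
# Values copied from the "gjournal list" output above (bytes).
jstart=495810966528
jend=497958450176

# The journal occupies the byte range [Jstart, Jend).
journal_bytes=$((jend - jstart))
journal_gib=$((journal_bytes / 1024 / 1024 / 1024))

echo "journal: ${journal_bytes} bytes (${journal_gib} GiB)"
```

Note that the Mediasize of mirror/data.journal (495810966528) equals
Jstart, so the journal begins exactly where the usable data area ends.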
Gabriel
2008/11/4 Volodymyr Kostyrko <c.kworr at gmail.com>
> Carl wrote:
>
>> Volodymyr Kostyrko wrote:
>>
>>> I have some setups where gjournal was put on the device rather than on a
>>> partition, i.e.:
>>>
>>> [umgah] ~> gmirror status
>>> Name Status Components
>>> mirror/umgah0 COMPLETE ad0
>>> ad1
>>> [umgah] ~> gjournal status
>>> Name Status Components
>>> mirror/umgah0.journal N/A mirror/umgah0
>>> [umgah] ~> glabel status
>>> Name Status Components
>>> ufs/umgah0root N/A mirror/umgah0.journala
>>> label/umgah0swap N/A mirror/umgah0.journalb
>>> ufs/umgah0usr N/A mirror/umgah0.journald
>>> ufs/umgah0var N/A mirror/umgah0.journale
>>>
>>
>> Does the above suggest that you've ended up with individual journal
>> providers for each partition anyway? If so, where are they and have you
>> really achieved anything functionally different? Are they at the end of
>> their individually associated partitions or all together somewhere else? Has
>> the ill-advised journaled small partition issue been successfully overcome
>> through what you've done?
>>
>
> First, there is only one journal - for /dev/mirror/umgah0, and it is named
> /dev/mirror/umgah0.journal. Everything else is just bsdlabel partitions;
> there are four of 'em.
>
>
>>> [umgah] ~> mount
>>> /dev/ufs/umgah0root on / (ufs, asynchronous, local, noatime, gjournal)
>>> devfs on /dev (devfs, local)
>>> /dev/md0 on /tmp (ufs, asynchronous, local)
>>> /dev/ufs/umgah0var on /var (ufs, asynchronous, local, noatime, gjournal)
>>> /dev/ufs/umgah0usr on /usr (ufs, asynchronous, local, noatime, gjournal)
>>> devfs on /var/named/dev (devfs, local)
>>>
>>> And yes, mirror autosynchronization is turned off, gjournal takes care of
>>> that too.
>>>
>>> It's not stated in the manual, but gjournal is typically transparent for
>>> any type of access; only in the case of UFS is the file system marked as
>>> journaled, so that metadata writes can be distinguished from data writes.
>>> Without that, gjournal does literally nothing.
>>>
>>
>> And what does this mean for your swap partition?
>>
>
> Just nothing, it's just swap. It can't be journaled.
>
>> Laszlo Nagy wrote earlier:
>>
>>> Another tricky question: why would you journal a SWAP partition?
>>>
>>
>> Volodymyr, does your assertion that gjournal does nothing when a file
>> system is not UFS mean that there is no penalty with regard to your swap
>> partition despite the existence of "mirror/umgah0.journalb"?
>>
>
> I haven't seen any performance decrease in this configuration. And
> according to the manual and articles about gjournal it should work this way.
>
>> Any chance you'd like to share your command sequence for constructing your
>> gmirror'd and gjournal'd filesystem, Volodymyr? :-)
>>
>
> If we have two disks (ad0, ad1) it should look like this:
>
> > gmirror label -b load -n umgah0 ad1
>
> This gives us the whole drive gmirrored without autosynchronization (we
> don't need it - the journal will take care of any discrepancies) and with
> the load balance algorithm (load was fixed not so long ago in stable and
> should be fine to use).
>
> > gjournal label mirror/umgah0
>
> We are creating a journal on top of our gmirror. It eats 1G from the end of
> the disks and gives us the rest to use.
>
> > bsdlabel -wB mirror/umgah0.journal
>
> We are writing the standard bsdlabel to the disk and making it bootable.
> After that we will get one partition 'a'.
>
> <spam>
> Yes, no fdisk. I don't think this old piece of rough junk is ever needed on
> a machine running only FreeBSD. It just takes space, it requires
> compatibility with forgotten-and-abandoned standards and gives nothing more.
> Is your server dual-booting Windows or Linux? That is the only case
> you need fdisk for.
> </spam>
>
> > bsdlabel -e mirror/umgah0.journal
>
> Now we split our journal provider into partitions. I did it this way:
>
> # /dev/mirror/umgah0.journal:
> 8 partitions:
> #         size   offset    fstype   [fsize bsize bps/cpg]
>   a:    524288       16    4.2BSD
>   b:  16777216        *      swap
>   c: 779325614        0    unused        0     0   # "raw" part, don't edit
>   d:  33554432        *    4.2BSD
>   e:         *        *    4.2BSD
>
> After that we can format these filesystems:
>
> > newfs -J -L umgah0root /dev/mirror/umgah0.journala
> > newfs -J -L umgah0var /dev/mirror/umgah0.journald
> > newfs -J -L umgah0usr /dev/mirror/umgah0.journale
>
> And label the swap:
>
> > glabel label umgah0swap /dev/mirror/umgah0.journalb
>
> You can skip this whole glabel thing; I just prefer to have a slim fstab,
> as slim as possible.
>
> <fstab>
> /dev/label/umgah0swap none swap sw 0 0
>
> md /tmp mfs rw,-s1024m,-S,-oasync 0 0
>
> /dev/ufs/umgah0root / ufs rw,async,noatime 0 1
> /dev/ufs/umgah0var /var ufs rw,async,noatime 0 2
> /dev/ufs/umgah0usr /usr ufs rw,async,noatime 0 2
> </fstab>
>
> There's a lot more to describe here, from moving the system to the newly
> created partitions to inserting our first disk into the gmirror and
> rebuilding it. All these issues are described in the handbook or in other
> articles found on the net.
>
>
> --
> Sphinx of black quartz judge my vow.
--
Gabriel Lavoie
glavoie at gmail.com