gjournal: journaled slices vs. journaled partitions

Gabriel Lavoie glavoie at gmail.com
Tue Nov 4 09:55:57 PST 2008


Hello,
     I built a similar setup last weekend on a new home server with two
500GB drives. I didn't want to use gmirror alone and have the full drives
rebuild after every power failure or reset. I was told that putting bsdlabels
on a gjournal provider wasn't a good idea, but I have yet to get an answer as
to why. I went with this setup anyway and ran some reset tests to see what
happens on reboot, and everything always went fine.

While building this setup I ran into one big problem. If the root filesystem
(/) was on a gjournal provider, an unclean shutdown while data was being
written to the disk left the system completely unbootable. I got this message:

GEOM_MIRROR: Device mirror/gm launched (2/2)
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains data.
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains journal.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains data.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains journal.
GEOM_JOURNAL: Journal mirror/gmd consistent.
Trying to mount root from ufs:/dev/mirror/gm.journal

Manual root filesystem specification:
    <fstype>:<device>  Mount <device> using filesystem <fstype>
                                           eg. ufs:da0s1a
    ?                             List valid disk boot devices
    <empty line>          Abort manual input


mountroot> ?

List of GEOM managed disk devices:
     mirror/gmd.journal mirror/gmd mirror/gmc mirror/gma mirror/gm ad10s1c
ad10s1b ad8s1c ad8s1b ad10s2 ad10s1 ad8s1 ad10 ad8 acd0


As you can see, "mirror/gm.journala" is absent from the proposed list of disk
devices to boot from. As Ivan Voras (whom I contacted about this problem) and
I found, the GEOM_JOURNAL thread that is supposed to mark the journal
consistent takes too long to do so for the root filesystem's provider, and the
kernel tries to mount a device that doesn't exist yet. A bug report has been
opened about this problem. For my final setup I decided to put the root
filesystem on a separate mirrored slice of 1GB. Since this slice isn't written
to often, few rebuilds should occur after a power failure. I ran my "power
failure" test by hitting the reset button while writing data to this
filesystem, and the rebuild of 1GB doesn't take long (at most 20-30 seconds).

Now I have a question: why wasn't the "load" algorithm recommended? Is it
fixed in 7.0-RELEASE-p5?

Here is my complete setup, which has booted correctly every time I ran my
reset tests while writing data to each filesystem. The 2GB gjournal provider
sits directly on the mirror provider for all mirrored filesystems except the
root one, and I put my bsdlabels on the gjournal provider instead of creating
a journal for every filesystem.
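
For anyone wanting to try a similar layout, here is a rough sketch of how it
could be built. These are not the exact commands used for this server: the
options are inferred from the gmirror list, gjournal list and glabel status
output further down (balance "split", flags NOFAILSYNC and HARDCODED, a 2GB
journal, partitions d to h), and the run() wrapper and the DRY_RUN variable
are hypothetical helpers added for the example. By default it only prints the
commands.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Setting DRY_RUN=0 would really run them (destructive, FreeBSD only).
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_data_mirror() {
    # Mirror of the two large slices; -F = no sync after a crash,
    # -h = hardcode provider names, -b split = balance algorithm.
    run gmirror label -F -h -b split data ad8s2 ad10s2

    # One 2GB journal (size in bytes) at the end of the mirror provider.
    run gjournal label -s 2147483648 mirror/data

    # bsdlabel goes on the journaled provider, not on the mirror itself.
    run bsdlabel -w mirror/data.journal
    # Interactive step: create partitions d..h in the editor.
    run bsdlabel -e mirror/data.journal

    # One gjournal-aware UFS per partition, labeled for use in fstab.
    for pair in d:usr e:var f:tmp g:home h:data; do
        part=${pair%%:*}
        label=${pair##*:}
        run newfs -J -L "$label" "/dev/mirror/data.journal${part}"
    done
}

build_data_mirror
```

The point of the ordering is that there is a single journal for the whole
mirror and the partitions are carved out of the .journal provider afterwards.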


[root at headless ~]# cat /etc/fstab
# Device                Mountpoint      FStype  Options         Dump    Pass#
/dev/ad10s1b            none            swap    sw              0       0
/dev/ad8s1b             none            swap    sw              0       0
/dev/mirror/root        /               ufs     rw              1       1
/dev/ufs/usr            /usr            ufs     rw,async        2       2
/dev/ufs/var            /var            ufs     rw,async        2       2
/dev/ufs/tmp            /tmp            ufs     rw,async        2       2
/dev/ufs/home           /home           ufs     rw,async        2       2
/dev/ufs/data           /mnt/data       ufs     rw,async        2       2
/dev/acd0               /cdrom          cd9660  ro,noauto       0       0


[root at headless ~]# mount
/dev/mirror/root on / (ufs, local, soft-updates)
devfs on /dev (devfs, local)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/home on /home (ufs, asynchronous, local, acls, gjournal)
/dev/ufs/data on /mnt/data (ufs, asynchronous, local, acls, gjournal)


[root at headless ~]# glabel status
    Name  Status  Components
 ufs/usr     N/A  mirror/data.journald
 ufs/var     N/A  mirror/data.journale
 ufs/tmp     N/A  mirror/data.journalf
ufs/home     N/A  mirror/data.journalg
ufs/data     N/A  mirror/data.journalh


[root at headless ~]# gjournal list
Geom name: gjournal 372943514
ID: 372943514
Providers:
1. Name: mirror/data.journal
   Mediasize: 495810966528 (462G)
   Sectorsize: 512
   Mode: r5w5e11
Consumers:
1. Name: mirror/data
   Mediasize: 497958450688 (464G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 497958450176
   Jstart: 495810966528
   Role: Data,Journal


[root at headless ~]# gmirror list
Geom name: data
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 990032118
Providers:
1. Name: mirror/data
   Mediasize: 497958450688 (464G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad8s2
   Mediasize: 497958451200 (464G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 235591066
2. Name: ad10s2
   Mediasize: 497958451200 (464G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 2007880058

Geom name: root
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 4098555256
Providers:
1. Name: mirror/root
   Mediasize: 1073022976 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad8s1a
   Mediasize: 1073023488 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 3394521634
2. Name: ad10s1a
   Mediasize: 1073023488 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: HARDCODED
   GenID: 0
   SyncID: 1
   ID: 3774466459
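
As a sanity check on the numbers above, the small script below redoes the
arithmetic on the Mediasize, Jstart and Jend values copied from the gjournal
list and gmirror list output. The variable names are made up for the example.
The 512-byte differences are consistent with gmirror and gjournal each
keeping their metadata in the last sector of the provider they sit on.

```sh
#!/bin/sh
# Values copied from the "gjournal list" and "gmirror list" output above.
mirror_consumer=497958451200   # ad8s2 / ad10s2
mirror_provider=497958450688   # mirror/data
journal_provider=495810966528  # mirror/data.journal
jstart=495810966528
jend=497958450176

# Size of the journal area and what is left at the end of each layer.
journal_bytes=$((jend - jstart))
journal_gib=$((journal_bytes / 1024 / 1024 / 1024))
gjournal_tail=$((mirror_provider - jend))
gmirror_tail=$((mirror_consumer - mirror_provider))

echo "journal area:  ${journal_bytes} bytes (${journal_gib} GiB)"
echo "gjournal tail: ${gjournal_tail} bytes"
echo "gmirror tail:  ${gmirror_tail} bytes"
```

So the journal occupies exactly 2 GiB at the end of mirror/data, and the data
area exposed as mirror/data.journal ends exactly where the journal starts.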


Gabriel


2008/11/4 Volodymyr Kostyrko <c.kworr at gmail.com>

> Carl wrote:
>
>> Volodymyr Kostyrko wrote:
>>
>>  I have some setups where gjournal was put on the device rather than on a
>>> partition, i.e.:
>>>
>>> [umgah] ~> gmirror status
>>>          Name    Status  Components
>>> mirror/umgah0  COMPLETE  ad0
>>>                          ad1
>>> [umgah] ~> gjournal status
>>>                  Name  Status  Components
>>> mirror/umgah0.journal     N/A  mirror/umgah0
>>> [umgah] ~> glabel status
>>>             Name  Status  Components
>>>   ufs/umgah0root     N/A  mirror/umgah0.journala
>>> label/umgah0swap     N/A  mirror/umgah0.journalb
>>>    ufs/umgah0usr     N/A  mirror/umgah0.journald
>>>    ufs/umgah0var     N/A  mirror/umgah0.journale
>>>
>>
>> Does the above suggest that you've ended up with individual journal
>> providers for each partition anyway? If so, where are they and have you
>> really achieved anything functionally different? Are they at the end of
>> their individually associated partitions or all together somewhere else? Has
>> the ill-advised journaled small partition issue been successfully overcome
>> through what you've done?
>>
>
> First, there is only one journal, for /dev/mirror/umgah0, and it is named
> /dev/mirror/umgah0.journal. Everything else is just bsdlabel partitions;
> there are four of them.
>
>
>>  [umgah] ~> mount
>>> /dev/ufs/umgah0root on / (ufs, asynchronous, local, noatime, gjournal)
>>> devfs on /dev (devfs, local)
>>> /dev/md0 on /tmp (ufs, asynchronous, local)
>>> /dev/ufs/umgah0var on /var (ufs, asynchronous, local, noatime, gjournal)
>>> /dev/ufs/umgah0usr on /usr (ufs, asynchronous, local, noatime, gjournal)
>>> devfs on /var/named/dev (devfs, local)
>>>
>>> And yes, mirror autosynchronization is turned off, gjournal takes care of
>>> that too.
>>>
>>> It's not stated in the manual, but gjournal is typically transparent for
>>> any type of access; only in the case of UFS is the file system marked as
>>> journaled, so that metadata writes can be distinguished from data writes.
>>> Without that, gjournal does literally nothing.
>>>
>>
>> And what does this mean for your swap partition?
>>
>
> Just nothing, it's just swap. It can't be journaled.
>
>  Laszlo Nagy wrote earlier:
>>
>>> Another tricky question: why would you journal a SWAP partition?
>>>
>>
>> Volodymyr, does your assertion that gjournal does nothing when a file
>> system is not UFS mean that there is no penalty with regard to your swap
>> partition despite the existence of "mirror/umgah0.journalb"?
>>
>
> I haven't seen any performance decrease in this configuration. And according
> to the manual and articles about gjournal, it should work this way.
>
>  Any chance you'd like to share your command sequence for constructing your
>> gmirror'd and gjournal'd filesystem, Volodymyr? :-)
>>
>
> If we have two disks (ad0, ad1) it should look like this:
>
> > gmirror label -b load -n umgah0 ad1
>
> This mirrors the whole drive without synchronization (we don't need
> it; the journal will take care of any discrepancies) and with the load
> balance algorithm (load was fixed not so long ago in stable and should be
> fine to use).
>
> > gjournal label mirror/umgah0
>
> We are creating a journal on top of our gmirror. It eats 1G from the end of
> the disks and gives us the rest to use.
>
> > bsdlabel -wB mirror/umgah0.journal
>
> We are writing the standard bsdlabel to the disk and making it bootable.
> After that we will get one partition 'a'.
>
> <spam>
> Yes, no fdisk. I don't think this old piece of rough junk is ever needed on
> machine running FreeBSD solely. It just takes space, it requires
> compatibility to forgotten-and-abandoned standards and gives nothing more.
> You have your server dual-booting Windows or Linux? This is the only case
> you need fdisk for.
> </spam>
>
> > bsdlabel -e mirror/umgah0.journal
>
> Now we are splitting our journal to some partitions. I did it this way:
>
> # /dev/mirror/umgah0.journal:
> 8 partitions:
> #        size   offset    fstype   [fsize bsize bps/cpg]
>  a:     524288       16    4.2BSD
>  b:   16777216        *      swap
>  c:  779325614        0    unused        0     0   # "raw" part, don't edit
>  d:   33554432        *    4.2BSD
>  e:          *        *    4.2BSD
>
> After that we can format these filesystems:
>
> > newfs -J -L umgah0root /dev/mirror/umgah0.journala
> > newfs -J -L umgah0var /dev/mirror/umgah0.journald
> > newfs -J -L umgah0usr /dev/mirror/umgah0.journale
>
> And label the swap:
>
> > glabel label umgah0swap /dev/mirror/umgah0.journalb
>
> You can skip all this glabel thing, I just prefer to have slim fstab, as
> slim as possible.
>
> <fstab>
> /dev/label/umgah0swap none swap sw 0 0
>
> md /tmp mfs rw,-s1024m,-S,-oasync 0 0
>
> /dev/ufs/umgah0root / ufs rw,async,noatime 0 1
> /dev/ufs/umgah0var /var ufs rw,async,noatime 0 2
> /dev/ufs/umgah0usr /usr ufs rw,async,noatime 0 2
> </fstab>
>
> There's a lot more to describe here, from moving the system onto the newly
> created partitions to inserting the first disk into the gmirror and
> rebuilding it. All these issues are described in the Handbook or in other
> articles found on the net.
>
>
> --
> Sphinx of black quartz judge my vow.
>



-- 
Gabriel Lavoie
glavoie at gmail.com

