disk partitioning with gmirror + gpt + gjournal (RFC)
000.fbsd at quip.cz
Wed Oct 19 13:42:37 UTC 2011
Alfred Bartsch wrote:
> On 18.10.2011 10:39, Miroslav Lachman wrote:
>> Alfred Bartsch wrote:
>>> I am going to use the following partitioning scheme on our servers
>>> and programmers' workstations running FreeBSD 8 (system disk):
>>> physical drive - geom_mirror - geom_part_gpt - journaled UFS with
>>> separate boot and swap partitions. Partition names and sizes are
>>> taken from our environment - Your requirements may vary.
>> It is not a good idea to use GPT on top of gmirror, as was discussed
>> recently on freebsd-current. You can read more in the
>> thread "RFC: Project geom-events". In short:
> I know this thread. But nobody there really mentions which utilities /
> BIOSes would fail or destroy the gmirror-metadata. The only
> complaining utility I know of is gptboot (only warning during boot).
> If You know other applications which will fail due to GPT problems,
> please tell me. Most of the problems shown in this thread seem to have
> something to do with the combined usage of gpt and glabel, which I'm
As mentioned in the thread, the problem affects any GEOM class that
stores its metadata at the end of the device (for example gmirror,
graid3, glabel and others).
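
To illustrate the overlap, here is a minimal sketch of the sector arithmetic.
gmirror keeps its metadata in the last sector of the underlying disk, and GPT
keeps its backup header in the last sector of whatever device it is written
to. The disk size and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: sector numbers are 0-based LBAs, the disk size is hypothetical.

# Last LBA of a device with the given number of sectors.
last_lba() {
    echo $(( $1 - 1 ))
}

# Size of the mirror provider: the raw disk minus the one sector that
# gmirror reserves at the end for its own metadata.
mirror_sectors() {
    echo $(( $1 - 1 ))
}

disk=1000000
echo "gmirror metadata, raw disk LBA:           $(last_lba "$disk")"
echo "GPT backup header written inside mirror:  $(last_lba "$(mirror_sectors "$disk")")"
echo "GPT backup header expected on raw disk:   $(last_lba "$disk")"
```

Anything that reads the raw disk looks for the backup header in the last
sector and finds gmirror metadata there instead. If it "repairs" the GPT, it
overwrites that sector.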
> IMHO the only dangerous code is a foreign UEFI, which "repairs" the
> last sector of the GPT disk without further inquiry. None of our
> machines act in this way up to now.
> Once I get one of those "unfriendly" machines I will surely have to
> rethink my view of disk partitioning. I expect that this day either
> GEOM will be able to handle this situation or ZFS will be
UEFI will replace the old BIOS sooner or later, so what will you do then?
Then you will need to rework your servers and change your setup routine.
And I think it is better to avoid a known possible problem than to hope
"it will not bite me". You can't avoid Murphy's law ;)
>> I am using gjournal on a few of our servers, but we are slowly
>> removing it from our setups. Data writes to gjournaled disks are
>> too slow and sometimes gjournal does not play nice.
> I'm heavily interested in more details.
When I did some tests in the past, gjournal could not be used in
combination with iSCSI, and I was not able to stop gjournal from tasting
providers (I was not able to remove / disable gjournal on a device) until
I stopped all of them and unloaded the gjournal kernel module. I don't know the
>> Maybe ZFS or UFS+SUJ is better option.
> Yes, maybe. ZFS is mainly for future use. Do you use the second option
> on large filesystems?
ZFS has been there for "a long time". I feel safe using it in production on
a few of our servers. I didn't test UFS+SUJ because it will be released in
the forthcoming 9.0 and we are not deploying CURRENT on our servers.
>>> create the (journaled) data partitions:
>>> root partition
>>> # gpart add -t freebsd-ufs -s 1G mirror/gm0
>>> # gjournal label mirror/gm0p7 mirror/gm0p3
>>> note: IMHO journal size doesn't need to exceed data
>> I don't think gjournal is needed in such small partitions. Classic
>> fsck will be fast.
> You are right. But IMHO I can not mix journaled and not journaled R/W
> filesystems on a gmirror or I lose the main advantage of avoiding
> remirroring the whole disk after power failure or crash.
Yes, you are right, I forgot about this feature. I never used it this way.
>>> /etc/fstab could then look like
>>> # Device            Mountpoint  FStype  Options           Dump  Pass#
>>> /dev/mirror/gm0p2   none        swap    sw                0     0
>>> /dev/ufs/fbsdroot   /           ufs     rw,noatime,async  1     1
>>> /dev/ufs/fbsdhome   /home       ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdusr    /usr        ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdvar    /var        ufs     rw,noatime,async  2     2
>> And there is one more problem which I am mentioning again and again
>> - the main problem of labels and gmirror is that "broken"
>> (dropped) provider (for example disk ad0) publishes its
>> partitioning and labels, so after reboot with degraded mirror, you
>> can start the system with /dev/ad0p7 mounted (because it also has
>> label "fbsdroot") instead of mirrored one. It depends on order of
>> tasting devices etc. and, unless something has changed, it is
>> unpredictable to me which device will be chosen if two devices
>> have the same label.
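
The ambiguity described above can be checked for with a small filter. This is
only a sketch: the "label provider" input format and the function name are
assumptions made for the example, not the output of any real tool, and how the
pairs are collected is left open.

```shell
#!/bin/sh
# Sketch only: prints every label that is carried by more than one provider.
# Input on stdin: one "label provider" pair per line (hypothetical format).
duplicate_labels() {
    awk '{ count[$1]++ } END { for (l in count) if (count[l] > 1) print l }' | sort
}

# Hypothetical listing for a degraded mirror where the dropped disk ad0
# still carries the same UFS label as the mirrored partition.
printf '%s\n' \
    'ufs/fbsdroot mirror/gm0p7' \
    'ufs/fbsdroot ad0p7' \
    'ufs/fbsdvar mirror/gm0p8' | duplicate_labels
```

Any label printed here could resolve to either provider at mount time, which
is the reason to mount by an unambiguous device path instead.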
> Thanks for clarifying this. As I'm looking for a robust configuration,
> I will drop these labels. This leads to some minor changes in my
> # newfs -J mirror/gm0p7.journal
> # newfs -J mirror/gm0p8.journal
> # newfs -J mirror/gm0p9.journal
> # newfs -J mirror/gm0p10.journal
> /etc/fstab could then look like
> # Device                     Mountpoint  FStype  Options           Dump  Pass#
> /dev/mirror/gm0p2            none        swap    sw                0     0
> /dev/mirror/gm0p7.journal    /           ufs     rw,noatime,async  1     1
> /dev/mirror/gm0p10.journal   /home       ufs     rw,noatime,async  2     2
> /dev/mirror/gm0p9.journal    /usr        ufs     rw,noatime,async  2     2
> /dev/mirror/gm0p8.journal    /var        ufs     rw,noatime,async  2     2
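
For reference, gjournal publishes the journaled provider under the name of the
data provider with ".journal" appended, so a partition labeled as
mirror/gm0p7 shows up below /dev/mirror/. A minimal sketch of deriving the
fstab entries from the provider names follows; the helper names are made up
for this example and the mount options are simply the ones quoted above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Device node of the journaled provider for a given data provider.
journal_dev() {
    echo "/dev/$1.journal"
}

# One fstab line: provider, mountpoint, dump, pass.
fstab_line() {
    printf '%s\t%s\tufs\trw,noatime,async\t%s\t%s\n' \
        "$(journal_dev "$1")" "$2" "$3" "$4"
}

fstab_line mirror/gm0p7  /     1 1
fstab_line mirror/gm0p10 /home 2 2
fstab_line mirror/gm0p9  /usr  2 2
fstab_line mirror/gm0p8  /var  2 2
```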
>>> Some questions: Is this disk configuration valid and robust?
>>> (I've just started testing) Are there any other proposals -
>>> usable as "best known practice", I didn't find a complete setup
>>> so far?
>> We are using gmirror with good old mbr / fdisk / bsdlabel without
>> mounting by labels and with gjournal only on the big data
>> partitions. Not on root, var or partitions with databases (because
>> gjournal is slow on writes)
> with fdisk + bsdlabel there are not enough partitions in one slice to
> hold all the journals, and as I already mentioned I really want to
> minimize recovery time.
> With gmirror + gjournal I'm able to activate disk write cache without
> losing data consistency, which improves performance significantly.
According to the following commit message, bsdlabel was extended to 26
partitions 3 years ago.
(I haven't tested it yet, because I don't need it; we are using two slices
on our servers.)
>> I see what you are trying to do and it would be nice if "all works
>> as one can expect", but the reality is different. So I don't think
>> it is a good idea to do it as you described.
> I'm not yet fully convinced, that my idea of disk partitioning is a
> bad one, so please let me take part in your negative experiences with
> Thanks in advance.
I am not saying that your idea is bad. It just contains some things
which I would rather avoid.
PS: please use Reply All so that your reply goes to the mailing list as well.