disk partitioning with gmirror + gpt + gjournal (RFC)
Miroslav Lachman
000.fbsd at quip.cz
Wed Oct 19 13:42:37 UTC 2011
Alfred Bartsch wrote:
> Am 18.10.2011 10:39, schrieb Miroslav Lachman:
>> Alfred Bartsch wrote:
>>> I am going to use the following partitioning scheme on our servers
>>> and programmers' workstations running FreeBSD 8 (system disk):
>>> physical drive - geom_mirror - geom_part_gpt - journaled UFS with
>>> separate boot and swap partitions. Partition names and sizes are
>>> taken from our environment; your requirements may vary.
>>
>> It is not a good idea to use GPT on top of gmirror, as was discussed
>> recently on the freebsd-current mailing list. You can read more in the
>> thread "RFC: Project geom-events". In short:
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028054.html
>>
>>
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028109.html
>
> I know this thread. But nobody there really mentions which utilities /
> BIOSes would fail or destroy the gmirror metadata. The only
> complaining utility I know of is gptboot (only a warning during boot).
> If you know other applications which will fail due to GPT problems,
> please tell me. Most of the problems shown in this thread seem to have
> something to do with the combined usage of gpt and glabel, which I'm
> avoiding.
As mentioned in the thread, the problem affects any GEOM class
that stores its metadata at the end of the device (for example gmirror,
graid3, glabel and others).
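To make the end-of-device conflict concrete, here is a small worked example in shell arithmetic. It assumes a hypothetical raw disk of 1000 sectors and uses only the facts from this thread: gmirror keeps its metadata in the last sector of each component, so mirror/gm0 is one sector shorter than the raw disk, and a GPT created on mirror/gm0 puts its backup header in the last sector of mirror/gm0. A tool that checks the GPT against the raw disk (gptboot, or a "repairing" UEFI firmware) expects the backup header in the raw disk's last sector, which is exactly where gmirror's metadata lives. The function name gpt_on_gmirror_layout is made up for illustration.

```sh
#!/bin/sh
# Worked example: where the last-sector structures land when a GPT is
# created on top of gmirror. The disk size is hypothetical and the
# helper name gpt_on_gmirror_layout is made up for this sketch.

gpt_on_gmirror_layout() {
    disk_sectors=$1                       # raw disk (e.g. ad0) size in sectors
    raw_last=$((disk_sectors - 1))        # last LBA of the raw disk
    mirror_sectors=$((disk_sectors - 1))  # gmirror reserves the last sector
    gpt_backup=$((mirror_sectors - 1))    # last LBA of mirror/gm0

    echo "gmirror_metadata_lba=$raw_last"
    echo "gpt_backup_header_lba=$gpt_backup"
    # Where a raw-disk GPT check expects the backup header:
    echo "expected_backup_lba=$raw_last"
}

gpt_on_gmirror_layout 1000
```

The first and last lines name the same sector: writing a "repaired" backup header to the raw disk's last LBA overwrites the gmirror metadata.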
> IMHO the only dangerous code is a foreign UEFI, which "repairs" the
> last sector of the GPT disk without further inquiry. None of our
> machines act in this way up to now.
> Once I get one of those "unfriendly" machines, I will surely have to
> rethink my view of disk partitioning. I expect that by then either
> GEOM will be able to handle this situation or ZFS will be
> production-ready.
UEFI will replace the old BIOS sooner or later, so what will you do then?
Then you will need to rework your servers and change your setup routine.
And I think it is better to avoid a known possible problem than to hope
"it will not bite me". You can't avoid Murphy's law ;)
>> I am using gjournal on a few of our servers, but we are slowly
>> removing it from our setups. Data writes to gjournaled disks are
>> too slow and sometimes gjournal is not playing nice.
>
> I'm heavily interested in more details.
When I did some tests in the past, gjournal could not be used in
combination with iSCSI, and I was not able to stop gjournal from tasting
providers (I was not able to remove / disable gjournal on a device) until
I stopped all of them and unloaded the gjournal kernel module. I don't
know the current state.
>> Maybe ZFS or UFS+SUJ is better option.
>
> Yes, maybe. ZFS is mainly for future use. Do you use the second option
> on large filesystems?
ZFS has been there for "a long time". I feel safe using it in production
on a few of our servers. I didn't test UFS+SUJ because it will only be
released in the forthcoming 9.0, and we are not deploying current on our servers.
>>> create the (journaled) data partitions:
>>> root partition
>>> # gpart add -t freebsd-ufs -s 1G mirror/gm0
>>> # gjournal label mirror/gm0p7 mirror/gm0p3
>>> note: IMHO journal size doesn't need to exceed data size
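The steps above can be wrapped into one helper per filesystem. This is only a sketch: the function name make_journaled_fs, the DRY_RUN switch and the use of "gpart add -i" to pin the partition index (the post lets gpart pick the next free index) are assumptions for illustration, and the journal partition (mirror/gm0p3 for root) is assumed to exist already. Only the 1G root size and the p7 (data) / p3 (journal) pairing come from the post. By default it only prints the commands; set DRY_RUN=0 to execute them.

```sh
#!/bin/sh
# Sketch: create one journaled UFS filesystem on the gm0 mirror.
# Hypothetical helpers: run, make_journaled_fs, DRY_RUN.
# DRY_RUN defaults to 1, so commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_journaled_fs SIZE DATA_INDEX JOURNAL_INDEX
make_journaled_fs() {
    size=$1 data=$2 journal=$3
    # Data partition of the requested size at a fixed index (assumed).
    run gpart add -t freebsd-ufs -s "$size" -i "$data" mirror/gm0
    # Data provider first, journal provider second; gjournal then
    # publishes mirror/gm0p${data}.journal.
    run gjournal label "mirror/gm0p$data" "mirror/gm0p$journal"
    # UFS with the gjournal flag on the new .journal provider.
    run newfs -J "mirror/gm0p$data.journal"
}

make_journaled_fs 1G 7 3
```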
>>
>> I don't think gjournal is needed in such small partitions. Classic
>> fsck will be fast.
>>
> You are right. But IMHO I cannot mix journaled and non-journaled R/W
> filesystems on a gmirror or I lose the main advantage of avoiding
> remirroring the whole disk after power failure or crash.
Yes, you are right, I forgot about this feature. I never used it this way.
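As far as I understand, the feature referred to here is gmirror's -F flag, which tells gmirror not to synchronize the components after a power failure or system crash and to assume they are consistent. That assumption is only reasonable if every read/write filesystem on the mirror is journaled. A minimal sketch, assuming the mirror is named gm0; the run helper, the set_no_failsync function and the DRY_RUN switch are illustrative, and by default the commands are only printed.

```sh
#!/bin/sh
# Sketch: skip mirror resynchronization after an unclean shutdown.
# Assumes every read/write filesystem on gm0 is gjournaled (swap
# contents are not needed after a crash anyway).
# Hypothetical helpers: run, set_no_failsync, DRY_RUN (default 1 = print).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_no_failsync() {
    # -F: do not synchronize after a power failure or system crash
    run gmirror configure -F "$1"
    # Inspect the Flags line of the output to confirm the change.
    run gmirror list "$1"
}

set_no_failsync gm0
```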
>>> /etc/fstab could then look like
>>> # Device           Mountpoint  FStype  Options           Dump  Pass#
>>> /dev/mirror/gm0p2  none        swap    sw                0     0
>>> /dev/ufs/fbsdroot  /           ufs     rw,noatime,async  1     1
>>> /dev/ufs/fbsdhome  /home       ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdusr   /usr        ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdvar   /var        ufs     rw,noatime,async  2     2
>>> =====================================================================
>>
>>>
>> And there is one more problem which I am mentioning again and again
>> - the main problem with labels and gmirror is that a "broken"
>> (dropped) provider (for example disk ad0) publishes its
>> partitioning and labels, so after a reboot with a degraded mirror, you
>> can start the system with /dev/ad0p7 mounted (because it also has
>> the label "fbsdroot") instead of the mirrored one. It depends on the
>> order in which devices are tasted etc., and unless something has
>> changed, it is unpredictable to me which device will be chosen if two
>> devices have the same label.
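One way to notice this after a degraded boot is to check which provider each label is attached to. A minimal sketch: the helper name labels_off_mirror is hypothetical, and it assumes input in the column layout of glabel status (label name, status, component). It prints every label whose component is not under mirror/, such as ufs/fbsdroot ending up on ad0p7.

```sh
#!/bin/sh
# Sketch: list GEOM labels that are not backed by a gmirror provider.
# Assumed input columns (glabel status layout):
#   NAME  STATUS  COMPONENT
# On a FreeBSD host, something like:
#   glabel status -s | labels_off_mirror

labels_off_mirror() {
    # Skip blank lines and a header line if present; print label and
    # component for every component not starting with "mirror/".
    awk 'NF >= 3 && $1 != "Name" && $3 !~ /^mirror\// { print $1, $3 }'
}
```

For example, feeding it "ufs/fbsdroot N/A ad0p7" and "ufs/fbsdusr N/A mirror/gm0p9" reports only the root label on ad0p7.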
>
> Thanks for clarifying this. As I'm looking for a robust configuration,
> I will drop these labels. This leads to some minor changes in my
> configuration:
>
> # newfs -J mirror/gm0p7.journal
> # newfs -J mirror/gm0p8.journal
> # newfs -J mirror/gm0p9.journal
> # newfs -J mirror/gm0p10.journal
>
> /etc/fstab could then look like
> # Device Mountpoint FStype Options Dump Pass#
> /dev/mirror/gm0p2 none swap sw 0 0
> /dev/mirror/gm0p7.journal / ufs rw,noatime,async 1 1
> /dev/mirror/gm0p10.journal /home ufs rw,noatime,async 2 2
> /dev/mirror/gm0p9.journal /usr ufs rw,noatime,async 2 2
> /dev/mirror/gm0p8.journal /var ufs rw,noatime,async 2 2
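Because gjournal names its provider after the data provider with a .journal suffix, the device nodes for this layout live under /dev/mirror/ (for example /dev/mirror/gm0p7.journal, matching the newfs commands above). A small sketch that prints the label-free fstab from the partition numbers used in this post; the function names fstab_line and gen_fstab are hypothetical, and Dump is set equal to Pass# only because that is what the post uses.

```sh
#!/bin/sh
# Sketch: print label-free fstab entries for the gm0 layout in this post.
# Hypothetical helpers: fstab_line, gen_fstab.

# fstab_line PARTITION_INDEX MOUNTPOINT PASS
fstab_line() {
    # Dump mirrors Pass# here, as in the post.
    printf '%s\t%s\tufs\trw,noatime,async\t%s\t%s\n' \
        "/dev/mirror/gm0p$1.journal" "$2" "$3" "$3"
}

gen_fstab() {
    printf '# Device\tMountpoint\tFStype\tOptions\tDump\tPass#\n'
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' /dev/mirror/gm0p2 none swap sw 0 0
    fstab_line 7 / 1
    fstab_line 10 /home 2
    fstab_line 9 /usr 2
    fstab_line 8 /var 2
}

gen_fstab
```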
>
>>
>>> Some questions: Is this disk configuration valid and robust?
>>> (I've just started testing.) Are there any other proposals
>>> usable as "best known practice"? I haven't found a complete setup
>>> so far.
>>
>> We are using gmirror with good old mbr / fdisk / bsdlabel without
>> mounting by labels and with gjournal only on the big data
>> partitions, not on root, var or partitions with databases (because
>> gjournal is slow on writes).
>
> With fdisk + bsdlabel, there are not enough partitions in one slice to
> hold all the journals, and as I already mentioned I really want to
> minimize recovery time.
> With gmirror + gjournal I'm able to activate disk write cache without
> losing data consistency, which improves performance significantly.
According to the following commit message, bsdlabel was extended to 26
partitions in December 2007.
http://lists.freebsd.org/pipermail/cvs-all/2007-December/239719.html
(I haven't tested it yet, because I don't need it - we are using two slices
on our servers.)
>> I see what you are trying to do and it would be nice if "all works
>> as one can expect", but the reality is different. So I don't think
>> it is good idea to make it as you described.
>>
> I'm not yet fully convinced that my idea of disk partitioning is a
> bad one, so please share your negative experiences with
> gjournal.
> Thanks in advance.
I am not saying that your idea is bad. It just contains some things
which I would rather avoid.
PS: Please use Reply All to post your reply to the mailing list as well.
More information about the freebsd-geom mailing list