New BSD Installer

Jeremy Chadwick freebsd at
Fri Feb 17 02:10:21 UTC 2012

On Thu, Feb 16, 2012 at 06:34:53PM -0700, Warren Block wrote:
> On Thu, 16 Feb 2012, Jeremy Chadwick wrote:
> >On Fri, Feb 17, 2012 at 01:08:28AM +0100, Miroslav Lachman wrote:
> >>
> >>Please don't mix two things together. gpart can replace fdisk and
> >>bsdlabel, but GPT vs. MBR is a different thing. GPT doesn't play
> >>nice with GEOM classes which store their metadata in the last sector.
> >>For example, you can't use gmirror on whole drives and put GPT on
> >>top of the mirror (and gmirror is not the only one).
> >
> >This is quite possibly the most concise, clearest definition of a major
> >(borderline catastrophic) situation pertaining to GPT + GEOM
> >combinations.
> >
> >I'm going to be bolder than usual: who is fixing this, and when will it
> >be MFC'd to 9 and 8 (and probably 7 too, which would be a good idea)?  If
> >nobody is fixing this, someone had better light a fire under someone's
> >ass to fix it.  I'm absolutely amazed this is still a problem.
>
> How can it be fixed?  GPT only has two points of reference, the
> start and end of the disk.  To do more it would have to be aware of
> a lot of possible disk formats.

The GPT aspect of it cannot be fixed.  The GEOM aspect of it should be
fixed.  The "let's store the metadata in the last sector" mentality is
what needs to be addressed.  There has to be a better way of doing this.

Given the nature of these two pieces (GPT vs. GEOM), I'm surprised
that the GEOM layer cannot simply lie about the full capacity of the
partition, or something to that effect.
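
To make the "lie about capacity" idea concrete, here is a toy model.  It
assumes 512-byte sectors and a hypothetical GEOM class that keeps its
metadata in exactly the last sector of the disk; the function names are
made up for illustration, and it does not model how GEOM actually tastes
providers.  GPT puts its backup header in the last LBA of whatever device
it is written to, so if the class reports the whole disk the two collide,
and if it reports one sector less they don't.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def last_lba(media_size: int) -> int:
    """Index of the final sector; GPT writes its backup header here."""
    return media_size // SECTOR_SIZE - 1


def exported_size(media_size: int) -> int:
    """Capacity a hypothetical metadata-in-last-sector class reports upward:
    the whole device minus the one sector it keeps for itself."""
    return media_size - SECTOR_SIZE


def collides(disk_size: int, gpt_device_size: int) -> bool:
    """True if a GPT laid out on a device of gpt_device_size bytes (starting
    at sector 0 of the disk) puts its backup header on top of the class
    metadata, which lives in the disk's last sector."""
    return last_lba(gpt_device_size) == last_lba(disk_size)


disk = 500 * 10**9  # hypothetical "500 GB" drive, in bytes

print(collides(disk, disk))                 # True: GPT sized to the raw disk
print(collides(disk, exported_size(disk)))  # False: GPT sized to the smaller provider
```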

Consider this: Linux's md driver can do, in effect, the same thing
GEOM classes (gmirror, etc.) do.  It obviously has to store metadata
somewhere too.  How does it do it?

Quoting the mdadm(8) man page:

>> The devicesize option will rarely be of use. It applies to version 1.1
>> and 1.2 metadata only (where the metadata is at the start of the device)
>> and is only useful when the component device has changed size (typically
>> become larger). The version 1 metadata records the amount of the device
>> that can be used to store data, so if a device in a version 1.1 or 1.2
>> array becomes larger, the metadata will still be visible, but the extra
>> space will not. In this case it might be useful to assemble the array
>> with --update=devicesize. This will cause mdadm to determine the maximum
>> usable amount of space on each device and update the relevant field in
>> the metadata. 

Quoting the md(4) man page:

>> The common format -- known as version 0.90 -- has a superblock that is
>> 4K long and is written into a 64K aligned block that starts at least 64K
>> and less than 128K from the end of the device (i.e. to get the address
>> of the superblock round the size of the device down to a multiple of 64K
>> and then subtract 64K). The available size of each device is the amount
>> of space before the super block, so between 64K and 128K is lost when a
>> device is incorporated into an MD array. This superblock stores
>> multi-byte fields in a processor-dependent manner, so arrays cannot
>> easily be moved between computers with different processors.
>> The new format -- known as version 1 -- has a superblock that is
>> normally 1K long, but can be longer. It is normally stored between 8K
>> and 12K from the end of the device, on a 4K boundary, though variations
>> can be stored at the start of the device (version 1.1) or 4K from the
>> start of the device (version 1.2). This metadata format stores multibyte
>> data in a processor-independent format and supports up to hundreds of
>> component devices (version 0.90 only supports 28). 

So with version 0.90 of their metadata format, you lose between 64 and
128 KBytes of drive capacity to metadata.  For version 1.0, I'm not
sure.  For versions 1.1 and 1.2, it looks like the metadata is stored
at (or near) the beginning of the device instead.
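
The md(4) placement rules quoted above boil down to simple arithmetic.
Below is a sketch of my reading of that text, not md's actual source: the
0.90 offset follows the "round down to 64K, then subtract 64K" recipe
literally, and the version 1 function is just one way to satisfy "8K to
12K from the end, on a 4K boundary".  The function names and the 500 GB
drive size are hypothetical examples.

```python
KIB = 1024


def md090_superblock_offset(device_size: int) -> int:
    """v0.90: round the device size down to a multiple of 64K, then subtract 64K."""
    return (device_size // (64 * KIB)) * (64 * KIB) - 64 * KIB


def md1_superblock_offset(device_size: int) -> int:
    """v1 (end-of-device variant): back off 8K, then round down to a 4K
    boundary, which leaves the superblock between 8K and 12K from the end."""
    return ((device_size - 8 * KIB) // (4 * KIB)) * (4 * KIB)


def tail_bytes(device_size: int, superblock_offset: int) -> int:
    """Bytes from the superblock to the end of the device.  For 0.90, md(4)
    says only the space *before* the superblock holds data, so this is the
    capacity lost to metadata."""
    return device_size - superblock_offset


disk = 500 * 10**9  # example "500 GB" drive, in bytes
print(tail_bytes(disk, md090_superblock_offset(disk)))  # 100352 (98K lost with 0.90)
print(tail_bytes(disk, md1_superblock_offset(disk)))    # 10240 (superblock 10K from the end)
```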

So overall, this sounds to me like the equivalent of GEOM "lying" about
the actual capacity of a device when using classes that require
metadata (gmirror, etc.).

> On the other hand, GEOM stuff works inside GPT partitions.  And if
> that's not acceptable, MBR partitions will be around for a long
> time.

MBR partitions don't scale past 2TB (with the usual 512-byte sectors).
Arguing that MBR is an acceptable workaround is the equivalent of
burying one's head in the sand.  Let's try to accept the future, not
feign ignorance.
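
Where the 2TB figure comes from: an MBR partition entry stores its
starting sector and its length as 32-bit sector counts.  A quick check,
assuming the traditional 512-byte sector; the helper names are made up
for illustration.

```python
MAX_32BIT = 2**32 - 1  # largest value a 32-bit unsigned field can hold


def mbr_max_partition_bytes(sector_size: int = 512) -> int:
    """Largest partition one MBR entry can describe (32-bit sector count)."""
    return MAX_32BIT * sector_size


def fits_in_one_mbr_partition(disk_bytes: int, sector_size: int = 512) -> bool:
    """Can a single MBR partition cover a drive of this size?"""
    return disk_bytes <= mbr_max_partition_bytes(sector_size)


print(mbr_max_partition_bytes())               # 2199023255040 bytes, just under 2 TiB
print(fits_in_one_mbr_partition(2 * 10**12))   # True: a "2 TB" drive still fits
print(fits_in_one_mbr_partition(3 * 10**12))   # False: a "3 TB" drive does not
```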

| Jeremy Chadwick                              jdc at |
| Parodius Networking            |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

More information about the freebsd-stable mailing list