[CFC/CFT] large changes in the loader(8) code

Pawel Jakub Dawidek pjd at FreeBSD.org
Wed Jun 27 18:22:45 UTC 2012


On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
> 
> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
> > 
> > GPT really wants the backup header at the last LBA.  I know you can set it, 
> > but I've interpreted that as a way to see if the primary header is correct or 
> > not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
> > provider) will not work properly with partition editors for other OS's.  I'm 
> > hesitant to encourage the use of this as I do think putting GPT inside of a 
> > gmirror violates the GPT spec.
> 
> Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think
about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top
of a RAID array is a spec violation, then I guess we just have to live
with it.

> While it is a nice trick to use the last sector for meta data, it does
> create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror
kept its metadata in the first sector, gpart/gpt would find its
metadata in the last sector and would complain about a missing primary
header.

> [...] The second is that when there's
> different metadata in the first *and* the last sector, you can't decide
> which is to take precedence without also looking at the other and know
> how to interpret it. We have not solved this second problem at all.  We
> do get reports about the problems though. At best we're handwaving or
> kluging.

This is a different kind of problem. It took me a while to realize that,
but now I know:)

The real problem is that not all metadata formats are suitable for
autodetection. That's all.

The metadata I use in my GEOM classes plays nicely with autodetection.
The solution is very easy: keep the size of the disk device within the
metadata. This allows gmirror to figure out whether it is configured on
the raw disk, the last slice, the last partition within the last slice,
etc. If GPT kept the disk size in its metadata, the second problem you
mentioned would not exist. And to be honest, GPT kind of does that by
having the backup header's LBA stored in the primary header. And this is
fine as long as the primary header is valid.
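
To illustrate the size check, here is a minimal sketch in Python. It is
not gmirror's actual code: the Metadata type, its provsize field and the
claims() and gpt_backup_lba_ok() helpers are hypothetical names chosen
for the example, and a 512-byte sector is assumed. A whole disk and its
last partition end at the same sector, so both see the same metadata,
but only the provider whose size equals the recorded size should claim
it.

```python
from dataclasses import dataclass

SECTOR = 512  # assumed sector size for the example


@dataclass
class Metadata:
    """Hypothetical metadata stored in a provider's last sector."""
    name: str
    provsize: int  # size in bytes of the provider the metadata was written to


def claims(md: Metadata, provider_size: int) -> bool:
    """Return True only if the metadata was written for a provider of this size."""
    return md.provsize == provider_size


def gpt_backup_lba_ok(alternate_lba: int, provider_size: int) -> bool:
    """GPT analogue: the primary header records the backup header's LBA,
    which should be the last LBA of the device carrying the table."""
    return alternate_lba == provider_size // SECTOR - 1


# A 1000-sector disk whose last partition starts at sector 400 and runs
# to the end of the disk: both providers end at the same sector.
disk_size = 1000 * SECTOR
part_size = 600 * SECTOR

md = Metadata(name="gm0", provsize=part_size)  # mirror set up on the partition

print(claims(md, disk_size))  # False: the raw disk must not claim it
print(claims(md, part_size))  # True: the partition is the right provider
```

The GPT helper applies the same idea to the backup header's LBA: it only
helps when the primary header holding that LBA can be read and trusted.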

The same problem exists with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in the UFS superblock. The UFS superblock contains the file
system size, but that is not the same thing, as one can create a file
system smaller than the underlying disk device.
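
A small sketch of why the file system size alone is not enough. The
provider names and sizes are made up for the example, and could_hold()
is a hypothetical helper: every provider that starts at the same offset
and is at least as large as the file system passes the only check the
superblock allows.

```python
def could_hold(fs_size: int, provider_size: int) -> bool:
    """The only test possible with a file system size: does the fs fit?"""
    return fs_size <= provider_size


# Hypothetical providers that start at the same offset, so the same
# superblock is visible through each of them (sizes in sectors).
providers = {
    "da0": 1000,        # raw disk
    "mirror/gm0": 999,  # mirror keeping its metadata in the last sector
}

fs_size = 900  # file system created smaller than the device it lives on

candidates = [n for n, size in providers.items() if could_hold(fs_size, size)]
print(candidates)  # ['da0', 'mirror/gm0']: ambiguous

# If a (hypothetical) provider size of 999 were recorded as well, an
# exact match would single out one provider.
recorded_provsize = 999
exact = [n for n, size in providers.items() if size == recorded_provsize]
print(exact)  # ['mirror/gm0']
```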

> I think it's unwise to depend on FreeBSD-specific extensions or features
> in industry-standard partitioning schemes and as such make the use of
> "foreign" tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
"industry-standard partitioning schemes", what file system are you going
to use to actually access your data? FAT? Of course, if you do share your
disk between various OSes, then probably your best bet is to use MBR or
GPT on the raw disk and a FAT file system. But if you use your disk with
FreeBSD only, then I see no reason not to leverage FreeBSD-specific
features (be it gmirror, geli or zfs).

> A much more flexible approach is to support out-of-band configuration
> data. This allows us to mirror GPT disks without having to become non-
> standard as it removes the need to use the last sector for meta-data.
> The ability to construct GEOM hierarchies unambiguously is very
> important and our current approach has proven to not deliver on that.
> This is actually impacting existing FreeBSD consumers already, like
> Juniper. So, we should not go deeper into this rabbit hole. We should
> finally solve this problem for real...

Marcel, nothing stops anyone from implementing a GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide a mechanism for autoconfiguration, but it is totally optional and
a GEOM class might choose not to use it.

As an example, you can take a look at two other GEOM classes of mine:
gconcat(8) and gstripe(8). You can use the 'label' subcommand to store
metadata on the component disks, which takes advantage of GEOM
autodetection and autoconfiguration. You can also use the 'create'
subcommand to create an ad hoc provider that stores no metadata and makes
use of the entire disks, which also means it won't be automatically
created on the next boot.
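
The difference between the two subcommands can be modelled with a toy
sketch. This is only an illustration, not how gconcat(8) or gstripe(8)
are implemented: the MAGIC string, the in-memory "disks" and the
label(), create() and taste() functions are all assumptions made up for
the example.

```python
MAGIC = b"GEOM::EXAMPLE"  # made-up magic string, not a real GEOM signature
SECTOR = 512


def label(disks: dict, name: str, members: list) -> None:
    """Write metadata into the last sector of every member disk."""
    record = (MAGIC + b":" + name.encode()).ljust(SECTOR, b"\0")
    for m in members:
        disks[m][-SECTOR:] = record


def create(name: str, members: list) -> dict:
    """Build an ad hoc device in memory only; nothing is written to disk."""
    return {name: list(members)}


def taste(disks: dict) -> dict:
    """Simulate boot-time autodetection: scan last sectors for metadata."""
    found = {}
    for dev in sorted(disks):
        last = bytes(disks[dev][-SECTOR:])
        if last.startswith(MAGIC + b":"):
            name = last[len(MAGIC) + 1:].rstrip(b"\0").decode()
            found.setdefault(name, []).append(dev)
    return found


disks = {d: bytearray(4 * SECTOR) for d in ("da0", "da1", "da2", "da3")}

label(disks, "data", ["da0", "da1"])       # metadata stored on the disks
adhoc = create("scratch", ["da2", "da3"])  # exists only until "reboot"

print(taste(disks))  # {'data': ['da0', 'da1']}
```

Only the labeled device is rediscovered by the scan; the ad hoc one
leaves the disks untouched and has to be set up again by hand.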

For Juniper it might be more handy to use out-of-band configuration, as
you know the hardware you are running on, so you know exactly where the
disks are, etc. My company builds appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder whether the system will boot.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl


More information about the freebsd-hackers mailing list