zfs: allow to mount root from a pool not in zpool.cache

Pawel Jakub Dawidek pjd at FreeBSD.org
Fri Oct 5 06:38:20 UTC 2012


On Wed, Oct 03, 2012 at 05:51:29PM +0300, Andriy Gapon wrote:
> on 23/09/2012 07:59 Justin T. Gibbs said the following:
> > On Sep 22, 2012, at 10:28 AM, Andriy Gapon <avg at freebsd.org> wrote:
> > 
> >>
> >> Currently FreeBSD ZFS kernel code doesn't allow to mount root filesystem on a
> >> pool that is not listed in zpool.cache as only pools from the cache are known to
> >> ZFS at that time.
> > 
> > I've for some time been of the opinion that FreeBSD should only use
> > the cache file for ZFS pools created from non-GEOM objects (i.e.
> > files).  GEOM tasting should be used to make the kernel aware of
> > all pools whether they be imported on the system, partial, or
> > foreign.  Even for pools created by files, the user land utilities
> > should do nothing more than ask the kernel to "taste them".  This
> > would remove code duplicated in user land for this task (code that
> > must be re-executed in kernel space for validation reasons anyway)
> > and also help solve problems we've encountered at Spectra with races
> > in fault event processing, spare management, and device arrival and
> > departures.
> > 
> > So I'm excited by your work in this area and would encourage you
> > to "think larger" than just trying to integrate root pool discovery
> > with GEOM.  Spectra may even be able to help in this work sometime
> > in the near future.
> 
> For the moment I am trying to think "narrower" to fix the problem at hand :-)
> 
> But I see what you say.
> It doesn't make sense that
> - zfsboot tastes all BIOS visible disks for pools
> - zfsloader tastes all BIOS visible disks for pools [duplicated effort detected]
> - but kernel puts its all trust in some cache file
> 
> I am not sure what performance impact would tasting of all GEOM providers have,
> but I've got this idea.  geom_vdev geoms should taste all providers (like e.g.
> geom part or label do) and attach (but not g_access) to any that have valid zfs
> labels.  They should cache things like pool guids, vdev guids, txgs, etc.  So that
> that information is readily available for any queries.  So we easily know what
> pools we have in a system, what devices from those pools are available, etc.  When
> we want to import a pool we just start using the corresponding geom_vdev geoms
> (g_access them).
> 
> This will also remove a need for a disk tasting done from userland (which is weird
> on FreeBSD).
> 
> I think that the zfs+geom part is not too much work.  The userland reduction part
> looks scarier to me :-)

The original idea behind zpool.cache on Solaris was to reduce boot time
by not tasting every single disk/partition when the system has a few
dozen or even a few hundred drives.

This argument doesn't apply to FreeBSD, as we do the tasting anyway in
GEOM. We could eventually try to make it parallel, but this was never
a big issue for FreeBSD.

In my opinion, not requiring zpool.cache to import the root pool and
mount the root file system is a great idea and we should definitely do
it. It will greatly simplify ZFS configuration from various recovery
media, etc. The user has already made his decision by either placing the
dataset name into /etc/fstab or by defining the vfs.root.mountfrom
tunable. There is no need to require anything else from him. He told us
what he wants and we should just do it - import the pool even if it is
in exported state and even if it is not listed in zpool.cache. We
already ignore hostid, because it is not available during root mount.
I'm all for it.

As for the other pools, I'm also in favour of autodetecting them.
It will be useful if root is read-only and /boot/zfs/ is read-only as
well. But here we need to be more careful. We should only import pools
that are in imported state and whose hostid matches the system's hostid.
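
To make the two rules above concrete, here is a rough sketch of the
decision in C. It is only an illustration, not existing ZFS code: the
names pool_state, tasted_pool and should_auto_import are made up for
this example, and it assumes tasting has already produced the pool
state and hostid from the on-disk labels.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool states as recorded in the on-disk label. */
enum pool_state { POOL_IMPORTED, POOL_EXPORTED };

/* Hypothetical summary of what tasting found for one pool. */
struct tasted_pool {
	enum pool_state	state;		/* state stored in the label */
	uint64_t	hostid;		/* hostid stored in the label */
	bool		is_root;	/* named by vfs.root.mountfrom or fstab */
};

/*
 * Decide whether a tasted pool should be imported without zpool.cache.
 * Assumption: my_hostid is the hostid of the running system.
 */
static bool
should_auto_import(const struct tasted_pool *p, uint64_t my_hostid)
{
	/*
	 * Root pool: the user explicitly asked for it, so import it even
	 * if it is exported. hostid is ignored, as it is not available
	 * during root mount anyway.
	 */
	if (p->is_root)
		return (true);

	/* Other pools: only if in imported state and hostid matches. */
	return (p->state == POOL_IMPORTED && p->hostid == my_hostid);
}
```

With this, an exported root pool is still imported, while a non-root
pool that is exported or was last used by a different host is left
alone.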

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl