ZFS root mount regression

Tomoaki AOKI junchoon at dec.sakura.ne.jp
Mon Jul 22 00:43:05 UTC 2019


Hi.

This may be a different problem from Garrett's (as mav@ noted), but it
is also related to parallel mounting.

 *For Garrett's problem, +1 with Trond. For myself, I incorporate the
  drive type and number into the pool name to avoid collisions between
  two physical drives (one working, one emergency) in the same host.
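
 *As a rough illustration of that naming scheme (the pool and device
  names below are hypothetical, not my actual setup), the two root
  pools might be created with distinct names like this:

    # working drive (ada0) and emergency drive (ada1) get distinct pool names
    zpool create ada0root /dev/ada0p3
    zpool create ada1root /dev/ada1p3

  so the two pools can never collide on name alone.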

After ZFS parallel mounting was committed (to both head and stable/12),
auto-mounting of manually imported non-root pool(s) looks racy and
usually fails (some datasets are shown as mounted, but are not
accessible until a manual unmount/remount is performed).

 *I'm experiencing the problem when I import another root pool
  with `zpool import -R /mnt -f poolname` (see the example below).
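
 *For example (pool and dataset names here are placeholders), the
  symptom looks roughly like this:

    zpool import -R /mnt -f poolname
    zfs list -o name,mounted,mountpoint -r poolname
    # some dataset is reported as mounted=yes, yet its contents under
    # /mnt are missing until it is unmounted and mounted again:
    zfs unmount poolname/usr/local
    zfs mount poolname/usr/local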

The patch from ZoL on bug 237517 [1] seems to fix the parallel mounting
race. (It was identified as the ZoL fix by fullermd.)
As it appears to be a race condition, I'm not 100% sure the patch
is really correct.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237517


On Sun, 21 Jul 2019 17:41:59 -0400
Alexander Motin <mav at FreeBSD.org> wrote:

> Hi,
> 
> I am not sure how the original description leads to the conclusion that
> the problem is related to parallel mounting.  From my point of view it
> sounds like a problem where root pool mounting happens based on the pool
> name rather than the pool GUID that needs to be passed from the loader.
> We have seen a problem like that ourselves when boot pool names collide.
> So I doubt it is a new problem; nobody has just gotten around to fixing
> it yet.
> 
> On 20.07.2019 06:41, Eugene Grosbein wrote:
> > CC'ing Alexander Motin, who committed the change.
> > 
> > 20.07.2019 1:21, Garrett Wollman wrote:
> > 
> >> I recently upgraded several file servers from 11.2 to 11.3.  All of
> >> them boot from a ZFS pool called "tank" (the data is in a different
> >> pool).  In a couple of instances (which caused me to have to take a
> >> late-evening 140-mile drive to the remote data center where they are
> >> located), the servers crashed at the root mount phase.  In one case,
> >> it bailed out with error 5 (I believe that's [EIO]) to the usual
> >> mountroot prompt.  In the second case, the kernel panicked instead.
> >>
> >> The root cause (no pun intended) on both servers was a disk which was
> >> supplied by the vendor with a label on it that claimed to be part of
> >> the "tank" pool, and for some reason the 11.3 kernel was trying to
> >> mount that (faulted) pool rather than the real one.  The disks and
> >> pool configuration were unchanged from 11.2 (and probably 11.1 as
> >> well) so I am puzzled.
> >>
> >> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
> >> every piece of media that comes into my hands, is there anything else
> >> I could have done to avoid this?
> > 
> > Both 11.3-RELEASE announcement and Release Notes mention this:
> > 
> >> The ZFS filesystem has been updated to implement parallel mounting.
> > 
> > I strongly suggest reading the Release documentation in case of trouble
> > after an upgrade, at the very least. Or better, read it *before* updating.
> > 
> > I guess this parallelism created a race in your case.
> > 
> > Unfortunately, the way to fall back to sequential mounting seems to be
> > undocumented. libzfs checks whether the ZFS_SERIAL_MOUNT environment
> > variable exists with any value.
> > I'm not sure how you would set it for mounting root; maybe it will use
> > kenv, so try adding this to /boot/loader.conf:
> > 
> > ZFS_SERIAL_MOUNT=1
> > 
> > Alexander should have more knowledge on this.
> > 
> > And of course, attaching an unrelated device with a label that conflicts
> > with the root pool is asking for trouble. Re-label it ASAP.
> > 
> 
> -- 
> Alexander Motin


-- 
Tomoaki AOKI    <junchoon at dec.sakura.ne.jp>

