Root volume renumbered unexpectedly, no longer boots

Matthew Pounsett matt at conundrum.com
Fri Dec 13 20:57:39 UTC 2019


We have a large-ish FreeBSD 11.2-p7 file server with two 24-disk ZFS pools
(20 live, 4 spares each) and a single SSD boot volume.  Yesterday we pulled
some dead drives from the ZFS pools and replaced them with new drives
intended to become spares.  After powering the system back up, it looks
like the boot volume has been renumbered from da0 to da4.

I thought renumbering like this hadn't been a problem for at least the
last decade, since ATA_STATIC_ID was introduced to the kernel, but there's
little doubt that's what has happened.

Automatic boot now fails and drops to the third stage loader prompt when
the kernel tries to mount the root volume from ufs:/dev/da0p2.  I can
manually try to mount the root volume as ufs:/dev/da4p2, and the system
begins to load the root volume, but then hangs.  The only two lines printed
after loading da4 are related to loading up the ZFS pools.  I can't
reproduce the messages again now (explained below), so quoting them
verbatim isn't possible, but they're related to the ZFS version being
behind and suggesting I upgrade the pools.  The messages themselves are not
unusual, and I've been seeing similar messages in the 'zpool status'
output for a while now.  What is unusual is that the system seems to hang
at this point.  I'm concerned that the re-ordering of drives might be
causing problems for the system trying to put the ZFS pools back together.
I don't really know, though.  Does anyone have any insight into what's
going on here?

There is a new wrinkle... since booting from a USB stick so that I could
get into the box and double-check some things, and confirm the location of
the root volume, the BIOS no longer seems to see da4 as a potential boot
volume.  I'm hoping that goes back to the way it was once the USB stick is
removed.  At the moment I have no way to even get the box to attempt to
boot from its normal boot volume.  The machine is many thousands of miles
away, so I haven't had the USB stick removed yet... I can invoke some
remote help once that's necessary.

BIOS issue aside, I'm hoping there's a way I can pin this drive back to
da0.  I don't know how that could be done, but if anyone has any
suggestions I'd happily try them.  Failing that, I suppose I can just
insert a vfs.root.mountfrom option in loader.conf.
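
A minimal sketch of the two options above, assuming the SSD keeps probing as
da4 and root stays on partition 2.  The GPT label "rootfs", the controller
name "mps0", and the bus/target numbers are hypothetical placeholders, not
values taken from this machine; the device.hints lines follow the wiring
syntax described in cam(4) and are untested here.

```
# /boot/loader.conf
# Option 1: point the kernel at the root device as it currently probes.
vfs.root.mountfrom="ufs:/dev/da4p2"

# Option 2 (alternative): mount by GPT label so the daN number no longer
# matters.  Assumes partition 2 has been given the hypothetical label
# "rootfs", e.g. with: gpart modify -i 2 -l rootfs da4
# vfs.root.mountfrom="ufs:/dev/gpt/rootfs"

# /boot/device.hints
# Wire the da0 name to a fixed bus and target (syntax per cam(4)).
# "mps0" and target "0" are placeholders for the real controller and the
# SSD's actual target ID.
# hint.scbus.0.at="mps0"
# hint.da.0.at="scbus0"
# hint.da.0.target="0"
# hint.da.0.unit="0"
```

With either loader.conf option, /etc/fstab presumably still names da0p2 for
the root filesystem and would likely need the same change.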

Can anyone clue me in to what's happening here, or suggest some further
troubleshooting that will help me gain some insight?

Thanks!


More information about the freebsd-questions mailing list