ZFS i/o error on boot unable to start system

Karl Denninger karl at denninger.net
Mon Mar 2 15:58:59 UTC 2020

On 3/2/2020 09:31, mike tancsa wrote:
> On 2/28/2020 8:51 AM, James B. Byrne via freebsd-questions wrote:
>> I have reported this on the forums as well.
>> FreeBSD-12.1p2
>> raidz2 on 4x8TB HDD (reds)
>> root on zfs
>> We did a hot restart of this host this morning and received the following on
>> the console:
>> ZFS: i/o error - all block copies unavailable
>> ZFS: failed to read pool zroot directory object
>> gptzfsboot: failed to mount default pool zroot
>> What has happened?  How do I get this system back up and online?
> Could be a number of things, e.g. you might have only installed the boot
> block on one disk and that's no longer in the boot order.
Well, no.  The error is coming from the loader (nothing else at that
stage can even try to read ZFS).  "gptzfsboot" implies legacy (not
EFI) booting.
> Your BIOS
> might have changed from legacy to EFI only and all of a sudden you
> cannot boot the disks as you don't have the right boot loader. (This
> happened to me once.)

Now THAT is possible and in fact sort of likely.
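A rough way to see why a firmware switch from legacy to UEFI-only breaks booting: under legacy (BIOS/CSM) booting the firmware runs gptzfsboot from a `freebsd-boot` partition, while UEFI firmware only looks for an `efi` system partition. The Python sketch below takes a per-disk list of GPT partition types (named as `gpart show` labels them) and reports which disks the firmware could start a loader from. The `Disk` class, the mode labels and the four-disk layout are assumptions for illustration, not part of any FreeBSD tool.

```python
from dataclasses import dataclass, field

# GPT partition type names as FreeBSD's gpart(8) reports them.
LEGACY_BOOT = "freebsd-boot"  # holds gptzfsboot for legacy BIOS/CSM booting
UEFI_BOOT = "efi"             # EFI system partition holding the EFI loader


@dataclass
class Disk:
    """Hypothetical summary of one disk's partition table."""
    name: str
    partition_types: list = field(default_factory=list)


def bootable_disks(disks, firmware_mode):
    """Return the names of disks the firmware could start a loader from.

    firmware_mode is "legacy" or "uefi" (labels invented for this sketch).
    """
    wanted = {"legacy": LEGACY_BOOT, "uefi": UEFI_BOOT}[firmware_mode]
    return [d.name for d in disks if wanted in d.partition_types]


# Four-disk raidz2 laid out for legacy booting only (assumed layout).
pool_disks = [
    Disk(f"ada{i}", ["freebsd-boot", "freebsd-swap", "freebsd-zfs"])
    for i in range(4)
]
print(bootable_disks(pool_disks, "legacy"))  # ['ada0', 'ada1', 'ada2', 'ada3']
print(bootable_disks(pool_disks, "uefi"))    # []
```

If the firmware were flipped to UEFI-only, every disk drops out of the second list even though the pool itself is intact.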

Let's say you have 4 disks, all of which are part of the pool in
question.  When you built the system all the disks got the loader.  Now
you upgrade the system through time and during one or more of these
cycles the pool gets new flag(s) which get turned on.  The upgrade also
includes the loader, but SOME of the disks don't get it updated.

When the system boots, up until either the EFI loader or gptzfsboot
executes, only ONE copy is used: the one the BIOS decides to load,
which is governed by the BIOS' idea of what the boot order is.
Further, if the pool is encrypted then the loader (gptzfsboot or the EFI
loader) has to know how to prompt for the key and read that too, and
which disks/partitions to try to unlock before attempting to taste them
(i.e. which have the "boot" flag turned on).

Now, for whatever reason, the BIOS decides to read the disks in a
different order and it loads and starts the OLD loader, which was not
updated, off the disk it selects.  That loader can't read ANY of the
disks that have had the new flags set active, or it can't read an
encrypted disk, etc -- and thus you get the error.
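The failure mode described above can be modeled in a few lines. This is a deliberately simplified Python sketch, assuming that a loader copy succeeds exactly when it understands every feature active on the pool, and that the firmware runs the loader from the first disk in its boot order that has one. The feature names and the "old" and "new" loader sets are illustrative only, not what any particular FreeBSD release ships.

```python
def try_boot(boot_order, loaders, pool_features):
    """Simulate which loader copy runs and what it cannot read.

    boot_order:    disk names in the order the firmware tries them
    loaders:       disk name -> set of pool features that disk's loader understands
    pool_features: features active on the pool
    Returns (disk whose loader ran, set of active features that loader lacks).
    """
    for disk in boot_order:
        if disk in loaders:
            # Only this ONE copy runs; the loaders on the other disks never get a say.
            return disk, set(pool_features) - loaders[disk]
    raise RuntimeError("no disk in the boot order has a loader")


# Hypothetical feature sets: two disks got the updated loader, two did not.
OLD = {"async_destroy", "lz4_compress"}
NEW = OLD | {"large_dnode"}
loaders = {"ada0": NEW, "ada1": NEW, "ada2": OLD, "ada3": OLD}
pool = {"lz4_compress", "large_dnode"}  # a newer feature has been activated

print(try_boot(["ada0", "ada1", "ada2", "ada3"], loaders, pool))  # ('ada0', set())
print(try_boot(["ada2", "ada3", "ada0", "ada1"], loaders, pool))  # ('ada2', {'large_dnode'})
```

Nothing about the pool changes between the two calls; only the boot order does, which is how a system can run fine for a long time and then fail to come up after an ordinary restart.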

> As others have said, try booting off a usb stick and see if the pool is
> OK and then take a look at the parts that are involved in the boot
> process as outlined by trond.endrestol at ximalas.info
Yes.  The likely issue is that the loader was updated on some but not
all of the disks.  But BE CAREFUL: writing the boot code, if done wrong
(e.g. to the wrong partition), can destroy the data on the drive.
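Before writing anything, it may help to find out which disks actually carry stale boot code, using reads only. The Python sketch below compares the start of each `freebsd-boot` partition with a reference copy of gptzfsboot (for example the `/boot/gptzfsboot` belonging to the installed system, not to a rescue stick). The device paths in the commented example are assumptions, so check `gpart show` for the real disk names and partition indexes. It rounds each read up to a multiple of 4096 bytes on the assumption that the raw disk device rejects reads that are not sector-aligned, and it needs root to open the devices.

```python
from pathlib import Path

SECTOR = 4096  # assumed upper bound on sector size; also a multiple of 512


def stale_bootcode(reference, partitions):
    """Return the partitions whose leading bytes differ from the reference loader.

    Only the first len(reference) bytes are compared: the partition is normally
    larger than the loader, and whatever follows it is irrelevant.
    This function only READS; it never writes to any device.
    """
    ref = Path(reference).read_bytes()
    # Round the read size up to a whole number of sectors for raw devices.
    read_len = -(-len(ref) // SECTOR) * SECTOR
    stale = []
    for part in partitions:
        with open(part, "rb", buffering=0) as f:
            head = f.read(read_len)
        # A short read or any differing byte marks the copy as stale.
        if head[: len(ref)] != ref:
            stale.append(part)
    return stale


# Example (hypothetical device names; run as root, read-only):
# print(stale_bootcode("/boot/gptzfsboot", [f"/dev/ada{i}p1" for i in range(4)]))
```

Any disk it lists is a candidate for rewriting. On a legacy-boot GPT system that is normally done with `gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0`, where the `-i` index must match that disk's `freebsd-boot` partition as shown by `gpart show`; getting the index wrong is exactly the kind of mistake that overwrites data.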
Karl Denninger
karl at denninger.net
The Market Ticker
[S/MIME encrypted email preferred]

More information about the freebsd-questions mailing list