Re: zpool import: "The pool cannot be imported due to damaged devices or data" but zpool status -x: "all pools are healthy" and zpool destroy: "no such pool"

From: Mark Millard via freebsd-current <freebsd-current_at_freebsd.org>
Date: Thu, 16 Sep 2021 22:50:17 UTC

On 2021-Sep-16, at 15:16, Alan Somers <asomers at freebsd.org> wrote:

> On Thu, Sep 16, 2021 at 4:02 PM Mark Millard <marklmi at yahoo.com> wrote:
> 
> 
> On 2021-Sep-16, at 13:39, Alan Somers <asomers at freebsd.org> wrote:
> 
> > On Thu, Sep 16, 2021 at 2:04 PM Mark Millard via freebsd-current <freebsd-current@freebsd.org> wrote:
> > What do I do about this:
> > 
> > QUOTE
> > # zpool import
> >    pool: zopt0
> >      id: 18166787938870325966
> >   state: FAULTED
> > status: One or more devices contains corrupted data.
> >  action: The pool cannot be imported due to damaged devices or data.
> >    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
> >  config:
> > 
> >         zopt0       FAULTED  corrupted data
> >           nda0p2    UNAVAIL  corrupted data
> > 
> > # zpool status -x
> > all pools are healthy
> > 
> > # zpool destroy zopt0
> > cannot open 'zopt0': no such pool
> > END QUOTE
> > 
> > (I had attempted to clean out the old zfs context on
> > the media and delete/replace the 2 freebsd swap
> > partitions and 1 freebsd-zfs partition, leaving the
> > efi partition in place. Clearly I did not do everything
> > required [or something is very wrong]. zopt0 had been
> > a root-on-ZFS context and would be again. I have a
> > backup of the context to send/receive once the pool
> > in the partition is established.)
> > 
> > For reference, as things now are:
> > 
> > # gpart show
> > =>       40  937703008  nda0  GPT  (447G)
> >          40     532480     1  efi  (260M)
> >      532520       2008        - free -  (1.0M)
> >      534528  937166848     2  freebsd-zfs  (447G)
> >   937701376       1672        - free -  (836K)
> > . . .
> > 
> > (That is not how it looked before I started.)
> > 
> > # uname -apKU
> > FreeBSD CA72_4c8G_ZFS 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #4 releng/13.0-n244760-940681634ee1-dirty: Mon Aug 30 11:35:45 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300139 1300139
> > 
> > I have also tried under:
> > 
> > # uname -apKU
> > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #12 main-n249019-0637070b5bca-dirty: Tue Aug 31 02:24:20 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400032 1400032
> > 
> > after reaching this state. It behaves the same.
> > 
> > The text presented by:
> > 
> > https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
> > 
> > does not deal with what is happening overall.
> > 
> > So you just want to clean nda0p2 in order to reuse it?  Do "zpool labelclear -f /dev/nda0p2"
> > 
>> 
>> I did not extract and show everything that I'd tried but
>> there were examples of:
>> 
>> # zpool labelclear -f /dev/nda0p2
>> failed to clear label for /dev/nda0p2
>> 
>> from when I'd tried such. So far I've not
>> identified any official command that deals
>> with the issue.
>> 
> That is the correct command to run.  However, the OpenZFS import in FreeBSD 13.0 brought in a regression in that command.  It wasn't a code bug really, more like a UI bug.  OpenZFS just had a less useful labelclear command than FreeBSD did.  The regression has now been fixed upstream.
> https://github.com/openzfs/zfs/pull/12511

Cool.

>> Ultimately I zeroed out areas of the media that
>> happened to span the zfs related labels. After
>> that things returned to normal. I'd still like
>> to know a supported way of dealing with the
>> issue.
>> 
>> The page at the URL it listed just says:
>> 
>> QUOTE
>> The pool must be destroyed and recreated from an appropriate backup source
>> END QUOTE
> 
> It advised you to "destroy and recreate" the pool because you ran "zpool import", so ZFS thought that you actually wanted to import the pool.  The error message would have been appropriate if that had been the case.

The start of the problem looked like (console context,
so messages interlaced):

# zpool create -O compress=lz4 -O atime=off -f -tzopt0 zpopt0 /dev/nvd0
GEOM: nda0: the primary GPT table is corrupt or invalid.
GEOM: nda0: using the secondary instead -- recovery strongly advised.
cannot create 'zpopt0': no such pool or dataset
# Sep 16 12:19:31 CA72_4c8G_ZFS ZFS[1111]: vdev problem, zpool=zopt0 path=/dev/nvd0 type=ereport.fs.zfs.vdev.open_failed

The GPT table was okay just prior to the command.
So I recovered it.

The import was the only command I tried that
referenced what to do about what was being reported.
(Not that it was useful for my context.) I only
discovered the pool's status via what the import
reported, after doing the GPT recovery first.

I've still no clue what was wrong with my labelclear
before the repartitioning. But it appeared that the
GPT tables and the zfs related labels were stomping
on each other after the repartitioning.
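For reference, the zeroing I ended up doing amounts to something like the sketch below. ZFS keeps four 256 KiB vdev labels: two (L0/L1) in the first 512 KiB of the vdev and two (L2/L3) in the last 512 KiB, so wiping the first and last 512 KiB covers all four. A scratch file stands in for /dev/nda0p2 here so nothing real is touched; the dd invocations, exact sizes, and the GNU-dd-only status=none flag are my assumptions for illustration, not the literal commands I ran.

```shell
# Sketch: manually wiping the four ZFS vdev labels when "zpool labelclear"
# fails. L0/L1 live in the first 512 KiB of the vdev, L2/L3 in the last
# 512 KiB. A file stands in for the real partition so this is safe to run.
DEV=fake_vdev.img
truncate -s 64M "$DEV"                       # stand-in "partition"
SIZE=$(stat -c %s "$DEV" 2>/dev/null || stat -f %z "$DEV")

# Front labels (L0, L1): zero the first 512 KiB.
dd if=/dev/zero of="$DEV" bs=512k count=1 conv=notrunc status=none

# Back labels (L2, L3): zero the last 512 KiB.
# (Assumes the size is a multiple of 512 KiB; otherwise the tail
# needs a byte-accurate seek.)
dd if=/dev/zero of="$DEV" bs=512k count=1 conv=notrunc status=none \
   seek=$(( SIZE / 524288 - 1 ))

echo "cleared label regions of $DEV"
```

On the real device the same offsets would apply, but note that the back labels sit right next to where GPT keeps its backup table at the end of the disk, which is consistent with the two stomping on each other after repartitioning.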

So, yes, I was trying to import when I first got the
message in question. But I could not do as indicated:
it told me to perform a type of activity (destroying
the pool) that I could not actually do. That was confusing.


>> But the official destroy commands did not work:
>> 
> Because "zpool destroy" only works for imported pools.  The error message meant "destroy" in a more generic sense.
> 
>> same sort of issue of reporting that nothing
>> appropriate was found to destroy and no way to
>> import the problematical pool.
>> 
>> 
>> Note: I use ZFS because of wanting to use bectl, not
>> for redundancy or such. So the configuration is very
>> simple.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)