Re: ZFS: Rescue FAULTED Pool
- Reply: Dennis Clarke : "Re: ZFS: Rescue FAULTED Pool"
- Reply: Andriy Gapon : "Re: ZFS: Rescue FAULTED Pool"
- In reply to: Allan Jude : "Re: ZFS: Rescue FAULTED Pool"
Date: Sat, 01 Feb 2025 08:57:15 UTC
On Thu, 30 Jan 2025 16:13:56 -0500
Allan Jude <allanjude@freebsd.org> wrote:
> On 1/30/2025 6:35 AM, A FreeBSD User wrote:
> > On Wed, 29 Jan 2025 03:45:25 -0800
> > David Wolfskill <david@catwhisker.org> wrote:
> >
> > Hello, thanks for responding.
> >
> >> On Wed, Jan 29, 2025 at 11:27:01AM +0100, FreeBSD User wrote:
> >>> Hello,
> >>>
> >>> a ZFS pool (RAIDZ1) has faulted. The pool is not importable
> >>> anymore, neither with import -F nor -f.
> >>> Although this pool is on an experimental system (no backup available),
> >>> it contains some data that would take a while to reconstruct, so I'd
> >>> like to ask whether there is a way to try to "de-fault" such a pool.
> >>
> >> Well, 'zpool clear ...' "Clears device errors in a pool." (from "man
> >> zpool").
> >>
> >> It is, however, not magic -- it doesn't actually fix anything.
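(For reference, my understanding of how such a clear would be issued - a sketch only, since it needs an imported pool, and the device name is just an example:)

# clear error counters for the whole pool
zpool clear BUNKER00
# or only for a single device in it
zpool clear BUNKER00 da1p1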
> >
> > For the record: I tried every network/search method available to a common
> > "administrator", but hoped people here are able to manipulate deeper structures via zdb ...
> >
> >>
> >> (I had an issue with a zpool which had a single SSD device as a ZIL; the
> >> ZIL device failed after it had accepted some data to be written to the
> >> pool, but before the data could be read and transferred to the spinning
> >> disks. ZFS was quite unhappy about that. I was eventually able to copy
> >> the data elsewhere, destroy the old zpool, recreate it *without* that
> >> single point of failure, then copy the data back. And I learned to
> >> never create a zpool with a *single* device as a separate ZIL.)
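(Side note for the archives: as far as I know, the usual way to avoid that single point of failure is to mirror the log device - the device names below are made up:)

# create the SLOG as a mirror right away
zpool add tank log mirror ada8 ada9
# or turn an existing single log device into a mirror
zpool attach tank ada8 ada9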
> >
> > Well, in this case I do not use dedicated ZIL drives. I have also had some experience with
> > "single" ZIL drive setups, but a dedicated ZIL is mostly useful in cases where you have a
> > graveyard full of inertia-suffering, mass-spinning HDDs - if I'm right, an SSD-based ZIL
> > would be of no use/effect in my all-SSD case. So I omitted it.
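(If someone wants to check whether a separate log device would even matter for a workload: on FreeBSD the ZIL counters should be visible via sysctl - names from memory, so treat this as a hint rather than a recipe:)

# OpenZFS ZIL activity counters on FreeBSD
sysctl kstat.zfs.misc.zil
# the 'sync' property shows whether datasets force synchronous writes at all
zfs get sync <pool>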
> >
> >>
> >>> The pool is made up of 7 drives as a RAIDZ1; one of the SSDs
> >>> faulted, but I pulled the wrong one, so the pool went into a
> >>> suspended state.
> >>
> >> Can you put the drive you pulled back in?
> >
> > Every single SSD originally plugged in is now back in place, even the faulted one (which
> > doesn't report any faults at the moment).
> >
> > Although the pool isn't "importable", zdb reports its existence, alongside zroot (which
> > resides on a dedicated drive).
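(One thing that could still be tried, in case the scan gets confused by stale device paths - a sketch, the paths below are just the default device nodes:)

# point the import scan at the device nodes explicitly
zpool import -d /dev
# or name the member partitions one by one
zpool import -d /dev/da1p1 -d /dev/da2p1 -d /dev/da3p1 -d /dev/da4p1 \
             -d /dev/da5p1 -d /dev/da6p1 -d /dev/da7p1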
> >
> >>
> >>> The host is running the latest XigmaNAS BETA, which is effectively
> >>> FreeBSD 14.1-p2, just for the record.
> >>>
> >>> I do not want to give up, since I hoped there might be a crude but
> >>> effective way to restore the pool, even with some data loss ...
> >>>
> >>> Thanks in advance,
> >>>
> >>> Oliver
> >>> ....
> >>
> >> Good luck!
> >>
> >> Peace,
> >> david
> >
> >
> > Well, this is a hard and painful lesson to learn, if there is no chance of getting the
> > pool back.
> >
> > A warning (though this may be useless in the realm of professionals): I used a bunch of
> > cheap spot-market SATA SSDs from a brand called "Intenso", also common here in good old
> > Germany. Some of those SSDs have a working activity LED when used with a Fujitsu SAS HBA -
> > but those died very quickly, suffering bus errors. Another batch of those SSDs does not
> > have a working LED (no blinking on access), but lasted a bit longer. The problem with those
> > SSDs is that I cannot easily locate the failing device by driving activity to it, e.g. by
> > writing massive amounts of data via dd, where that is still possible.
> > I also ordered alternative SSDs from a more expensive brand - but bad karma ...
> >
> > Oliver
> >
> >
>
> The most useful thing to share right now would be the output of `zpool
> import` (with no pool name) on the rebooted system.
>
> That will show where the issues are, and suggest how they might be solved.
>
Hello, this is exactly what happens when trying to import the pool. Before the loss, device da1p1
had been reported as faulted, with counts in the "corrupted data" and further columns, which are
not shown now.
~# zpool import
pool: BUNKER00
id: XXXXXXXXXXXXXXXXXXXX
state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
config:
BUNKER00      FAULTED  corrupted data
  raidz1-0    ONLINE
    da2p1     ONLINE
    da3p1     ONLINE
    da4p1     ONLINE
    da7p1     ONLINE
    da6p1     ONLINE
    da1p1     ONLINE
    da5p1     ONLINE
~# zpool import -f BUNKER00
cannot import 'BUNKER00': I/O error
Destroy and re-create the pool from
a backup source.
~# zpool import -F BUNKER00
cannot import 'BUNKER00': one or more devices is currently unavailable
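(For completeness, the remaining options I am aware of before giving up - all to be taken with care, and I am not sure they apply here: a dry-run rewind, a read-only import with extreme rewind, and a rollback to an older transaction group found via zdb:)

# dry run: show what a recovery-mode (-F) import would have to discard
zpool import -F -n BUNKER00
# read-only import with extreme rewind, discarding the last transactions
zpool import -o readonly=on -f -FX BUNKER00
# import at a specific older txg (placeholder) found e.g. with 'zdb -e -u'
zpool import -o readonly=on -f -T <txg> BUNKER00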
--
A FreeBSD user