Re: ZFS: zpool status on degraded pools (FreeBSD12 vs FreeBSD13)

From: Karl Denninger <karl_at_denninger.net>
Date: Wed, 14 Jul 2021 18:17:05 -0400
On 7/14/2021 17:45, Dave Baukus wrote:
> On 7/14/21 3:21 PM, Alan Somers wrote:
> On Wed, Jul 14, 2021 at 3:10 PM Dave Baukus <daveb_at_spectralogic.com<mailto:daveb_at_spectralogic.com>> wrote:
> I'm seeking comments on the following 2 differences in the behavior of ZFS.
> The first I consider a bug; the second could be a bug or a conscious choice:
>
> 1) Given a pool of 2 disks and one extra disk exactly the same as the 2 pool members (no ZFS labels on the extra disk),
> power the box off, replace one pool disk with the extra disk in the same location, and power the box back on.
>
> The pool's state on FreeBSD 13 is ONLINE vs. DEGRADED on FreeBSD 12:
>
> I agree, the FreeBSD 13 behavior seems like a bug.
>
> 2) Add a spare to a degraded pool and issue a zpool replace to activate the spare.
> On FreeBSD 13, after the resilver is complete, the pool remains DEGRADED until the degraded disk
> is removed via zpool detach; on FreeBSD 12, the pool becomes ONLINE when the resilver is complete:
>
> I agree.  I think I prefer the FreeBSD 13 behavior, but either way is sensible.
>
> The change is no doubt due to the OpenZFS import in FreeBSD 13.  Have you tried to determine the responsible commits?  They could be regressions in OpenZFS, or they could be bugs that we fixed in FreeBSD but never upstreamed.
> -Alan
>
> Thanks for the feedback Alan. I have not yet dug into #1 beyond zpool, lib[zpool|zfs].
>
> --
>
> Dave Baukus
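
For reference, scenario 2 above can be reproduced without spare hardware using file-backed vdevs (the pool name and file paths below are illustrative; this requires root on a ZFS-capable system):

```shell
# Create a two-way mirror plus a hot spare from sparse files.
truncate -s 256m /tmp/d0 /tmp/d1 /tmp/spare
zpool create tank mirror /tmp/d0 /tmp/d1
zpool add tank spare /tmp/spare

# Degrade the pool, then activate the spare.
zpool offline tank /tmp/d1
zpool replace tank /tmp/d1 /tmp/spare

# After the resilver: FreeBSD 13 reports DEGRADED until
# 'zpool detach tank /tmp/d1'; FreeBSD 12 reports ONLINE.
zpool status tank
```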

IMHO.... (12.2-STABLE)

root_at_NewFS:/home/karl # zpool status backup
   pool: backup
  state: DEGRADED
status: One or more devices has been taken offline by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub repaired 0 in 0 days 09:25:28 with 0 errors on Wed Jun 30 12:33:35 2021
config:

         NAME                     STATE     READ WRITE CKSUM
         backup                   DEGRADED     0     0     0
           mirror-0               DEGRADED     0     0     0
             gpt/backup8.eli      ONLINE       0     0     0
             9628424513629875622  OFFLINE      0     0     0  was /dev/gpt/backup8-1.eli
             gpt/backup8-2.eli    ONLINE       0     0     0

errors: No known data errors

This is IMHO correct behavior.  I do this intentionally; the other disk 
is physically offsite.  When I go to swap them I take 8-2 offline, 
remove it, go swap it with 8-1, and bring 8-1 online.  The mirror 
resilvers but, when it's done, it still shows "DEGRADED" because it is: 
it has three members, and one was (deliberately) removed and is not in 
the building.
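
A rough sketch of that rotation, using the GPT labels from the status output above (the .eli devices are GELI-encrypted, so the returning disk has to be attached before it can be onlined; exact commands are an assumption, not quoted from the thread):

```shell
# Take the disk that is about to go offsite out of the mirror.
zpool offline backup gpt/backup8-2.eli

# ...physically swap it with the 8-1 disk from the offsite location...

# Attach the returning disk's GELI layer (prompts for the key),
# then bring it online; the resilver starts automatically.
geli attach /dev/gpt/backup8-1
zpool online backup gpt/backup8-1.eli

# The pool still shows DEGRADED: 8-2 is now out of the building.
zpool status backup
```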

This is the last-ditch, building-burned-down (or similar catastrophe) 
offsite backup of course.  That pool is normally exported except when 
synchronizing using zfs send/recv.
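
The import/sync/export cycle for such a normally-exported pool looks roughly like this (snapshot and dataset names are placeholders; the actual layout is not shown in the thread):

```shell
# Bring the normally-exported backup pool online.
zpool import backup

# Incrementally replicate everything since the last sync.
zfs snapshot -r zroot@now
zfs send -RI zroot@prev zroot@now | zfs recv -du backup

# Detach the pool again until the next sync.
zpool export backup
```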

If one of the other two fails (they are subject to a routine scrub 
schedule) then when I do a "replace" on it, when it finishes, the pool 
is *still* degraded.  The only time it would not be is if all three 
disks were physically in the machine at once, which is not something I 
usually do, for obvious reasons.  The exception is when I need to make 
that pool larger; then they all have to be here, since all three members 
must be present and online for the expansion to work.

So long as at least *one* of the three mirror members has not been 
destroyed/damaged and is intact I still have a fully-functional backup 
from which the running system data sets can be restored.

If -13 would show that configuration as "ONLINE" then IMHO what it is 
reporting is broken; there is a member missing from the mirror set, 
albeit in this case intentionally.

-- 

Karl Denninger
karl_at_denninger.net <mailto:karl_at_denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
Received on Wed Jul 14 2021 - 22:17:05 UTC
