a strange and terrible saga of the cursed iSCSI ZFS SAN

Fabian Keil freebsd-listen at fabiankeil.de
Sat Aug 5 17:52:36 UTC 2017


"Eugene M. Zheganin" <emz at norma.perm.ru> wrote:

> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> >   pool: userdata
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K  
> That would be funny if it weren't so sad, but while I was writing this
> message the pool started to look like the output below (I ran zpool status
> twice in a row to compare with what it showed before):
> 
> [root@san1:~]# zpool status userdata
>    pool: userdata
>   state: ONLINE
> status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://illumos.org/msg/ZFS-8000-8A
>    scan: none requested
> config:
> 
>          NAME               STATE     READ WRITE CKSUM
>          userdata           ONLINE       0     0  728K
>            mirror-0         ONLINE       0     0 1,42M
>              gpt/userdata0  ONLINE       0     0 1,42M
>              gpt/userdata1  ONLINE       0     0 1,42M
> 
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>    pool: userdata
>   state: ONLINE
> status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://illumos.org/msg/ZFS-8000-8A
>    scan: none requested
> config:
> 
>          NAME               STATE     READ WRITE CKSUM
>          userdata           ONLINE       0     0  730K
>            mirror-0         ONLINE       0     0 1,43M
>              gpt/userdata0  ONLINE       0     0 1,43M
>              gpt/userdata1  ONLINE       0     0 1,43M
> 
> errors: 4 data errors, use '-v' for a list
> 
> So, as you can see, the error counters are growing at the speed of light.
> I'm not sure the data access rate is anywhere near that high; it looks
> like they are increasing on their own.
> So maybe someone has an idea of what this really means.

Quoting a comment from sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:
/*
 * If destroy encounters an EIO while reading metadata (e.g. indirect
 * blocks), space referenced by the missing metadata can not be freed.
 * Normally this causes the background destroy to become "stalled", as
 * it is unable to make forward progress.  While in this stalled state,
 * all remaining space to free from the error-encountering filesystem is
 * "temporarily leaked".  Set this flag to cause it to ignore the EIO,
 * permanently leak the space from indirect blocks that can not be read,
 * and continue to free everything else that it can.
 *
 * The default, "stalling" behavior is useful if the storage partially
 * fails (i.e. some but not all i/os fail), and then later recovers.  In
 * this case, we will be able to continue pool operations while it is
 * partially failed, and when it recovers, we can continue to free the
 * space, with no leaks.  However, note that this case is actually
 * fairly rare.
 *
 * Typically pools either (a) fail completely (but perhaps temporarily,
 * e.g. a top-level vdev going offline), or (b) have localized,
 * permanent errors (e.g. disk returns the wrong data due to bit flip or
 * firmware bug).  In case (a), this setting does not matter because the
 * pool will be suspended and the sync thread will not be able to make
 * forward progress regardless.  In case (b), because the error is
 * permanent, the best we can do is leak the minimum amount of space,
 * which is what setting this flag will do.  Therefore, it is reasonable
 * for this flag to normally be set, but we chose the more conservative
 * approach of not setting it, so that there is no possibility of
 * leaking space in the "partial temporary" failure case.
 */
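
For context, the flag that comment documents is declared directly below
it in the same file (both in illumos and in the FreeBSD contrib copy)
and defaults to off:

boolean_t zfs_free_leak_on_eio = B_FALSE;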

In FreeBSD the "flag" currently isn't easily reachable due to the lack
of a powerful kernel debugger (like mdb in the Solaris offspring), but
it can be exposed as a sysctl using the patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954
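
With that patch (or an equivalent one) applied, the knob can then be
flipped from userland like any other sysctl. Below is a minimal sketch,
assuming the patch exposes it under a name such as
vfs.zfs.free_leak_on_eio; the actual OID name is whatever the patch
defines, so check that first:

/*
 * Hypothetical example: read and then enable an integer vfs.zfs sysctl
 * via sysctlbyname(3).  The OID name used here is an assumption, not
 * taken from the patch itself.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int oldval = 0, newval = 1;
	size_t oldlen = sizeof(oldval);

	/* Fetch the current value and set the flag in one call. */
	if (sysctlbyname("vfs.zfs.free_leak_on_eio", &oldval, &oldlen,
	    &newval, sizeof(newval)) == -1) {
		perror("sysctlbyname");
		return (EXIT_FAILURE);
	}
	printf("free_leak_on_eio: was %d, now %d\n", oldval, newval);
	return (EXIT_SUCCESS);
}

Of course, running "sysctl vfs.zfs.free_leak_on_eio=1" as root from the
shell does the same thing; the C version only illustrates the mechanism.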

Fabian