misc/177966: [zfs] resilver completes but subsequent scrub reports errors

Thu Apr 18 18:30:02 UTC 2013

>Number:         177966
>Category:       misc
>Synopsis:       [zfs] resilver completes but subsequent scrub reports errors
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr 18 18:30:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Nathaniel Filardo
>Release:        9.1-STABLE
>Organization:
>Environment:
FreeBSD hydra.priv.oc.ietfng.org 9.1-STABLE FreeBSD 9.1-STABLE #39 r+39eb5ca-dirty: Fri Apr  5 10:46:04 EDT 2013     root@hydra.priv.oc.ietfng.org:/usr/obj/systank/src-git/sys/NWFKERN  sparc64

>Description:
I removed one disk from a raidz2 pool and ran the system for a while in the degraded configuration (still with redundancy).  I then replaced the missing disk (with zpool replace rather than zpool online) and let the resilver run to completion.  It succeeded and reported no errors.  Having had bad experiences in the past (http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016627.html), I then ran a scrub, which reported 11 checksum errors on the replaced drive, clearly during the part of the scrub that walks blocks with refcount > 1.  I am currently running another scrub pass, which I hypothesize will complete without error.

The pool, under normal circumstances, looks like this:

        NAME        STATE     READ WRITE CKSUM
        tank0       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
        cache
          ada1a     ONLINE       0     0     0
          ada0b     ONLINE       0     0     0

The pool configuration is mostly default, except that it uses 4K sectors (ashift=12) and has the following properties set:

tank0  checksum              sha256                 received
tank0  compression           gzip                   received
tank0  atime                 off                    received
tank0  dedup                 sha256,verify          received

The deduplication table is pretty sizable:

dedup: DDT entries 16754758, size 981 on disk, 158 in core
bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    13.0M   1.33T   1.24T   1.27T    13.0M   1.33T   1.24T   1.27T
     2    2.35M    198G    165G    172G    5.18M    430G    361G    378G
     4     495K   25.4G   13.4G   16.1G    2.24M    114G   61.0G   73.8G
     8     121K   1.60G    689M   1.48G    1.28M   16.3G   6.78G   15.5G
    16    22.1K    250M    116M    269M     469K   5.04G   2.31G   5.48G
    32    4.11K    157M    138M    159M     195K   8.45G   7.65G   8.59G
    64    1.53K   9.76M   3.99M   14.8M     124K    897M    375M   1.22G
   128      254   6.49M   2.89M   4.60M    41.8K    949M    427M    717M
   256       58    582K    100K    519K    19.6K    181M   34.3M    175M
   512       27    540K     26K    232K    19.0K    482M   20.7M    167M
    1K       12      6K      6K   95.9K    17.9K   8.94M   8.94M    143M
    2K        8    648K   13.5K   71.9K    19.9K   1.42G   34.4M    181M
    4K        3    256K    129K    144K    17.6K   1.38G    764M    851M
    8K       12    644K   8.50K   95.9K     149K   8.97G    110M   1.16G
 Total    16.0M   1.55T   1.42T   1.45T    22.7M   1.90T   1.67T   1.74T
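
For a rough sense of scale, the summary line above can be turned into byte totals.  The sketch below assumes that the "size 981 on disk, 158 in core" figures are average bytes per DDT entry and simply multiplies them by the entry count; it also derives a dedup ratio from the Total row (referenced DSIZE over allocated DSIZE).  The variable and function names are illustrative only and do not come from any ZFS tool.

```python
# Figures copied from the "dedup:" summary line and the Total row above.
DDT_ENTRIES = 16754758
BYTES_ON_DISK_PER_ENTRY = 981   # assumption: average bytes per entry
BYTES_IN_CORE_PER_ENTRY = 158   # assumption: average bytes per entry

GIB = 1024 ** 3


def ddt_footprint_gib(entries, bytes_per_entry):
    """Estimate the total DDT size in GiB as entries * average entry size."""
    return entries * bytes_per_entry / GIB


on_disk_gib = ddt_footprint_gib(DDT_ENTRIES, BYTES_ON_DISK_PER_ENTRY)
in_core_gib = ddt_footprint_gib(DDT_ENTRIES, BYTES_IN_CORE_PER_ENTRY)

# Total row: referenced DSIZE 1.74T, allocated DSIZE 1.45T.
dedup_ratio = 1.74 / 1.45

print(f"DDT on disk: {on_disk_gib:.1f} GiB")
print(f"DDT in core: {in_core_gib:.1f} GiB")
print(f"dedup ratio: {dedup_ratio:.2f}")
```

Under those assumptions the table works out to roughly 15 GiB on disk and about 2.5 GiB in core, with a dedup ratio of about 1.2.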

Full DSL scans (scrub, resilver) take about 48 hours each.  The first half of that time is spent in an extremely slow scan (currently about 20 IOPS and 1MB/sec) as it works its way through the DDT entries with refcount > 1, after which it ramps up to 35MB/sec as it traverses refcount=1 blocks in disk order.

In any case, the scrub after the resilver was clearly in the first of these phases (refcount > 1) when it reported 11 checksum errors, more or less all at once.  No checksum errors were found in the second (refcount=1) phase.

If I had to guess, this is possibly a bug in the code that handles DDT entries changing class while a scrub is in progress.
>How-To-Repeat:
It appears to be sufficient to perform I/O on a pool with deduplication enabled while it is resilvering.  I will attempt to repeat the experiment as soon as this scrub pass finishes successfully; if it instead finds errors, I will run scrub again.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted: