Resolving errors with ZVOLs
Wiktor Niesiobedzki
bsd at vink.pl
Mon Sep 4 17:12:54 UTC 2017
Hi,
I can follow up on my issue: the same problem has just happened on the second
ZVOL that I created:
# zpool status -v
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep 2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    14
          mirror-0         ONLINE       0     0    28
            gpt/tank1.eli  ONLINE       0     0    28
            gpt/tank2.eli  ONLINE       0     0    28

errors: Permanent errors have been detected in the following files:

        tank/docker-big:<0x1>
        <0x5095>:<0x1>
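
One thing I plan to try before anything drastic, assuming the stale
<0x5095>:<0x1> entry merely needs to age out of the persistent error log
(which, as I understand it, covers the last two scrubs), is another
scrub-and-clear cycle:

# zpool scrub tank
# zpool status -v tank    # wait for the scrub to complete
# zpool clear tank        # a second scrub may be needed before the
                          # <0x...>:<0x...> entry disappears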
I suspect that these errors might be related to my recent upgrade to 11.1.
Until 19 August I was running 11.0. I am considering rolling back to 11.0
right now.
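
If I do roll back, I would rather flip a boot environment than reinstall; a
rough sketch, assuming sysutils/beadm is installed and an 11.0 environment
still exists (the BE name below is made up):

# beadm list
# beadm activate 11.0-pre-upgrade    # hypothetical BE name
# shutdown -r now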
For reference:
# zfs get all tank/docker-big
NAME             PROPERTY               VALUE                  SOURCE
tank/docker-big  type                   volume                 -
tank/docker-big  creation               Sat Sep  2 10:09 2017  -
tank/docker-big  used                   100G                   -
tank/docker-big  available              747G                   -
tank/docker-big  referenced             10.5G                  -
tank/docker-big  compressratio          4.58x                  -
tank/docker-big  reservation            none                   default
tank/docker-big  volsize                100G                   local
tank/docker-big  volblocksize           128K                   -
tank/docker-big  checksum               skein                  inherited from tank
tank/docker-big  compression            lz4                    inherited from tank
tank/docker-big  readonly               off                    default
tank/docker-big  copies                 1                      default
tank/docker-big  refreservation         100G                   local
tank/docker-big  primarycache           all                    default
tank/docker-big  secondarycache         all                    default
tank/docker-big  usedbysnapshots        0                      -
tank/docker-big  usedbydataset          10.5G                  -
tank/docker-big  usedbychildren         0                      -
tank/docker-big  usedbyrefreservation   89.7G                  -
tank/docker-big  logbias                latency                default
tank/docker-big  dedup                  off                    default
tank/docker-big  mlslabel                                      -
tank/docker-big  sync                   standard               default
tank/docker-big  refcompressratio       4.58x                  -
tank/docker-big  written                10.5G                  -
tank/docker-big  logicalused            47.8G                  -
tank/docker-big  logicalreferenced      47.8G                  -
tank/docker-big  volmode                dev                    local
tank/docker-big  snapshot_limit         none                   default
tank/docker-big  snapshot_count         none                   default
tank/docker-big  redundant_metadata     all                    default
tank/docker-big  com.sun:auto-snapshot  false                  local
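
For completeness, I created the volume more or less like this (flags
reconstructed from the properties above; checksum=skein and compression=lz4
are inherited from tank):

# zfs create -V 100G -o volmode=dev -o volblocksize=128K tank/docker-big
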
Any ideas on what I should try before rolling back?
Cheers,
Wiktor
2017-09-02 19:17 GMT+02:00 Wiktor Niesiobedzki <bsd at vink.pl>:
> Hi,
>
> I have recently encountered errors on my ZFS pool on 11.1-R:
> $ uname -a
> FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9
> 11:55:48 UTC 2017
> root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
> # zpool status -v tank
> pool: tank
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep 2 15:30:59 2017
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         tank               ONLINE       0     0    98
>           mirror-0         ONLINE       0     0   196
>             gpt/tank1.eli  ONLINE       0     0   196
>             gpt/tank2.eli  ONLINE       0     0   196
>
> errors: Permanent errors have been detected in the following files:
>
> dkr-test:<0x1>
>
> dkr-test is a ZVOL that I use within bhyve, and indeed, within bhyve I have
> noticed I/O errors on this volume. This ZVOL did not have any snapshots.
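>
> For context, the guest sees this volume as a virtio-blk disk, roughly like
> this (slot numbers and the tap device are from memory, and I am omitting
> the loader step):
>
> # bhyve -c 2 -m 2G -A -H \
>     -s 0,hostbridge -s 1,lpc \
>     -s 2,virtio-blk,/dev/zvol/tank/dkr-test \
>     -s 3,virtio-net,tap0 \
>     -l com1,stdio dkr-test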
>
> Following the advice mentioned in the action text, I destroyed the ZVOL so
> that I could restore it:
> # zfs destroy tank/dkr-test
>
> But errors are still reported in zpool status:
> errors: Permanent errors have been detected in the following files:
>
> <0x5095>:<0x1>
>
> I can't find any reference to this dataset in zdb:
> # zdb -d tank | grep 5095
> # zdb -d tank | grep 20629
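>
> (0x5095 is just 20629 in decimal, hence the second grep:)
> # printf '%d\n' 0x5095
> 20629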
>
>
> I also tried getting statistics about the metadata in this pool:
> # zdb -b tank
>
> Traversing all blocks to verify nothing leaked ...
>
> loading space map for vdev 0 of 1, metaslab 159 of 174 ...
> No leaks (block sum matches space maps exactly)
>
> bp count:             24426601
> ganged count:                0
> bp logical:      1983127334912      avg:  81187
> bp physical:     1817897247232      avg:  74422     compression:  1.09
> bp allocated:    1820446928896      avg:  74527     compression:  1.09
> bp deduped:                  0    ref>1:      0   deduplication:  1.00
> SPA allocated:   1820446928896      used: 60.90%
>
> additional, non-pointer bps of type 0:  57981
> Dittoed blocks on same vdev:           296490
>
> And then zdb got stuck, using 100% CPU.
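>
> To see where it is spinning, I could dump its kernel-side thread stacks
> (assuming a single zdb instance is running):
>
> # procstat -kk $(pgrep zdb)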
>
> And now to my questions:
> 1. Do I interpret correctly that this situation is probably due to an error
> during a write, and that both copies of the block got checksums mismatching
> their data? And if it is a hardware problem, that it is probably something
> other than the disks? (No, I don't use ECC RAM.)
>
> 2. Is there any way to remove the offending dataset and clean the pool of
> the errors?
>
> 3. Is my metadata OK? Or should I restore the entire pool from backup?
>
> 4. I also tried running zdb -bc tank, but this resulted in a kernel panic.
> I might try to get the stack trace once I get physical access to the machine
> next week (see the crash-dump sketch after question 5). Also, checksum
> verification slows the process down from 1000 MB/s to less than 1 MB/s. Is
> this expected?
>
> 5. When I work with zdb (as above), should I try to limit writes to the
> pool (e.g., by unmounting the datasets)?
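>
> As for the stack trace in question 4, I intend to go the usual crash-dump
> route (assuming the swap device can hold the dump and kgdb is available,
> from base or from the devel/gdb port; the vmcore number will vary):
>
> # sysrc dumpdev=AUTO
> # service dumpon restart
> ... reproduce the panic, reboot, and let savecore(8) pick up the dump ...
> # kgdb /boot/kernel/kernel /var/crash/vmcore.0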
>
> Cheers,
>
> Wiktor Niesiobedzki
>
> PS. Sorry for the previous partial message.
>
>