zpool CKSUM 0 --> 1 while resilvering
David Christensen
dpchrist at holgerdanske.com
Mon Feb 15 20:56:40 UTC 2021
On 2021-02-15 10:11, Janos Dohanics wrote:
> Hello,
>
> I had to replace a hard drive which failed a SMART test. As the
> replacement hard drive was being resilvered, I periodically checked
> the progress.
>
> Initially, everything looked fine:
>
> # zpool status
> pool: zroot
> state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scan: resilver in progress since Sun Feb 14 20:44:20 2021
> 1.77T scanned at 291M/s, 179G issued at 28.7M/s, 1.97T total
> 179G resilvered, 8.88% done, 0 days 18:12:23 to go
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         zroot            ONLINE       0     0     0
>           mirror-0       ONLINE       0     0     0
>             ada0p3       ONLINE       0     0     0
>             replacing-1  ONLINE       0     0     0
>               ada1p3     ONLINE       0     0     0
>               ada3p3     ONLINE       0     0     0
>             ada2p3       ONLINE       0     0     0
>
> errors: No known data errors
>
> But a little later CKSUM changed from 0 to 1:
>
> # zpool status
> pool: zroot
> state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scan: resilver in progress since Sun Feb 14 20:44:20 2021
> 1.84T scanned at 249M/s, 275G issued at 36.5M/s, 1.97T total
> 275G resilvered, 13.65% done, 0 days 13:35:00 to go
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         zroot            ONLINE       0     0     0
>           mirror-0       ONLINE       0     0     0
>             ada0p3       ONLINE       0     0     0
>             replacing-1  ONLINE       0     0     0
>               ada1p3     ONLINE       0     0     0
>               ada3p3     ONLINE       0     0     1
>             ada2p3       ONLINE       0     0     0
>
> errors: No known data errors
>
> After resilvering has finished:
>
> # zpool status
> pool: zroot
> state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
> attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
> see: http://illumos.org/msg/ZFS-8000-9P
> scan: resilvered 1.97T in 0 days 08:25:27 with 0 errors on Mon Feb 15 05:09:47 2021
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zroot       ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada0p3  ONLINE       0     0     0
>             ada3p3  ONLINE       0     0     1
>             ada2p3  ONLINE       0     0     0
>
> errors: No known data errors
>
> No errors reported by smartctl(8) for /dev/ada3.
>
> Can I consider this a harmless error, and should I just run 'zpool clear
> ada3p3'?
>
> Please advise.
STFW 'zpool status cksum':
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbcve/index.html
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbbzs/index.html
I would:
1. Check for interface, cable, and/or rack errors. These should
generate error messages in dmesg(8) and /var/log/messages. I especially
hate red, non-locking SATA cables without any speed marking. More than
a few people agree that the red dye in the insulation will corrode
copper. I have replaced all of my SATA cables with new black, locking,
6 Gbps SATA cables (made by Cable Matters).
2. If and when all of the above is okay, I would run SMART short and
long tests on all three drives. (I used to believe in manufacturer
diagnostic tools, but recent experiences with Seagate and Western
Digital have convinced me otherwise; notably when following up with
technical support. If anyone knows of a brand of HDD with good drives,
good diagnostic tools, and good technical support, please advise.)
3. If and when all of the above is okay, I would scrub the pool.
4. If and when all of the above is okay and CKSUM remains (#3 might
clear it?), I would do the 'zpool clear ...'.
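
For the "CKSUM remains" check in step 4, here is a minimal sketch of how the
counters could be pulled out of the status text instead of reading them by
eye. The function name nonzero_errors is made up for illustration (it is not
part of ZFS), and the sketch assumes the usual NAME STATE READ WRITE CKSUM
column layout shown in the output above:

```shell
#!/bin/sh
# nonzero_errors: hypothetical helper, not part of ZFS.
# Reads "zpool status" text on stdin and prints every row of the config
# table whose READ, WRITE, or CKSUM counter is anything other than "0".
# Assumes the column order NAME STATE READ WRITE CKSUM.
nonzero_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line marks the end of the table.
        $1 == "errors:" { in_config = 0 }
        # Table rows carry at least five fields; compare counters as
        # strings so abbreviated values such as "1.2K" are also caught.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

Once the scrub has finished, 'zpool status zroot | nonzero_errors' would
print one line per device that still shows a nonzero counter, and nothing
at all if every counter is zero.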
David