zpool CKSUM 0 --> 1 while resilvering

David Christensen dpchrist at holgerdanske.com
Mon Feb 15 20:56:40 UTC 2021


On 2021-02-15 10:11, Janos Dohanics wrote:
> Hello,
> 
> I had to replace a hard drive which has failed a SMART test. As the
> replacement hard drive was being resilvered, I have periodically checked
> the progress.
> 
> Initially, everything looked fine:
> 
> # zpool status
>    pool: zroot
>   state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
> 	continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>    scan: resilver in progress since Sun Feb 14 20:44:20 2021
> 	1.77T scanned at 291M/s, 179G issued at 28.7M/s, 1.97T total
> 	179G resilvered, 8.88% done, 0 days 18:12:23 to go
> config:
> 
> 	NAME             STATE     READ WRITE CKSUM
> 	zroot            ONLINE       0     0     0
> 	  mirror-0       ONLINE       0     0     0
> 	    ada0p3       ONLINE       0     0     0
> 	    replacing-1  ONLINE       0     0     0
> 	      ada1p3     ONLINE       0     0     0
> 	      ada3p3     ONLINE       0     0     0
> 	    ada2p3       ONLINE       0     0     0
> 
> errors: No known data errors
> 
> But a little later CKSUM changed from 0 to 1:
> 
> # zpool status
>    pool: zroot
>   state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
> 	continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>    scan: resilver in progress since Sun Feb 14 20:44:20 2021
> 	1.84T scanned at 249M/s, 275G issued at 36.5M/s, 1.97T total
> 	275G resilvered, 13.65% done, 0 days 13:35:00 to go
> config:
> 
> 	NAME             STATE     READ WRITE CKSUM
> 	zroot            ONLINE       0     0     0
> 	  mirror-0       ONLINE       0     0     0
> 	    ada0p3       ONLINE       0     0     0
> 	    replacing-1  ONLINE       0     0     0
> 	      ada1p3     ONLINE       0     0     0
> 	      ada3p3     ONLINE       0     0     1
> 	    ada2p3       ONLINE       0     0     0
> 
> errors: No known data errors
> 
> After resilvering has finished:
> 
> # zpool status
>    pool: zroot
>   state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
> 	attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> 	using 'zpool clear' or replace the device with 'zpool replace'.
>     see: http://illumos.org/msg/ZFS-8000-9P
>    scan: resilvered 1.97T in 0 days 08:25:27 with 0 errors on Mon Feb 15 05:09:47 2021
> config:
> 
> 	NAME        STATE     READ WRITE CKSUM
> 	zroot       ONLINE       0     0     0
> 	  mirror-0  ONLINE       0     0     0
> 	    ada0p3  ONLINE       0     0     0
> 	    ada3p3  ONLINE       0     0     1
> 	    ada2p3  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> No errors reported by smartctl(8) for /dev/ada3.
> 
> Can I consider this a harmless error and should I just run 'zpool clear
> ada3p3'?
> 
> Please advise.


STFW 'zpool status cksum':

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbcve/index.html

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbbzs/index.html
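
To pick out which devices carry non-zero READ/WRITE/CKSUM counters without
reading the whole table by eye, something like the following might help.  It
is only a sketch: the function name nonzero_errors is made up for this
example, and it assumes the column layout shown in the 'zpool status' output
quoted above (NAME STATE READ WRITE CKSUM).

```shell
#!/bin/sh
# nonzero_errors (hypothetical helper): read 'zpool status' text on stdin
# and print every config row whose READ, WRITE, or CKSUM counter is not 0.
nonzero_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $NF == "CKSUM" { in_cfg = 1; next }
        # The "errors:" summary line marks the end of it.
        /^errors:/ { in_cfg = 0 }
        # Inside the table, report rows with any non-zero counter.
        in_cfg && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

Usage would be along the lines of 'zpool status zroot | nonzero_errors'; no
output means every counter in the table is zero.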


I would:

1.  Check for interface, cable, and/or rack errors.  These should
generate error messages in dmesg(8) and /var/log/messages.  I especially
hate red, non-locking SATA cables without any speed marking.  More than
a few people agree that the red dye in the insulation will corrode
copper.  I have replaced all of my SATA cables with new black, locking,
6 Gbps SATA cables (made by Cable Matters).

2.  If and when all of the above is okay, I would run SMART short and
long tests on all three drives.  (I used to believe in manufacturer
diagnostic tools, but recent experiences with Seagate and Western
Digital, notably when following up with technical support, have
convinced me otherwise.  If anyone knows of a brand of HDD with good
drives, good diagnostic tools, and good technical support, please advise.)

3.  If and when all of the above is okay, I would scrub the pool.

4.  If and when all of the above is okay and CKSUM remains (#3 might 
clear it?), I would do the 'zpool clear ...'.
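
The four steps above could be sketched as the shell functions below.  This is
a rough outline, not a tested procedure: the pool name (zroot) and disk names
(ada0, ada2, ada3) are taken from the 'zpool status' output quoted above, the
grep pattern is my guess at what controller and cable errors look like in the
logs, and the run helper and DRY_RUN variable are invented for this example.
By default the functions only print the commands; set DRY_RUN=0 to actually
run them.  Note that 'zpool clear' takes the pool name before the device.

```shell
#!/bin/sh
# Names below come from the quoted 'zpool status' output; adjust to taste.
POOL=zroot
DISKS="ada0 ada2 ada3"
SUSPECT=ada3p3
# Assumed pattern for disk/controller messages; tune for your hardware.
PATTERN='ada[0-9]|ahcich|CAM status'

# run (hypothetical helper): print a command, execute it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Step 1: look for interface or cable errors in the kernel messages.
step1_logs() {
    run sh -c "dmesg | grep -iE '$PATTERN'"
    run grep -iE "$PATTERN" /var/log/messages
}

# Step 2: start a SMART self-test ("short" or "long") on every drive.
# The tests run in the background inside the drive; wait for one to
# finish before starting the next, then read the results.
step2_smart() {
    for d in $DISKS; do
        run smartctl -t "$1" "/dev/$d"
    done
}
step2_report() {
    for d in $DISKS; do
        run smartctl -a "/dev/$d"
    done
}

# Step 3: scrub the pool, then check progress and results.
step3_scrub() {
    run zpool scrub "$POOL"
    run zpool status -v "$POOL"
}

# Step 4: reset the error counters on the suspect device.
step4_clear() {
    run zpool clear "$POOL" "$SUSPECT"
}
```

For example, 'step2_smart short' prints the three smartctl commands, and
'DRY_RUN=0 step3_scrub' (as root) starts the scrub for real.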


David


More information about the freebsd-questions mailing list