ZFS-8000-8A: assistance needed

Maurizio Vairani maurizio.vairani at cloverinformatica.it
Thu Sep 8 07:45:40 UTC 2016


Hi Ruslan,

On 06/09/2016 22:00, Ruslan Makhmatkhanov wrote:
> Hello,
>
> I've got something new here and I'm just not sure where to start on 
> solving it. It's on 10.2-RELEASE-p7 amd64.
>
> """
> root:~ # zpool status -xv
>   pool: storage_ssd
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>     corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>     entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 0h26m with 5 errors on Tue Aug 23 00:40:24 2016
> config:
>
>     NAME              STATE     READ WRITE CKSUM
>     storage_ssd       ONLINE       0     0 59.3K
>       mirror-0        ONLINE       0     0     0
>         gpt/drive-06  ONLINE       0     0     0
>         gpt/drive-07  ONLINE       0     0     9
>       mirror-1        ONLINE       0     0  119K
>         gpt/drive-08  ONLINE       0     0  119K
>         gpt/drive-09  ONLINE       0     0  119K
>     cache
>       mfid5           ONLINE       0     0     0
>       mfid6           ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         <0x1bd0a>:<0x8>
>         <0x31f23>:<0x8>
>         /storage_ssd/f262f6ebaf5011e39ca7047d7bb28f4a/disk
>         /storage_ssd/7ba3f661fa9811e3bd9d047d7bb28f4a/disk
>         /storage_ssd/2751d305ecba11e3aef0047d7bb28f4a/disk
>         /storage_ssd/6aa805bd22e911e4b470047d7bb28f4a/disk
> """
>
> The pool looks OK, if I understand correctly, but we have a slowdown 
> in the Xen VMs that are using these disks via iSCSI. So can anybody 
> please explain what exactly that means?
The OS retries the failed read and/or write operations, and those 
retries are what you notice as a slowdown.
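
If you want to see where the latency comes from while the VMs are slow, 
you can watch the pool and its providers (the 5-second interval below is 
just an example):

  zpool iostat -v storage_ssd 5
  gstat -a

In the gstat output look at the ms/r and ms/w columns of the disks in 
mirror-1.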
>
> 1. Am I right that we have a hardware failure that led to data 
> corruption?
Yes.
> If so, how do I identify the failed disk(s) 
The disk containing gpt/drive-07, the disk with gpt/drive-08, and the 
disk with gpt/drive-09.  With smartctl you can read the SMART status of 
the disks for more information.  I use smartd with both HDDs and SSDs, 
and it usually warns me about a failing disk before ZFS does.
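
For example, to find out which physical disks hold those labels and to 
read their SMART data (the device names below are only placeholders, 
yours will differ):

  glabel status | grep drive-0    # maps gpt/drive-07 etc. to the daN/adaN partitions
  smartctl -a /dev/da7            # replace da7 with the disk found above

If the disks sit behind the MegaRAID controller (the mfid devices), you 
may have to query the CAM pass-through devices instead, e.g. 
smartctl -a -d sat /dev/pass7 with mfip(4) loaded.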
> and how is it possible that data is corrupted on a ZFS mirror?
If the sectors holding the same data are damaged on both disks of the 
mirror.
> Is there anything I can do to recover except restoring from backup?
Probably not, but you can check from within the Xen VM whether the iSCSI 
disk is still usable.
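
Once the damaged files (the backing files of the Xen disks) have been 
restored from backup or deleted, a new scrub plus a clear should empty 
the error list again, assuming the disks stop producing new errors:

  zpool scrub storage_ssd        # re-checks every block after the files are replaced
  zpool status -v storage_ssd    # the list of permanent errors should shrink
  zpool clear storage_ssd        # resets the READ/WRITE/CKSUM counters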
>
> 2. What are the first and second damaged "files", and why are they 
> shown like that?
ZFS metadata.
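
Those <0x...>:<0x...> entries are <dataset id>:<object id> pairs that 
zpool could not map back to a file name, for example because they are 
metadata or belong to a dataset or snapshot that no longer exists. If 
you want to dig further, zdb can look them up (a sketch: take the 
dataset name from the first command, and give the object number in 
decimal, so 0x8 is object 8):

  zdb -d storage_ssd                    # lists the datasets and their IDs
  zdb -dddd storage_ssd/<dataset> 8     # dumps object 8 of that dataset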
>
> I have this in /var/log/messages, but to me it looks like an iSCSI 
> message that springs up when accessing the damaged files:
>
> """
> kernel: (1:32:0/28): WRITE command returned errno 122
> """
Probably in /var/log/messages you can also read messages like these:

Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): CAM status: ATA Status Error
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): RES: 51 40 e8 0f a6 40 44 00 00 08 00
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): Error 5, Retries exhausted

In these messages the /dev/ada3 HDD is failing.
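
For the record, with smartmontools from ports a minimal smartd setup 
looks roughly like this (the mail address is a placeholder):

  # /usr/local/etc/smartd.conf
  DEVICESCAN -a -m root@example.com -M daily

  # /etc/rc.conf
  smartd_enable="YES"

With that in place smartd mails you when attributes such as 
Reallocated_Sector_Ct or Current_Pending_Sector start to climb, which in 
my experience is earlier than the first ZFS checksum error.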

> A manual zpool scrub was tried on this pool to no avail. The pool 
> is only 66% full.
>
> Thanks for any hints in advance.
>
Maurizio


