redundant zfs pool, system traps and tons of corrupted files

Thu Jun 29 12:04:24 UTC 2017

Hi,

On 29.06.2017 16:37, Eugene M. Zheganin wrote:
> Hi.
>
>
> Say I have a server that traps more and more often (different
> panics: ZFS panics, GPFs, fatal traps while in kernel mode, etc.), and
> then I realize it has tons of permanent errors on all of its pools
> that scrub is unable to heal. Does this situation mean it's a bad
> memory case? Unfortunately I switched the hardware to an identical
> server before discovering that the zpools have errors, so I'm not sure
> when they appeared. Right now I'm about to run memtest on the old hardware.
>
>
> So, what do you say: does this point at memory as the root problem?
>

I also don't quite understand how I can have errors at the vdev
level but almost none at the underlying device level (could someone
please explain this?):

   pool: esx
  state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: resilvered 3,74G in 0h5m with 0 errors on Tue Dec 27 05:14:32 2016
config:

         NAME        STATE     READ WRITE CKSUM
         esx         ONLINE       0     0 99,0K
           raidz1-0  ONLINE       0     0  113K
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     2
             da3     ONLINE       0     0     0
             da5     ONLINE       0     0     0
           raidz1-1  ONLINE       0     0 84,7K
             da12    ONLINE       0     0     0
             da13    ONLINE       0     0     1
             da14    ONLINE       0     0     0
             da15    ONLINE       0     0     0
             da16    ONLINE       0     0     0

errors: 25 data errors, use '-v' for a list

   pool: gamestop
  state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub in progress since Thu Jun 29 12:30:21 2017
         1,67T scanned out of 4,58T at 1002M/s, 0h50m to go
         0 repaired, 36,44% done
config:

         NAME        STATE     READ WRITE CKSUM
         gamestop    ONLINE       0     0     1
           raidz1-0  ONLINE       0     0     2
             da6     ONLINE       0     0     0
             da7     ONLINE       0     0     0
             da8     ONLINE       0     0     0
             da9     ONLINE       0     0     0
             da11    ONLINE       0     0     0

errors: 10 data errors, use '-v' for a list
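
To put a number on the mismatch, here is a rough Python sketch that takes the config table from the output above and, for each vdev that has children, subtracts the children's CKSUM counts from the vdev's own. It is not part of any ZFS tooling: the function names are made up for illustration, and I am assuming that the "K" suffix means 1024 and that the comma is just my locale's decimal separator, so the results for abbreviated counters are approximate.

```python
import re

# Assumption: zpool abbreviates counters with 1024-based suffixes.
SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def parse_count(text):
    """Convert a counter such as '99,0K' or '2' to an approximate integer."""
    m = re.fullmatch(r"([0-9]+(?:[.,][0-9]+)?)([KMG]?)", text)
    if m is None:
        raise ValueError("unrecognized counter: %r" % text)
    return int(float(m.group(1).replace(",", ".")) * SUFFIX[m.group(2)])

def cksum_by_level(config_lines):
    """Return (indent, name, cksum) for every device row of the config table."""
    rows = []
    for line in config_lines:
        fields = line.split()
        if len(fields) != 5 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        indent = len(line) - len(line.lstrip())
        rows.append((indent, fields[0], parse_count(fields[4])))
    return rows

def unattributed(rows):
    """Map each vdev with children to (own CKSUM - sum of direct children's CKSUM)."""
    result = {}
    for i, (indent, name, cksum) in enumerate(rows):
        child_indent = None
        child_sum = 0
        for ind2, _, ck2 in rows[i + 1:]:
            if ind2 <= indent:
                break  # left this vdev's subtree
            if child_indent is None:
                child_indent = ind2  # first deeper row sets the child level
            if ind2 == child_indent:
                child_sum += ck2
        if child_indent is not None:
            result[name] = cksum - child_sum
    return result

SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        gamestop    ONLINE       0     0     1
          raidz1-0  ONLINE       0     0     2
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da11    ONLINE       0     0     0
"""

if __name__ == "__main__":
    print(unattributed(cksum_by_level(SAMPLE.splitlines())))
```

Fed the esx table instead, the same functions show that nearly all of the checksum errors counted on raidz1-0 and raidz1-1 are not charged to any of the da devices underneath them, which is exactly the part I don't understand.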

P.S. This is FreeBSD 11.1-BETA2 r320056M (the M stands for CTL_MAX_PORTS =
1024), with ECC memory.

Thanks.
Eugene.