zpool errors
Daniel Braniss
danny at cs.huji.ac.il
Thu Jul 11 07:39:45 UTC 2019
> On 10 Jul 2019, at 20:23, Allan Jude <allanjude at freebsd.org> wrote:
>
> On 2019-07-10 11:37, Daniel Braniss wrote:
>>
>>
>>> On 10 Jul 2019, at 18:24, Allan Jude <allanjude at freebsd.org> wrote:
>>>
>>> On 2019-07-10 10:48, Daniel Braniss wrote:
>>>> hi,
>>>> I got a degraded pool, but can’t make sense of the file names:
>>>>
>>>> protonew-2# zpool status -vx
>>>> pool: h
>>>> state: ONLINE
>>>> status: One or more devices has experienced an error resulting in data
>>>> corruption. Applications may be affected.
>>>> action: Restore the file in question if possible. Otherwise restore the
>>>> entire pool from backup.
>>>> see: http://illumos.org/msg/ZFS-8000-8A
>>>> scan: scrub repaired 6.50K in 17h30m with 0 errors on Wed Jul 10 12:06:14 2019
>>>> config:
>>>>
>>>>         NAME          STATE   READ WRITE CKSUM
>>>>         h             ONLINE     0     0 14.4M
>>>>           gpt/r5/zfs  ONLINE     0     0 57.5M
>>>>
>>>> errors: Permanent errors have been detected in the following files:
>>>>
>>>> <0x102>:<0x30723>
>>>> <0x102>:<0x30726>
>>>> <0x102>:<0x3062a>
>>>> …
>>>> <0x281>:<0x0>
>>>> <0x6aa>:<0x305cd>
>>>> <0xffffffffffffffff>:<0x305cd>
>>>>
>>>>
>>>> any hints as to how I can identify these files?
>>>>
>>>> thanks,
>>>> danny
>>>>
>>>
>>> Once a file has been deleted, ZFS can have a hard time determining its
>>> filename.
>>>
>>> It is inode 198186 (0x3062a) on dataset 0x102. The file has been
>>> deleted, but still exists in at least one snapshot.
>>>
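(For reference, once the dataset name is known, zdb should be able to dump that
object directly; object numbers are decimal, so 0x3062a is 198186. A sketch,
with h/home and @old standing in for whatever dataset and snapshot actually
hold the file:

    zdb -ddddd h/home 198186         # object in the live dataset
    zdb -ddddd h/home@old 198186     # or in a snapshot that still references it

If zdb can resolve it, a "path" line in the output gives the file name.)
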
>>> Although, 57 million checksum errors suggests there may be some other
>>> problem. You might want to find and resolve the problem with what appears
>>> to be a RAID5 array you have built your ZFS pool on top of. Then do 'zpool
>>> clear' to reset the counters to zero, and 'zpool scrub' to try to read
>>> everything again.
>>>
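(A sketch of that sequence, assuming the RAID5 problem underneath has been
dealt with first:

    zpool clear h          # reset the READ/WRITE/CKSUM counters
    zpool scrub h          # re-read and verify everything in the pool
    zpool status -v h      # check progress and any new errors
)
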
>>> --
>>> Allan Jude
>>>
>> I don’t know when the first error was detected, and this host has been up for 367 days!
>> I did a scrub but no change.
>> I will remove old snapshots and see if that helps.
>>
>> Is it possible to at least know which dataset/volume the errors are in?
>>
>> thanks,
>> danny
>>
>>
>
> zdb -ddddd h 0x102
>
> Should tell you which dataset that is.
>
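(0x102 is 258 in decimal; listing the pool's datasets with zdb and grepping for
that ID might also work, e.g.:

    zdb -d h | grep 'ID 258,'
)
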
> --
> Allan Jude
>
Firstly, thanks for your help!
Now, after doing a zpool clear, I notice that the CKSUM count is growing again.
The pool sits on a RAID5 volume on a RAID controller (a Dell PERC), and the
controller reports that it is correcting errors (‘Corrected medium error during
recovery on PD …’). So what could be the cause? BTW, this is FreeBSD 10.3-STABLE.
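
For now, a rough sketch of what I plan to watch, assuming the PERC attaches
as mfi(4) (the mfiutil part only applies in that case):

    # log the vdev's error counters every 5 minutes
    while :; do date; zpool status h | grep gpt/r5/zfs; sleep 300; done

    # recent controller events and per-drive state
    mfiutil show events
    mfiutil show drives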