drive selection for disk arrays

thor thor at irk.ru
Fri Mar 27 16:04:32 UTC 2020


Hello!

I may be an idiot and a total ignorant obscurantist who knows nothing 
about modern computers, but:

1) If a bad sector appears, there is a good chance that the data are 
irreversibly lost. The drive can remap the sector, but it cannot bring 
back the data.

2) A RAID5 controller usually has a patrol read, so every single bad 
sector will be found and remapped, with the data rebuilt from the 
remaining drives. At least the "mfi" controllers I use certainly do this.

3) This basically means that almost every single bad sector will be 
reported by the driver in the system log, except for the ones that were 
successfully recovered and remapped by the drive itself (one way to 
check for those silent remaps with SMART is sketched below).
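
Just to illustrate point 3: one can see whether a drive has already 
been quietly remapping sectors by looking at the SMART counters before 
the driver ever logs anything. Below is a minimal sketch, assuming 
smartmontools is installed; the device name and the attribute list are 
only examples, not tied to any particular controller:

#!/usr/bin/env python3
# Rough check of the SMART counters that hint at sectors the drive has
# handled (or is still struggling with) on its own.  Assumes the
# smartmontools package provides smartctl; /dev/da0 is a placeholder.
import subprocess

DEVICE = "/dev/da0"          # placeholder, adjust to the real disk

WATCHED = ("Reallocated_Sector_Ct",
           "Current_Pending_Sector",
           "Offline_Uncorrectable")

# "smartctl -A" prints the vendor attribute table for ATA drives.
out = subprocess.run(["smartctl", "-A", DEVICE],
                     capture_output=True, text=True).stdout

for line in out.splitlines():
    for attr in WATCHED:
        if attr in line:
            # the raw value is the last column of the attribute table
            print(attr, "=", line.split()[-1])

A non-zero pending count is usually the earlier warning of the two 
interesting ones, since those are sectors the drive could not read yet 
and has not managed to remap.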

And I don't trust software RAIDs.

Thor


On 03/27/20 17:45, Polytropon wrote:
> On Thu, 26 Mar 2020 16:37:58 -0400 (EDT), Daniel Feenberg wrote:
>> The disturbing frequency of multiple drives going offline in quick
>> succession is, in my view, largely a result of defects being discovered in
>> quick succession, rather than occurring in quick succession. If a defect
>> occurs in a sector that is rarely visited it can remain hidden for a long
>> time. During a resilver that defect will be noticed and the drive failed
>> out. I do think that is an overly aggressive action by the resilvering
>> process, as that may be the only bad sector, it may be possible to recover
>> all the data from the remaining drives (if the first failing drive can
>> read the appropriate sector), and that sector may not even be in an active
>> file.
> I'd like to mention something in this context:
>
> When a drive _reports_ bad sectors, at least in the past
> it was an indication that it already _has_ lots of them.
> The drive's firmware will remap bad sectors to spare
> sectors, so "no error" so far. When errors are being
> reported "upwards" ("read error" or "write error"
> visible to the OS), it's a sign that the disk has run
> out of spare sectors, and the firmware cannot silently
> remap _new_ bad sectors...
>
> Is this still the case with modern drives?
>
> How transparently can ZFS handle drive errors when the
> drives only report the "top results" (i. e., cannot cope
> with bad sectors internally anymore)? Do SMART tools help
> here, for example, by reading certain firmware-provided
> values that indicate how many sectors _actually_ have
> been marked as "bad sector", remapped internally, and
> _not_ reported to the controller / disk I/O subsystem /
> filesystem yet? This should be a good indicator of "will
> fail soon", so a replacement can be done while no data
> loss or other problems appears.
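
As for the quoted question about how transparently ZFS copes: ZFS keeps 
per-device read/write/checksum error counters, and a scrub re-reads 
every allocated block and repairs what it can from redundancy, which 
plays roughly the same role as the controller's patrol read. A minimal 
sketch of a periodic check, assuming a pool named "tank" (a placeholder) 
and relying only on the stock zpool status / zpool scrub commands:

#!/usr/bin/env python3
# Rough ZFS analogue of a controller's patrol read: check overall pool
# health, and if nothing is reported, start a scrub so rarely-read
# blocks get verified and repaired from redundancy.  "tank" is only a
# placeholder pool name.
import subprocess

POOL = "tank"                # placeholder, adjust to the real pool

# "zpool status -x" prints a short healthy/unhealthy summary.
health = subprocess.run(["zpool", "status", "-x"],
                        capture_output=True, text=True).stdout.strip()
print(health)

# The string match is crude; the exact wording of the healthy message
# could differ between ZFS versions.
if "all pools are healthy" in health:
    subprocess.run(["zpool", "scrub", POOL])
else:
    # full status shows per-device READ/WRITE/CKSUM counters
    subprocess.run(["zpool", "status", "-v", POOL])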


