drive selection for disk arrays
David Christensen
dpchrist at holgerdanske.com
Sat Mar 28 00:39:54 UTC 2020
On 2020-03-27 02:45, Polytropon wrote:
> When a drive _reports_ bad sectors, at least in the past
> it was an indication that it already _has_ lots of them.
> The drive's firmware will remap bad sectors to spare
> sectors, so "no error" so far.
If a drive detects an error, my guess is that it will report the error
to the OS regardless of the outcome of the particular I/O operation (data
read, data written, data lost) or the internal action taken (block marked
bad, block remapped, etc.). It is then up to the OS to decide what to
do next. RAID and/or ZFS offer the means for shielding the application
from I/O and drive failures.
> When errors are being
> reported "upwards" ("read error" or "write error"
> visible to the OS), it's a sign that the disk has run
> out of spare sectors, and the firmware cannot silently
> remap _new_ bad sectors...
>
> Is this still the case with modern drives?
>
> How transparently can ZFS handle drive errors when the
> drives only report the "top results" (i. e., cannot cope
> with bad sectors internally anymore)? Do SMART tools help
> here, for example, by reading certain firmware-provided
> values that indicate how many sectors _actually_ have
> been marked as "bad sector", remapped internally, and
> _not_ reported to the controller / disk I/O subsystem /
> filesystem yet? This should be a good indicator of "will
> fail soon", so a replacement can be done while no data
> loss or other problems appears.
I have been using smartctl(8) occasionally for many years. The "SMART
Attributes Data Structure" report would seem to hold statistics that
should be useful for predicting failures.
This is my SOHO server:
2020-03-27 17:20:00 toor at f3 ~
# freebsd-version ; uname -a
12.1-RELEASE-p2
FreeBSD f3.tracy.holgerdanske.com 12.1-RELEASE-p2 FreeBSD 12.1-RELEASE-p2 GENERIC amd64
This is a data drive:
2020-03-27 17:20:05 toor at f3 ~
# geom disk list ada1
Geom name: ada1
Providers:
1. Name: ada1
Mediasize: 3000592982016 (2.7T)
Sectorsize: 512
Mode: r1w1e3
descr: SEAGATE ST33000650NS
lunid: 5000c5004e7ce23f
ident: <redacted>
rotationrate: 7200
fwsectors: 63
fwheads: 16
2020-03-27 17:20:08 toor at f3 ~
# smartctl -x /dev/ada1 | grep -A 30 'SMART Attributes Data Structure'
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   078   066   044    -    78783152
  3 Spin_Up_Time            PO----   092   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    20
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   066   060   030    -    4532285
  9 Power_On_Hours          -O--CK   100   100   000    -    612
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    20
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   051   046   045    -    49 (Min/Max 39/54)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    6
193 Load_Cycle_Count        -O--CK   100   100   000    -    20
194 Temperature_Celsius     -O---K   049   054   000    -    49 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   033   031   000    -    78783152
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
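The six-character FLAGS field can be read off position by position using the legend printed by smartctl. As a minimal sketch (the function name decode_smart_flags is my own, not part of smartmontools), something like this spells a flag field out in words:

```shell
#!/bin/sh
# decode_smart_flags: hypothetical helper, not part of smartmontools.
# Takes one six-character FLAGS field (e.g. PO--CK) and prints the
# meaning of every position that is not "-", per the smartctl legend.
decode_smart_flags() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # Position 1..6 corresponds to P, O, S, R, C, K in the legend.
            split("prefailure warning,updated online,speed/performance,error rate,event count,auto-keep", name, ",")
        }
        {
            out = ""
            for (i = 1; i <= 6; i++)
                if (substr($0, i, 1) != "-")
                    out = out (out == "" ? "" : ", ") name[i]
            print out
        }
    '
}
```

For example, "decode_smart_flags PO--CK" would describe the flags on Reallocated_Sector_Ct in the report above.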
The following attributes look like they may be related to drive failure,
but I do not know the engineering definition of these attributes nor the
engineering definition of the values reported:
Reallocated_Sector_Ct
Seek_Error_Rate
End-to-End_Error
Reported_Uncorrect
Hardware_ECC_Recovered
Offline_Uncorrectable
UDMA_CRC_Error_Count
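To watch just these attributes over time, the raw values can be pulled out of the report with a short awk filter. This is only a sketch: the function name smart_raw_values is made up, and it assumes the brief column layout shown above (what "smartctl -x" prints), where the name is column 2 and the raw value starts in column 8.

```shell
#!/bin/sh
# smart_raw_values: hypothetical helper, not part of smartmontools.
# Reads brief-format attribute rows on stdin (as printed by
# "smartctl -x") and prints "NAME RAW_VALUE" for the attributes
# listed above.
smart_raw_values() {
    awk '
        BEGIN {
            n = split("Reallocated_Sector_Ct Seek_Error_Rate End-to-End_Error Reported_Uncorrect Hardware_ECC_Recovered Offline_Uncorrectable UDMA_CRC_Error_Count", names, " ")
            for (i = 1; i <= n; i++)
                want[names[i]] = 1
        }
        # Attribute rows start with a numeric ID; column 2 is the
        # attribute name and column 8 is the first word of RAW_VALUE.
        $1 ~ /^[0-9]+$/ && ($2 in want) { print $2, $8 }
    '
}
```

Usage would be along the lines of "smartctl -x /dev/ada1 | smart_raw_values". How to interpret the raw numbers is still vendor specific, so this only collects them.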
I do feel the need to implement automated SMART monitoring, but have
yet to embark on that journey.
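A first step might look like the sketch below, which could be run from cron. It relies on the one vendor-independent rule I am aware of: smartctl treats an attribute as failing when its normalized VALUE is at or below its THRESH. The function name smart_check is made up, and the column positions again assume the brief format shown above. smartmontools also ships the smartd(8) daemon, which is probably the better long-term answer.

```shell
#!/bin/sh
# smart_check: hypothetical helper, not part of smartmontools.
# Reads brief-format attribute rows on stdin, prints every attribute
# whose normalized VALUE is at or below THRESH, and returns 1 if any
# such attribute was found (0 otherwise).
smart_check() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 8 {
            value  = $4 + 0
            thresh = $6 + 0
            # Assumption: a threshold of 0 cannot be reached by a valid
            # normalized value, so such attributes are skipped.
            if (thresh > 0 && value <= thresh) {
                printf "FAILING: %s value=%d thresh=%d raw=%s\n", $2, value, thresh, $8
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}
```

Something like "smartctl -A -f brief /dev/ada1 | smart_check || echo 'ada1 needs attention'" would then report only when a threshold has been crossed. It says nothing about raw counters such as Reallocated_Sector_Ct creeping upward before that point.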
David