Re: ZFS + mysql appears to be killing my SSD's

From: Karl Denninger <karl_at_denninger.net>
Date: Mon, 05 Jul 2021 15:09:37 UTC
On 7/5/2021 10:30, Pete French wrote:
>
>
> On 05/07/2021 14:37, Stefan Esser wrote:
>> Hi Pete,
>>
>> have you checked the drive state and statistics with smartctl?
>
> Hi, thanks for the reply - yes, I did check the statistics, and they
> don't make a lot of sense. I was just looking at them again, in fact.
>
> So, take one of the machines where we changed a drive when this first
> started, which was 4 weeks ago.
>
> root@telehouse04:/home/webadmin # smartctl -a /dev/ada0 | grep Perc
> 169 Remaining_Lifetime_Perc 0x0000   082   082   000    Old_age   Offline      -       82
> root@telehouse04:/home/webadmin # smartctl -a /dev/ada1 | grep Perc
> 202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
>
> Now, from that you might think the 2nd drive was the one changed, but
> no. It's the first one, which is now at 82% lifetime remaining! The
> other drive, still at 100%, has been in there a year. The drives are
> from different manufacturers, which unfortunately makes comparing most
> of the numbers tricky.
>
>
> I am now even more worried than when I sent the first email: if that
> 18% is accurate, then I am going to be doing this again in another 4
> months, and that's not sustainable. It also looks as if this problem
> has got a lot worse recently, though I wasn't looking at the numbers
> before, only noticing the failures. If I look at 'Percentage Used
> Endurance Indicator' instead of the 'Percent_Lifetime_Remain' value,
> then I see some of those well over 200%. On the newer drives that
> value is 100 minus the 'Percent_Lifetime_Remain' value, so I guess
> they have the same underlying metric.
>
> I didn't mention it in my original email, but I am encrypting these
> with geli. Does geli do any write amplification at all? That might
> explain the high write volumes...
>
> -pete.
>
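Pete's "another 4 months" figure is a straight-line extrapolation from one replaced drive. A minimal sketch of that arithmetic, assuming wear keeps accruing at the rate seen over the first four weeks (real workloads rarely wear that evenly); `weeks_until_worn_out` and `percent_used` are illustrative names, not part of smartctl or any other tool:

```python
# Straight-line projection of SSD wear from two "percent remaining" readings.
# Assumption: wear continues at the rate observed so far.

def weeks_until_worn_out(remaining_now: float, remaining_start: float,
                         weeks_elapsed: float) -> float:
    """Weeks until 'percent remaining' reaches 0 at the observed rate."""
    used = remaining_start - remaining_now      # percentage points consumed
    if used <= 0:
        raise ValueError("no measurable wear over the interval")
    rate = used / weeks_elapsed                 # points per week
    return remaining_now / rate


def percent_used(percent_remaining: float) -> float:
    """'Percentage Used' as described for the newer drives: 100 - remaining."""
    return 100.0 - percent_remaining


# The replaced drive: 100% remaining when fitted 4 weeks ago, 82% now.
left = weeks_until_worn_out(82, 100, 4)
print(f"about {left:.0f} weeks, or {left * 12 / 52:.1f} months")
print(f"percentage used: {percent_used(82):.0f}%")
```

At 18 points per four weeks, the remaining 82 points last about 18 weeks, a little over four months. By the same relation, a 'Percentage Used' reading above 100 corresponds to a negative lifetime remaining, i.e. a drive already past its rated endurance.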
As noted elsewhere, assuming ashift=12, the answer on write amplification
is no.

Geli should be initialized with -s 4096; I'm assuming you did that?
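The ashift=12 / `-s 4096` advice is about alignment: with a 4096-byte geli sector the .eli provider reports 4 KiB sectors, and ZFS normally derives ashift from the sector size it is given. A simplified model of why that matters, assuming the SSD programs flash in 4 KiB pages and ignoring FTL caching and write coalescing (real page sizes vary by drive); `pages_touched` and `amplification` are hypothetical helpers:

```python
# Simplified flash-page model: a write is charged for every whole page it touches.
# Assumptions: 4096-byte pages, no FTL caching or write coalescing.

PAGE = 4096


def pages_touched(offset: int, length: int, page: int = PAGE) -> int:
    """Number of flash pages spanned by a write of `length` bytes at `offset`."""
    first = offset // page
    last = (offset + length - 1) // page
    return last - first + 1


def amplification(offset: int, length: int, page: int = PAGE) -> float:
    """Bytes programmed divided by bytes the host asked to write."""
    return pages_touched(offset, length, page) * page / length


# ashift=12 on a 4096-byte geli sector: 4 KiB writes on 4 KiB boundaries.
print(amplification(0, 4096))    # 1.0
# 512-byte sectors (ashift=9): a lone 512-byte write still costs a whole page.
print(amplification(0, 512))     # 8.0
# A 4 KiB write misaligned by 512 bytes straddles two pages.
print(amplification(512, 4096))  # 2.0
```

In this model the aligned 4 KiB case costs nothing extra, which is why a 4096-byte geli sector under ashift=12 should not add amplification of its own.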

I have a 5-unit geli-encrypted root pool, all Intel 240 GB SSDs. They do
not report remaining lifetime via SMART, but they do report indications
of trouble. Here's one example snippet from one of the drives in that pool:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
   5 Reallocated_Sector_Ct   -O--CK   098   098   000    -    0
   9 Power_On_Hours          -O--CK   100   100   000    - 53264
  12 Power_Cycle_Count       -O--CK   100   100   000    -    100
170 Available_Reservd_Space PO--CK   100   100   010    -    0
171 Program_Fail_Count      -O--CK   100   100   000    -    0
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
174 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
175 Power_Loss_Cap_Test     PO--CK   100   100   010    -    631 (295 5442)
183 SATA_Downshift_Count    -O--CK   100   100   000    -    0
184 End-to-End_Error        PO--CK   100   100   090    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
190 Temperature_Case        -O---K   068   063   000    -    32 (Min/Max 29/37)
192 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
194 Temperature_Internal    -O---K   100   100   000    -    32
197 Current_Pending_Sector  -O--CK   100   100   000    -    0
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
225 Host_Writes_32MiB       -O--CK   100   100   000    - 1811548
226 Workld_Media_Wear_Indic -O--CK   100   100   000    -    205
227 Workld_Host_Reads_Perc  -O--CK   100   100   000    -    49
228 Workload_Minutes        -O--CK   100   100   000    - 55841
232 Available_Reservd_Space PO--CK   100   100   010    -    0
233 Media_Wearout_Indicator -O--CK   089   089   000    -    0
234 Thermal_Throttle        -O--CK   100   100   000    -    0/0
241 Host_Writes_32MiB       -O--CK   100   100   000    - 1811548
242 Host_Reads_32MiB        -O--CK   100   100   000    - 1423217
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning
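The "indications of trouble" in a table like this can be checked mechanically. A sketch that parses rows in the brief-flags layout shown here (with a FAIL column; the `-a` output quoted earlier uses a different column layout and is not handled) and flags any attribute whose normalized VALUE has fallen to its nonzero THRESH, the condition smartctl treats as a failed attribute. `Attr`, `parse_brief_row` and `failing` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Attr:
    attr_id: int
    name: str
    flags: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_brief_row(line: str) -> Attr:
    """Parse one row laid out as: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE."""
    # maxsplit=7 keeps a RAW_VALUE with spaces, e.g. "32 (Min/Max 29/37)", intact.
    f = line.split(maxsplit=7)
    return Attr(int(f[0]), f[1], f[2], int(f[3]), int(f[4]), int(f[5]), f[7])


def failing(a: Attr) -> bool:
    """True once the normalized value has fallen to a nonzero threshold."""
    return a.thresh > 0 and a.value <= a.thresh


rows = """\
  5 Reallocated_Sector_Ct   -O--CK   098   098   000    -    0
170 Available_Reservd_Space PO--CK   100   100   010    -    0
184 End-to-End_Error        PO--CK   100   100   090    -    0
190 Temperature_Case        -O---K   068   063   000    -    32 (Min/Max 29/37)
233 Media_Wearout_Indicator -O--CK   089   089   000    -    0
"""

for line in rows.splitlines():
    a = parse_brief_row(line)
    if a.thresh:  # THRESH 000 leaves nothing to cross
        status = "FAILING" if failing(a) else "ok"
        print(f"{a.name}: value {a.value}, threshold {a.thresh}, {status}")
```

The normalized VALUE column is also the more useful one when comparing drives from different vendors, since RAW_VALUE units are vendor-specific.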


Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4             100  ---  Lifetime Power-On Resets
0x01  0x018  6    118722148102  ---  Logical Sectors Written
0x01  0x020  6        89033895  ---  Number of Write Commands
0x01  0x028  6     93271951909  ---  Logical Sectors Read
0x01  0x030  6         6797990  ---  Number of Read Commands
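The two write counters above should agree, which makes a handy sanity check on what the attribute units mean. A sketch, assuming 512-byte logical sectors (typical for SATA SSDs of this generation, but worth confirming with `smartctl -i`) and taking the 32 MiB unit from the attribute name:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

SECTOR_BYTES = 512                          # assumption: 512-byte logical sectors
host_writes_32mib = 1_811_548               # attributes 225/241, raw value
logical_sectors_written = 118_722_148_102   # GP log 0x04, page 0x01
power_on_hours = 53_264                     # attribute 9

from_attr = host_writes_32mib * 32 * MIB
from_log = logical_sectors_written * SECTOR_BYTES
gib_per_day = from_log / GIB / (power_on_hours / 24)

print(f"Host_Writes_32MiB:       {from_attr / TIB:.2f} TiB")
print(f"Logical Sectors Written: {from_log / TIB:.2f} TiB")
print(f"average: {gib_per_day:.1f} GiB per powered-on day")
```

Both counters come to about 55.3 TiB, so the 32 MiB unit and the 512-byte sector assumption are consistent with each other; spread over the power-on time that is roughly 25 GiB a day for this drive.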

Roughly 6 years in use, and no warnings of any kind about utilization or
wear-out. There is a MySQL database on this box, used by Cacti, that is
running all the time, and while the traffic isn't really high, it's there
(there is also a Postgres server running on there that sees some traffic
too). These specific drives were selected because they have power-fail
protection for data in flight: they were among the few I've tested that
passed a "pull the cord" test, even though they're actually the 730s, NOT
the "DC" series.
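To put a number on "no indication of wear-out", a sketch comparing average endurance consumption per year. It assumes attribute 233's normalized value started at 100 and counts down as rated endurance is used, and that Pete's replacement drive was powered on for the whole four weeks; the drives are different makes with different endurance ratings, so this is only a rough comparison. `wear_per_year` is a hypothetical helper:

```python
HOURS_PER_YEAR = 24 * 365.25


def wear_per_year(percent_used: float, power_on_hours: float) -> float:
    """Average rated-endurance consumption, in percentage points per year."""
    return percent_used / (power_on_hours / HOURS_PER_YEAR)


# Karl's drive: Media_Wearout_Indicator VALUE 089 after 53264 power-on hours.
# Assumption: the normalized value started at 100 and counts down with wear.
karl = wear_per_year(100 - 89, 53_264)

# Pete's replacement drive: 18% used in about 4 weeks.
# Assumption: powered on around the clock (4 weeks x 7 days x 24 hours).
pete = wear_per_year(18, 4 * 7 * 24)

print(f"Karl: {karl:.1f} points/year, Pete: {pete:.0f} points/year")
```

That works out to roughly 1.8 versus 235 points a year, a difference of more than a hundredfold.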

Raidz2 configuration:

root@NewFS:/home/karl # zpool status zsr
   pool: zsr
  state: ONLINE
   scan: scrub repaired 0 in 0 days 00:07:05 with 0 errors on Mon Jun 28 03:43:58 2021
config:

         NAME            STATE     READ WRITE CKSUM
         zsr             ONLINE       0     0     0
           raidz2-0      ONLINE       0     0     0
             ada0p4.eli  ONLINE       0     0     0
             ada1p4.eli  ONLINE       0     0     0
             ada2p4.eli  ONLINE       0     0     0
             ada3p4.eli  ONLINE       0     0     0
             ada4p4.eli  ONLINE       0     0     0

errors: No known data errors
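Because raidz2 sits in the write path, parity adds to what each SSD actually receives. A simplified model of allocation for this 5-wide raidz2 at ashift=12, following the rounding OpenZFS applies to raidz allocations (data sectors, plus parity for each stripe row, padded to a multiple of parity+1 sectors; an assumption worth checking against `vdev_raidz_asize` in the OpenZFS source). It ignores compression, gang blocks and metadata, and the block sizes are sample values only:

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def raidz_alloc_sectors(size: int, width: int = 5, parity: int = 2,
                        ashift: int = 12) -> int:
    """Sectors allocated across the vdev for one block of `size` bytes."""
    sector = 1 << ashift
    data = ceil_div(size, sector)                 # data sectors
    rows = ceil_div(data, width - parity)         # stripe rows needed
    total = data + rows * parity                  # plus parity for each row
    # pad the allocation to a multiple of (parity + 1) sectors
    return ceil_div(total, parity + 1) * (parity + 1)


for kib in (4, 16, 128):
    size = kib * 1024
    on_disk = raidz_alloc_sectors(size) * 4096
    print(f"{kib:>3} KiB block -> {on_disk // 1024} KiB written "
          f"({on_disk / size:.2f}x)")
```

Small blocks fare worst in this model: a lone 4 KiB block costs three sectors (one data, two parity), while a 128 KiB record costs about 1.7 times its size.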

Micron appears to be the only people making suitable replacements if and 
when these do start to fail on me, but from what I see here it will be a 
good while yet.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/