Re: ZFS + mysql appears to be killing my SSD's

From: Stefan Esser <se_at_freebsd.org>
Date: Mon, 05 Jul 2021 13:37:09 UTC
On 05.07.21 at 15:15, Pete French wrote:
> I have a network of FreeBSD machines which are running mysql on top of zfs. I
> have been doing this for a while, but a couple of years ago we switched to
> using SSDs. After less than a year (I don't remember the exact timings), they all
> started to fail. We assumed a bad batch, and had them replaced, and didn't think
> anything more of it.
> 
> A week or so ago, all the replacements started to fail. This was shortly after I
> upgraded to FreeBSD 13 and OpenZFS. I think this is unrelated, but it is the
> one major change which happened before the most recent round of failures.
> 
> The thing is, though, that I am not seeing any heavy activity on the drives. The
> load is sustained, but well below the lifetime write threshold for the drive.
> I also do not see the drives as being heavily in use when I run gstat. So it's
> perplexing. I am assuming it's related to the mysql load, as this is identical
> across all machines, and they are all dying within a few days of each other.
> 
> Any insights would be appreciated... :-)

Hi Pete,

have you checked the drive state and statistics with smartctl?

This is the output that I get from my SSD after use as an L2ARC for 1 year:

$ smartctl -d nvme /dev/nvme0 -a
...
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    1%
Data Units Read:                    11,745,658 [6.01 TB]
Data Units Written:                 14,767,823 [7.56 TB]
Host Read Commands:                 522,309,835
Host Write Commands:                69,368,834
Controller Busy Time:               1,198
Power Cycles:                       40
Power On Hours:                     8,514
Unsafe Shutdowns:                   28
Media and Data Integrity Errors:    0
Error Information Log Entries:      120
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged

That drive is rated for 600 TBW, and I seem to have used about 1% of that
within that year of use.
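
For anyone who wants to repeat that estimate on their own drives, here is a
rough sketch of the arithmetic in Python. It assumes the smartctl text format
shown above, takes one NVMe "data unit" as 512,000 bytes (thousands of 512-byte
blocks, which matches the 7.56 TB that smartctl prints), and assumes the write
rate stays constant. The function names are made up for illustration. The
counter reflects host writes only, so any write amplification inside the drive
is not included.

```python
import re

# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000

# Two lines copied from the smartctl output above.
REPORT = """\
Data Units Written:                 14,767,823 [7.56 TB]
Power On Hours:                     8,514
"""


def parse_counter(report: str, label: str) -> int:
    """Return the integer that follows 'label:' in smartctl-style text."""
    match = re.search(rf"^{re.escape(label)}:\s*([\d,]+)", report, re.MULTILINE)
    if match is None:
        raise ValueError(f"{label!r} not found in report")
    return int(match.group(1).replace(",", ""))


def terabytes_written(data_units: int) -> float:
    """Convert NVMe data units to decimal terabytes."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12


def endurance_used_percent(data_units: int, rated_tbw: float) -> float:
    """Share of the rated TBW consumed so far, in percent."""
    return 100.0 * terabytes_written(data_units) / rated_tbw


def years_to_rated_tbw(data_units: int, power_on_hours: int, rated_tbw: float) -> float:
    """Powered-on years to reach the rated TBW at the average rate so far."""
    tb_per_hour = terabytes_written(data_units) / power_on_hours
    return rated_tbw / tb_per_hour / (24 * 365)


if __name__ == "__main__":
    units = parse_counter(REPORT, "Data Units Written")
    hours = parse_counter(REPORT, "Power On Hours")
    rated_tbw = 600.0  # endurance rating of the drive, in TB

    print(f"{terabytes_written(units):.2f} TB written")             # 7.56 TB written
    print(f"{endurance_used_percent(units, rated_tbw):.2f}% used")  # 1.26% used
    print(f"{years_to_rated_tbw(units, hours, rated_tbw):.0f} years to rated TBW")
```

If the figure computed this way is far below the "Percentage Used" value that
the drive itself reports, then host writes alone do not explain the wear.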

Regards, Stefan