Re: ZFS + mysql appears to be killing my SSD's

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 7 Jul 2021 11:05:12 -0600
On Wed, Jul 7, 2021 at 4:06 AM Pete French <petefrench_at_ingresso.co.uk>
wrote:

>
>
> On 07/07/2021 00:01, Pete Wright wrote:
>
> > i also wonder if this could be a TRIM related issue.  according to
> > zpoolprops(8) TRIM is not enabled by default on pools - but it also has
> > this note (see autotrim):
> >
> > "Be aware that automatic trimming of recently freed data blocks
> > can put significant stress on the underlying storage devices.
> > This will vary depending of how well the specific device handles
> > these commands.  For lower end devices it is often possible to
> > achieve most of the benefits of automatic trimming by running an
> > on-demand (manual) TRIM periodically using the zpool trim
> > command."
> >
> >
> > I wonder if adjusting the autotrim feature will address these issues - i
> > manually enable autotrim for my pools and have seen no bad effects under
> > quite a bit of bursty i/o.  if it is enabled i wonder if the ssd you
> > are using doesn't play nice with autotrim and should stay disabled?
>
> I was thinking this too - the autotrim stuff came in with OpenZFS, but I
> had trim enabled previously (I believe we had our own implementation on
> FreeBSD ZFS, is that right?). But I am wondering if a scheduled trim
> might be a better option. Though the question arises as to 'how often'
> in that case.
>

Two observations about this whole thread.

First: the cheapest SSDs have a much lower DWPD (drive writes per day)
rating now than they used to have. A few years ago, 3 DWPD was normal;
now it's closer to 0.3 DWPD. If your drives are wearing out faster now
than before, this may well be why. If you have a large write workload,
you'll almost certainly need to buy more expensive drives with higher
DWPD ratings. In general, the cost difference between the drives is
small enough that over-provisioning by a factor of about 10-20 (meaning
if you think you only need 0.3 DWPD, buy a 3 DWPD drive) is worthwhile,
especially for smaller deployments where the cost of your time to
replace the bad drives will easily exceed the delta in cost up front.
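
To make the DWPD arithmetic concrete, here is a rough sketch. It assumes
the usual definition (DWPD = full-drive writes per day, sustained over
the warranty period); the function names, the 5-year warranty default
and the example figures are mine, for illustration only, and are not
taken from any particular drive's datasheet.

```python
def required_dwpd(daily_writes_gb: float, capacity_gb: float) -> float:
    """DWPD the workload actually demands: daily host writes / capacity."""
    return daily_writes_gb / capacity_gb


def years_to_rated_endurance(daily_writes_gb: float, capacity_gb: float,
                             rated_dwpd: float,
                             warranty_years: float = 5.0) -> float:
    """Years until the workload consumes the drive's rated write budget.

    Assumes the rated budget is capacity * DWPD * 365 * warranty_years,
    and that the workload writes a constant amount every day.
    """
    budget_gb = capacity_gb * rated_dwpd * 365 * warranty_years
    return budget_gb / (daily_writes_gb * 365)


if __name__ == "__main__":
    # Hypothetical example: a 500 GB drive taking 500 GB of writes a day.
    need = required_dwpd(500, 500)
    print(f"workload needs about {need:.1f} DWPD")
    for rating in (0.3, 3.0):
        years = years_to_rated_endurance(500, 500, rating)
        print(f"{rating} DWPD drive: rated budget lasts ~{years:.1f} years")
```

With those made-up numbers the workload needs 1 DWPD, so a 0.3 DWPD
drive would burn through its rated budget well inside the warranty
period, while a 3 DWPD drive has plenty of headroom.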

Second, TRIM should help reduce the write amplification that happens
inside the drive. Ideally, it should be done inline. However, that's
not always performant due to the quality of TRIM implementations on
some drives. That's why the automated periodic TRIM features were
added. TRIMs are most effective when you have lots of data that's
written and shortly after becomes idle for a long time. If it's written
and then is rewritten quickly after the blocks are released, then TRIMs
have much less effect on the write amp than if there's a period of time
that elapses.

Which brings us to how often. That's tricky. A lot of that depends on
the performance impact the TRIMs have on ongoing operations and the
expected lifetime of the recently written "cold" data (data that will
stick around for a while). Daily is likely a good place to start,
ideally at an off-peak time. I don't really know your workload that
well, so I can't say for sure. TRIMs can only help the internal write
amp of the drive's FTL; they won't help the raw write rate. You'll need
to change your application to reduce writes if you must write in excess
of your drive's DWPD.
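
For the "daily, at an off-peak time" idea, a minimal sketch of a wrapper
you could run from cron. The only ZFS command it relies on is the
`zpool trim <pool>` command quoted from the man page above; the pool
name "tank", the script path in the crontab comment, and the dry-run
default are assumptions for illustration, so adjust to taste.

```python
import subprocess
import sys

# Hypothetical crontab entry to run this at 03:00 every day:
#   0 3 * * * /usr/local/sbin/zfs_trim.py tank --run


def trim_command(pool: str) -> list[str]:
    """Build the argv for an on-demand (manual) TRIM of one pool."""
    if not pool:
        raise ValueError("pool name must not be empty")
    return ["zpool", "trim", pool]


def run_trim(pool: str, dry_run: bool = True) -> int:
    """Start a manual TRIM; with dry_run, only print the command."""
    cmd = trim_command(pool)
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    # zpool trim starts the TRIM and returns; it runs in the background.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    pool_name = sys.argv[1] if len(sys.argv) > 1 else "tank"
    sys.exit(run_trim(pool_name, dry_run="--run" not in sys.argv))
```

It defaults to a dry run so that trying it out doesn't kick off a TRIM
in the middle of the day; pass --run once you're happy with the timing.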

FreeBSD's old ZFS had its own implementation, but that was never ported to
OpenZFS. It was replaced by a better implementation with which I'm not too
familiar.

Warner
Received on Wed Jul 07 2021 - 17:05:12 UTC
