ZFS dedup write pathway - possible inefficiency or..?

Andriy Gapon avg at FreeBSD.org
Thu Apr 5 19:56:24 UTC 2018

On 05/04/2018 20:11, Stilez Stilezy wrote:
> That does suck, but it's how bugs are.  I'm using 10G and dedup, so even with
> good hardware the test case is definitely piling on the incoming data and
> workload during writes, probably as fast as the OS can take it, or a little
> more. Without dedup it flies. But it's hard to disentangle what is dedup
> overhead and what is the dedup write bug, or to get an idea of how much of
> the problem each one is responsible for.
> As I don't know Illumos/OpenZFS's track record with bugs, is this likely to get
> attention and be resolved "at some time this year", or is it a "who knows, bugs
> take forever, could be still here in 5 years' time"? Is it helpful if I nudge and
> offer a "causing a problem" note on their bug tracker? Perhaps it's worth it in
> case it gives extra data? What do you reckon?

I don't know.  At this time it seems that there is not much interest in the
issue.  But maybe tomorrow someone will get excited and fix the problem in an
hour.  Or maybe that will happen in 5 years.

> Last, if I have high data rates but want to minimise the dirty data issue, can
> you suggest, broadly, how to customise the dirty data/caching sysctls and loader
> tunables to try to mitigate the impact (get the best possible handling without
> losing 99% of throughput or getting sky-high latency)?

Minimizing the maximum dirty data threshold has a flip side that also affects
performance.  I am far from sure that tuning it can actually help.
If you want to experiment, you can start with
$ sysctl -d vfs.zfs | fgrep -i dirty
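To illustrate why lowering the threshold cuts both ways, here is a rough model of the write throttle delay. It is a sketch assuming the formula in dmu_tx_delay() in the OpenZFS source: once dirty data exceeds delay_min_dirty_percent of dirty_data_max, each transaction is delayed by roughly delay_scale * (dirty - min) / (dirty_data_max - dirty) nanoseconds, capped at 100 ms. The constants below (60%, 500000, 100 ms) are assumed defaults; check the real values on your system with the sysctl command above. tx_delay_ns is a hypothetical helper for illustration, not a ZFS interface.

```shell
#!/bin/sh
# Rough model of the OpenZFS write-throttle delay (after dmu_tx_delay()).
# ASSUMPTION: the constants below are illustrative defaults; verify them
# with: sysctl -d vfs.zfs | fgrep -i dirty

DELAY_MIN_DIRTY_PERCENT=60   # throttle starts at this % of dirty_data_max
DELAY_SCALE=500000           # scaling factor for the delay curve (ns)
DELAY_MAX_NS=100000000       # per-transaction delay cap: 100 ms

# Hypothetical helper: tx_delay_ns DIRTY_BYTES DIRTY_MAX_BYTES
# Prints the modelled per-transaction delay in nanoseconds.
tx_delay_ns() {
    dirty=$1
    dirty_max=$2
    min_bytes=$(( dirty_max * DELAY_MIN_DIRTY_PERCENT / 100 ))
    if [ "$dirty" -le "$min_bytes" ]; then
        echo 0                    # below the threshold: no delay
        return
    fi
    if [ "$dirty" -ge "$dirty_max" ]; then
        # Real writers wait for a txg sync here; the sketch reports the cap.
        echo "$DELAY_MAX_NS"
        return
    fi
    # Delay grows steeply as dirty data approaches dirty_max.
    d=$(( DELAY_SCALE * (dirty - min_bytes) / (dirty_max - dirty) ))
    if [ "$d" -gt "$DELAY_MAX_NS" ]; then
        d=$DELAY_MAX_NS
    fi
    echo "$d"
}

# Same 1 GiB of dirty data under two different dirty_data_max settings.
echo "max=4GiB:   $(tx_delay_ns 1073741824 4294967296) ns"
echo "max=1.5GiB: $(tx_delay_ns 1073741824 1610612736) ns"
```

In this model, 1 GiB of dirty data is below the throttle threshold with a 4 GiB limit and is not delayed at all, while with a 1.5 GiB limit the same 1 GiB already costs about 0.1 ms per transaction, and the delay climbs towards the cap as dirty data approaches the limit.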

> I understand from your reply that there are no recipes in debugging, but do you
> have any suggestions at all on which way to try for at least some mitigation, or
> which values might be worth experimenting with, to reduce the effect of the
> problem pathway?

I don't know of any mitigation.
But I want to note that typically "I want to use dedup" and "I want to be able
to write as fast as possible" arise for very different use cases and both are
rarely needed at the same time (except for benchmarks, synthetic tests, etc).
If one needs streaming writes then usually there is not much to dedup.

Andriy Gapon
