ZFS ARC and mmap/page cache coherency question

Karl Denninger karl at denninger.net
Tue Jul 5 14:31:12 UTC 2016


On 7/4/2016 22:01, Allan Jude wrote:
> On 2016-07-04 22:46, Karl Denninger wrote:
>>
>>> You keep saying per zvol. Do you mean per vdev? I am under the
>>> impression that no zvol's are involved in the use case this thread is
>>> about.
>> Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.
>> That is wildly inappropriate for several reasons -- first, it is
>> computed from the size of RAM with a hard cap (which is stupid on its
>> face), and second, it is entirely insensitive to the performance of
>> the vdevs in question.  Specifically, it is very common for a system
>> to have very fast (e.g. SSD) disks, perhaps in a mirror configuration,
>> and then spinning rust in a RAIDZ2 config for bulk storage.  Those
>> are very, very different performance-wise and they should have wildly
>> different write-back cache sizes.  At present there is exactly one
>> such write-back cache; it is system-wide and pays exactly zero
>> attention to the throughput of the underlying vdevs it is talking to.
>>
>> This is why you can provoke minute-long stalls on a system with moderate
>> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
>> configuration.
>>
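
For reference, the cap I'm complaining about is arrived at roughly as in
the sketch below.  This is an illustration built from the OpenZFS
defaults as I understand them (10% of RAM, hard-capped at 4GB), not the
actual source, and the tunable names and defaults may differ by version.
Note that nothing in the computation looks at the vdevs at all:

/*
 * Sketch of how the system-wide dirty-data ("write-back") cap is
 * arrived at.  Illustration only, not the kernel code; the names mirror
 * the OpenZFS tunables but may not match your version.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t dirty_data_max_percent = 10;       /* % of physical RAM */
static uint64_t dirty_data_max_max = 4ULL << 30;   /* hard cap: 4GB     */

static uint64_t
compute_dirty_data_max(uint64_t physmem)
{
        uint64_t limit = physmem * dirty_data_max_percent / 100;

        /* Capped no matter how fast (or slow) the pool really is. */
        if (limit > dirty_data_max_max)
                limit = dirty_data_max_max;
        return (limit);
}

int
main(void)
{
        /* The 32GB case mentioned above: 10% of 32GB is about 3.2GB. */
        uint64_t ram = 32ULL << 30;

        printf("dirty_data_max = %ju bytes\n",
            (uintmax_t)compute_dirty_data_max(ram));
        return (0);
}
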
>>>
>>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>>> caches', will help a lot as well.
>>>
>> Well, yeah.  But that means you have to police the size of the UMA
>> caches vs. how much of them is actually in use.  What the PR does is
>> get pretty aggressive about that whenever RAM is tight, and before
>> the pager can start playing hell with system performance.
>>
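
To make that policy concrete, it amounts to something like the sketch
below.  The helpers are hypothetical stand-ins for the real UMA/ARC
accounting (stubbed here with made-up numbers); the point is only the
shape of the decision -- compare what the caches hold against what is
actually in use, and hand the slack back before the pager gets involved:

/*
 * Sketch of the "police the UMA slack" idea -- not real kernel code.
 * The helpers below are hypothetical stand-ins, stubbed with made-up
 * numbers, for the real UMA accounting and reclaim paths.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t cache_allocated(void) { return (6ULL << 30); } /* held */
static uint64_t cache_in_use(void)    { return (2ULL << 30); } /* used */
static bool     ram_is_tight(void)    { return (true); }

static void
drain_caches(uint64_t slack)
{
        printf("returning %ju bytes of idle cache to the system\n",
            (uintmax_t)slack);
}

int
main(void)
{
        uint64_t slack = cache_allocated() - cache_in_use();

        /*
         * If RAM is tight and the caches are sitting on a pile of
         * free-but-unreturned memory, give it back now, before the
         * pager has to start evicting useful pages.
         */
        if (ram_is_tight() && slack > 0)
                drain_caches(slack);
        return (0);
}
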
>>> In addition, another patch just went in to allow you to change the
>>> arc_max and arc_min on a running system.
>>>
>> Yes, the PR I did a long time ago made those "active" on a running
>> system, so I've had that for quite some time.  Not that you really
>> ought to need to play with them (if you feel a need to, you're still
>> at step 1 or 2 of what I went through in analyzing and working on
>> this in the 10.x code).
>>
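
For completeness, reading (and, with that change in place, adjusting)
the limit from userland looks like the minimal sketch below, using
sysctlbyname(3) against FreeBSD's vfs.zfs.arc_max.  I'm assuming a
64-bit value here; setting it needs root, and on a system without the
runtime-tunable support the write may simply be refused:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        uint64_t arc_max, new_max;
        size_t len = sizeof(arc_max);

        /* Read the current ARC size limit. */
        if (sysctlbyname("vfs.zfs.arc_max", &arc_max, &len, NULL, 0) == -1) {
                perror("sysctlbyname(vfs.zfs.arc_max)");
                return (1);
        }
        printf("current arc_max: %ju bytes\n", (uintmax_t)arc_max);

        /* Example: cap the ARC at 8GB (needs root and a writable sysctl). */
        new_max = 8ULL << 30;
        if (sysctlbyname("vfs.zfs.arc_max", NULL, NULL, &new_max,
            sizeof(new_max)) == -1)
                fprintf(stderr, "could not set arc_max: %s\n",
                    strerror(errno));
        return (0);
}
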
>
> Have you looked into the ZFS 'write throttle'?  It seems like it was
> meant to solve the writeback problem you are describing.  It applies
> back-pressure to the application by introducing larger and larger
> delays in the write() call until your disks can keep up with your
> applications.
>
> http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/
>
> http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
>

I believe this has been brought into FreeBSD's implementation; I recall
going through it.
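
For anyone following along, the shape of that throttle -- as I
understand it from those two posts -- is roughly the sketch below: no
added delay until outstanding dirty data passes a threshold (60% of the
dirty-data cap by default), then a per-write delay that grows steeply as
the cap is approached.  The constants and the helper are illustrative,
not the actual code; on FreeBSD the knobs live under vfs.zfs.*
(vfs.zfs.delay_min_dirty_percent and vfs.zfs.delay_scale, if memory
serves):

/*
 * Illustration of the write-throttle delay curve described in the posts
 * above.  Not the real dmu_tx code; the defaults are assumed and may
 * not match your version.
 */
#include <stdint.h>
#include <stdio.h>

#define DIRTY_DATA_MAX    (4ULL << 30)  /* assumed dirty-data cap: 4GB  */
#define DELAY_MIN_PERCENT 60            /* start delaying at 60% full   */
#define DELAY_SCALE_NS    500000        /* steepness of the curve       */

static uint64_t
write_delay_ns(uint64_t dirty)
{
        uint64_t min_dirty = DIRTY_DATA_MAX * DELAY_MIN_PERCENT / 100;

        if (dirty <= min_dirty)
                return (0);             /* the disks are keeping up      */
        if (dirty >= DIRTY_DATA_MAX)
                return (UINT64_MAX);    /* effectively: block the writer */

        /* The delay grows without bound as dirty approaches the cap. */
        return (DELAY_SCALE_NS * (dirty - min_dirty) /
            (DIRTY_DATA_MAX - dirty));
}

int
main(void)
{
        int pct;

        for (pct = 50; pct <= 95; pct += 5)
                printf("%3d%% dirty -> %ju ns of delay per write\n", pct,
                    (uintmax_t)write_delay_ns(DIRTY_DATA_MAX * pct / 100));
        return (0);
}
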

-- 
Karl Denninger
karl at denninger.net
The Market Ticker
[S/MIME encrypted email preferred]