zfs very poor performance compared to ufs due to lack of cache?

Steven Hartland killing at multiplay.co.uk
Wed Sep 15 13:42:29 UTC 2010


----- Original Message ----- 
From: "Andriy Gapon" <avg at freebsd.org>
> on 15/09/2010 13:32 Steven Hartland said the following:
>> === conclusion ===
>> The interaction of zfs and sendfile is causing large amounts of memory
>> to end up in the inactive pool and only the use of a hard min arc limit is
>> ensuring that zfs forces the vm to release said memory so that it can be
>> used by zfs arc.
> 
> Memory ends up as inactive because of how sendfile works.  It first pulls data
> into a page cache as active pages.  After pages are not used for a while, they
> become inactive.  Pagedaemon can further recycle inactive pages, but only if
> there is any shortage.  In your situation there is no shortage, so pages just
> stay there, but are ready to be reclaimed (or re-activated) at any moment.
> They are not a waste!  Just a form of a cache.

That doesn't seem to explain why, without setting a min ARC size, the IO to disk
went nuts even though only a few files were being requested.

This was, however, prior to the upgrade to stable and all the patches, so I think I
need to remove the configured ARC min from loader.conf and retest with the current
code base to confirm this is still an issue.

> If ARC size doesn't grow in that condition, then it means that ZFS simply
> doesn't need it to.

So what you're saying is that even with a zero-sized ARC there should be no IO
required, as the data should come straight from inactive pages? Another reason to
retest with no hard-coded ARC settings.
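
As a sanity check when retesting, something like the following should show whether
requests for an already hot file actually touch the disks at all (pool name is a
placeholder):

    # pool-level disk activity, sampled every second, while the same
    # hot file is requested repeatedly
    zpool iostat tank 1

    # ARC size alongside the VM page queues at the same time
    sysctl kstat.zfs.misc.arcstats.size
    sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count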

> General problem of double-caching with ZFS still remains and will remain and
> nobody promised to fix that.
> I.e. with sendfile (or mmap) you will end up with two copies of data, one in
> page cache and the other in ARC.  That happens on Solaris too, no magic.

Obviously this is quite an issue, as a 1GB source file will require 2GB of memory
to stream, which largely cancels out any benefit from the zero copy that sendfile offers?
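
A rough way to see the duplication on an otherwise idle box (single large test file
assumed) is to compare the bytes sitting on the inactive queue with the ARC size
before and after streaming the file once:

    # approximate bytes of page cache on the inactive queue
    sysctl -n vm.stats.vm.v_inactive_count hw.pagesize | \
        awk 'NR==1 { pages = $1 } NR==2 { print pages * $1, "bytes inactive" }'

    # bytes currently held by the ZFS ARC
    sysctl -n kstat.zfs.misc.arcstats.size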

> The things I am trying to fix are:
> 1. Interaction between ARC and the rest of VM during page shortage; you don't
> seem to have much of that, so you don't see it.  Besides, your range for ARC
> size is quite narrow and your workload is so peculiar that your setup is not the
> best one for testing this.

Indeed we have no other memory pressures, but holding two copies of the data is
an issue. This doesn't seem to be the case with ufs, so where's the difference?

> 2. Copying of data from ARC to page cache each time the same data is served by
> sendfile.  You won't see much changes without monitoring ARC hits as Wiktor has
> suggested.  In bad case there would be many hits because the same data is
> constantly copied from ARC to page cache (and that simply kills any benefit
> sendfile may have).  In good case there would be much less hits, because data is
> not copied, but is served directly from page cache.

Indeed. Where would this need to be addressed, given that ufs doesn't suffer from this?
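
For the record, the hit/miss counters Wiktor mentioned can be watched with plain
sysctl, no extra tools needed:

    # a hit count that climbs rapidly while the same few files are being
    # served via sendfile would suggest the data is being copied out of
    # the ARC into the page cache on every request
    sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses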

>> The source data, xls's and exported graphs can be found here:-
>> http://www.multiplaygameservers.com/dropzone/zfs-sendfile-results.zip
> 
> So, what problem, performance or otherwise, do you perceive with your system's
> behavior?  Because I don't see any.

The initial problem was that with a default config, i.e. no hard-coded min or max on
the ARC, the machine very quickly becomes seriously IO bottlenecked, which simply
doesn't happen with ufs.

Now we have a very simple setup, so we can pick sensible values for min / max, but
it still means that for every file being sent when sendfile is enabled:
1. There are two copies in memory, which means only half the number of files can be
cached and served without resorting to disk IO.

2. sendfile isn't achieving what it claims, i.e. a zero copy. Does this explain
the other odd behaviour we noticed, high CPU usage from nginx?
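
For completeness, the toggle we flip between test runs is just the standard nginx
directive, in the http (or server) context:

    # nginx.conf fragment
    sendfile on;     # zero-copy path: with zfs, data ends up in both the
                     # page cache and the ARC
    # sendfile off;  # plain read()/write() path: single copy in the ARC,
                     # at the cost of an extra copy into mbufs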

> To summarize:
> 1. With sendfile enabled you will have two copies of actively served data in
> RAM, but perhaps slightly faster performance, because of avoiding another copy
> to mbuf in sendfile(2).
> 2. With sendfile disabled, you will have one copy of actively served data in RAM
> (in ARC), but perhaps slightly slower performance because of a need to make a
> copy to mbuf.
> 
> Which would serve you better depends on size of your hot data vs RAM size, and
> on actual benefit from avoiding the copying to mbuf.  I have never measured the
> latter, so I don't have any real numbers.
> From your graphs it seems that your hot data (multiplied by two) is larger than
> what your RAM can accommodate, so you should benefit from disabling sendfile.

This is what I thought. Memory pressure has been eased from the initial problem point
by a memory increase from 4GB to 7GB in the machine in question, but at this point
both 1 and 2 are far from ideal, each having fairly serious side effects on memory
use / bandwidth and possibly CPU, especially as the ratio of hot data to clients is
never going to be static, so both are going to fall down at some point :(
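
As a rough sizing check (content path and figures are placeholders), comparing twice
the hot data set against physical RAM shows which side of that trade-off a given box
falls on:

    # total size of the content considered hot (placeholder path)
    du -sk /data/content | awk '{ printf "%.1f GB hot data\n", $1 / 1048576 }'

    # physical RAM in the box
    sysctl -n hw.physmem | awk '{ printf "%.1f GB RAM\n", $1 / 1073741824 }'

    # with sendfile enabled roughly twice the hot figure has to fit in RAM
    # before requests start falling back to disk IO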

I suspect this is going to affect quite a few users, as nginx and others that use
sendfile for high performance file transmission are becoming more and more popular,
as is zfs.

So the question is how do we remove these unexpected bottlenecks and make zfs as
efficient as ufs when sendfile is used?

    Regards
    Steve
