zfs very poor performance compared to ufs due to lack of cache?

Andriy Gapon avg at freebsd.org
Mon Sep 13 15:24:08 UTC 2010


on 13/09/2010 00:01 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg at freebsd.org>
>>
>> All :-)
>> Revision of your code, all the extra patches, workload, graphs of ARC and memory
>> dynamics and that's just for the start.
>> Then, analysis similar to that of Wiktor.  E.g. trying to test with a single
>> file and then removing it, or better yet, examining with DTrace actual code
>> paths taken from sendfile(2).
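(To expand on that a little: a rough starting point might be a one-liner along
these lines. Treat it as a sketch only, since I'm assuming the syscall and fbt
providers are loaded and that the fbt module name for zfs.ko is "zfs":

  # count zfs.ko functions reached while a thread is inside sendfile(2)
  dtrace -n '
      syscall::sendfile:entry   { self->in = 1; }
      fbt:zfs::entry /self->in/ { @[probefunc] = count(); }
      syscall::sendfile:return  { self->in = 0; }'

That should at least show whether the reads are going through the ARC or
around it.)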
> 
> All of those have been given in past posts on this thread, but they're quite fragmented,
> sorry about that, so here's the current summary for reference:-
> 
> The machine is a stream server whose job is to serve mp4 HTTP streams via
> nginx. It also exports the filesystem via NFS to an encoding box which does all the
> grunt work of creating the streams, but that doesn't seem relevant here as NFS was
> not in use during these tests.
> 
> We currently have two such machines: one which has been updated to ZFS and one
> which is still on UFS. After upgrading to 8.1-RELEASE and ZFS all seemed OK until we
> had a bit of a traffic hike, at which point we noticed the machine in question really
> struggling, even though it was serving fewer than 100 clients at under 3Mbps for
> a few popular streams which should all have easily fitted in cache.
> 
> Upon investigation it seems that ZFS wasn't caching anything, so all streams were
> being read directly from disk, overloading the Areca controller backed with a 7-disk
> RAID6 volume.
> 
> After my original post we've done a number of upgrades and we are now currently
> running 8-STABLE as of 06/09, plus the following patches:
> http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch
> http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab_v2.patch
> http://people.freebsd.org/~mm/patches/zfs/zfs_abe_stat_rrwlock.patch
> needfree.patch and vm_paging_needed.patch posted by jhell
> 
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
>> @@ -500,6 +500,8 @@ again:
>>      sched_unpin();
>>     }
>>     VM_OBJECT_LOCK(obj);
>> +  if (error == 0)
>> +     vm_page_set_validclean(m, off, bytes);
>>     vm_page_wakeup(m);
>>     if (error == 0)
>>      uio->uio_resid -= bytes;


I'd really prefer to see a description of your sources as svn revision rXXXXX plus
an http link to a diff of your actual sources against that revision.
That would make it much easier to see what you actually have, and what you don't have.
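Something as simple as this would do (svnversion assumes the tree is a
Subversion checkout; the two sysctls report the ZFS versions your kernel was
built with):

  svnversion /usr/src
  sysctl vfs.zfs.version.spa vfs.zfs.version.zpl
  svn diff /usr/src > my-changes.diff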

> When nginx is active and using sendfile we see a large amount of memory, seemingly
> equivalent to the size of the files being accessed, slip into inactive according to
> top, and the size of the ARC drop to at most the configured minimum, and sometimes
> even less.
> 
> The machine now has 7GB of RAM and these are the loader.conf settings currently in
> use:-
> # As we have battery backed cache we can do this
> vfs.zfs.cache_flush_disable=1
> vfs.zfs.prefetch_disable=0
> # Physical Memory * 1.5
> vm.kmem_size="11G"
> vfs.zfs.arc_min="5G"
> vfs.zfs.arc_max="6656M"
> vfs.zfs.vdev.cache.size="20M"
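As an aside, it's worth double-checking after boot that these tunables
actually took effect; all four should be readable at runtime:

  sysctl vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max vfs.zfs.vdev.cache.size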
> 
> Currently arc_summary reports the following after being idle for several hours:-
> ARC Size:
>        Current Size:                   76.92%  5119.85M (arcsize)
>        Target Size: (Adaptive)         76.92%  5120.00M (c)
>        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
>        Max Size (High Water):          ~1:1    6656.00M (c_max)
> 
> Column details as requested previously:-
> cnt, time, kstat.zfs.misc.arcstats.size, vm.stats.vm.v_pdwakeups,
> vm.stats.vm.v_cache_count, vm.stats.vm.v_inactive_count,
> vm.stats.vm.v_active_count, vm.stats.vm.v_wire_count,
> vm.stats.vm.v_free_count
> 1,1284323760,5368902272,72,49002,156676,27241,1505466,32523
> 2,1284323797,5368675288,73,51593,156193,27612,1504846,30682
> 3,1284323820,5368675288,73,51478,156248,27649,1504874,30671
> 4,1284323851,5368670688,74,22994,184834,27609,1504794,30698
> 5,1284323868,5368670688,74,22990,184838,27605,1504792,30698
> 6,1284324024,5368679992,74,22246,184624,27663,1505177,31171
> 7,1284324057,5368679992,74,22245,184985,27663,1504844,31170
> 
> Point notes:
> 1. Initial values
> 2. single file request size: 692M
> 3. repeat request #2
> 4. request for second file 205M
> 5. repeat request #4
> 6. multi request #2
> 7. complete
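For anyone wanting to reproduce this kind of collection, a small loop along
these lines produces the same columns (the interval and output path here are
arbitrary choices of mine):

  #!/bin/sh
  # append one CSV row of the stats listed above per interval
  i=0
  while :; do
          i=$((i + 1))
          row="$i,$(date +%s)"
          for oid in kstat.zfs.misc.arcstats.size vm.stats.vm.v_pdwakeups \
              vm.stats.vm.v_cache_count vm.stats.vm.v_inactive_count \
              vm.stats.vm.v_active_count vm.stats.vm.v_wire_count \
              vm.stats.vm.v_free_count; do
                  row="$row,$(sysctl -n $oid)"
          done
          echo "$row" >> /var/tmp/stats.csv
          sleep 30
  done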

Graphs look prettier :-)
I used drraw to visualize rrdtool data.
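For example, an RRD for just the ARC size column can be created and fed like
this (the step and retention below are arbitrary; drraw is then pointed at the
resulting .rrd file):

  rrdtool create arcsize.rrd --step 30 \
          DS:arcsize:GAUGE:60:0:U \
          RRA:AVERAGE:0.5:1:2880
  rrdtool update arcsize.rrd N:$(sysctl -n kstat.zfs.misc.arcstats.size)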

Well, I don't see anything unusual in these numbers.
E.g. contrary to what you implied by saying that the patch hasn't changed
anything, I do not see the page counts changing much after each iteration of
sending the same file.  Also, during the test you seem to have a sufficiently high
number of free and cached pages not to trigger ARC shrinkage or inactive/active
recycling.

> top details after tests:-
> Mem: 106M Active, 723M Inact, 5878M Wired, 87M Cache, 726M Buf, 124M Free
> Swap: 4096M Total, 836K Used, 4095M Free
> 
> arc_summary snip after test
> ARC Size:
>        Current Size:                   76.92%  5119.97M (arcsize)
>        Target Size: (Adaptive)         76.92%  5120.09M (c)
>        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
>        Max Size (High Water):          ~1:1    6656.00M (c_max)
> 
> If I turn the box on so it gets a real range of requests, after about an hour I
> see something like:-
> Mem: 104M Active, 2778M Inact, 3065M Wired, 20M Cache, 726M Buf, 951M Free
> Swap: 4096M Total, 4096M Free
> 
> ARC Size:
>        Current Size:                   34.37%  2287.36M (arcsize)
>        Target Size: (Adaptive)         100.00% 6656.00M (c)
>        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
>        Max Size (High Water):          ~1:1    6656.00M (c_max)
> 
> As you can see, the size of the ARC has even dropped below c_min. The results of
> the live test were gathered directly after a reboot, in case that's relevant.

Well, I would love to see the above-mentioned graphs for this real test load.
Going below c_min likely means that you don't have all the latest stable/8 ZFS
code, but I am not sure.

> If someone could suggest a set of tests that would help, I'll be happy to run them,
> but from what's been said thus far it seems that the use of sendfile is forcing
> memory use other than that coming from the ARC; is that what's expected?
> 
> Would running the same test with sendfile disabled in nginx help?
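Yes, please do; that would help separate the sendfile(2) path from the plain
read(2) path. For reference, it's a one-directive change (the config path below
is the usual ports location, so adjust as needed):

  # in the http {} block of /usr/local/etc/nginx/nginx.conf set:
  #     sendfile off;
  nginx -t && nginx -s reload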

The more test data the better; it would give us a basis for comparison and for
separating general ARC issues from sendfile-specific issues.

Thanks!
-- 
Andriy Gapon

