tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO

Nikolay Denev ndenev at gmail.com
Fri Mar 26 15:40:33 UTC 2010


On Mar 24, 2010, at 7:55 PM, Dan Nelson wrote:

> In the last episode (Mar 24), Bob Friesenhahn said:
>> On Wed, 24 Mar 2010, Dan Naumov wrote:
>>> Has anyone done any extensive testing of the effects of tuning
>>> vfs.zfs.vdev.max_pending on this issue?  Is there some universally
>>> recommended value beyond the default 35?  Anything else I should be
>>> looking at?
>> 
>> The vdev.max_pending value is primarily used to tune for SAN/HW-RAID LUNs
>> and is used to dial down LUN service time (svc_t) values by limiting the
>> number of pending requests.  It is not terribly useful for decreasing
>> stalls due to zfs writes.  In order to reduce the impact of zfs writes,
>> you want to limit the maximum size of a zfs transaction group (TXG).  I
>> don't know what the FreeBSD tunable is for this, but under Solaris it is
>> zfs:zfs_write_limit_override.
> 
> There isn't a sysctl for it by default, but the following patch will enable
> a vfs.zfs.write_limit_override sysctl:
> 
> Index: dsl_pool.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
> retrieving revision 1.4.2.1
> diff -u -p -r1.4.2.1 dsl_pool.c
> --- dsl_pool.c	17 Aug 2009 09:55:58 -0000	1.4.2.1
> +++ dsl_pool.c	11 Mar 2010 08:34:27 -0000
> @@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
>  uint64_t zfs_write_limit_override = 0;
>  extern uint64_t zfs_write_limit_min;
> 
> +SYSCTL_DECL(_vfs_zfs);
> +SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
> +	&zfs_write_limit_override, 0,
> +	"Force a txg if dirty buffers exceed this value (bytes)");
> +
>  kmutex_t zfs_write_limit_lock;
> 
>  static pgcnt_t old_physmem = 0;
> 
> 
>> On a large-memory system, a properly working zfs should not saturate 
>> the write channel for more than 5 seconds.  Zfs tries to learn the 
>> write bandwidth so that it can tune the TXG size up to 5 seconds (max) 
>> worth of writes.  If you have both large memory and fast storage, 
>> quite a huge amount of data can be written in 5 seconds.  On my 
>> Solaris system, I found that zfs was quite accurate with its rate 
>> estimation, but it resulted in four gigabytes of data being written 
>> per TXG.
> 
> I had similar problems on a 32GB Solaris server at work.  Note that with
> compression enabled, the entire system pauses while it compresses the
> outgoing block of data.  It's just a fraction of a second, but long enough
> for end-users to complain about bad performance in X sessions.  I had to
> throttle back to a 256MB write limit size to make the stuttering go away
> completely.  It didn't affect write throughput much at all.
> 
> -- 
> 	Dan Nelson
> 	dnelson at allantgroup.com
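
To put rough numbers on the figures quoted above: if a TXG holds up to 5 seconds
of writes, then 4GB per TXG implies roughly 800MB/s of sustained write bandwidth,
and a 256MB TXG at that same rate would take about a third of a second to write
out. This is only back-of-the-envelope arithmetic derived from the quoted
numbers, not a measurement, and the helper names below are made up for
illustration:

```sh
#!/bin/sh
# Hypothetical helpers for rough TXG sizing; integer math only.

# Implied write bandwidth in MB/s for a TXG of $1 MB written over $2 seconds.
txg_bandwidth_mb() {
    echo $(( $1 / $2 ))
}

# Time in milliseconds needed to write $1 MB at $2 MB/s.
flush_time_ms() {
    echo $(( $1 * 1000 / $2 ))
}

bw=$(txg_bandwidth_mb 4096 5)    # 4GB TXG, 5 second window
echo "implied bandwidth: ${bw} MB/s"
echo "256MB at that rate: $(flush_time_ms 256 "$bw") ms"
```

The actual stall depends on what the disks can really sustain, so treat the
result as an order-of-magnitude guide only.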

I had to come up with more or less the same patch, and it fixed my problem with writes stalling the IO of the machine.
This should probably be committed.
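
For anyone who wants to try it, here is a minimal sketch of setting the limit,
assuming a kernel built with a patch like the one above (the sysctl does not
exist otherwise). The value is in bytes; 256MB is simply the figure Dan mentions,
not a recommendation, and mb_to_bytes is a made-up helper name:

```sh
#!/bin/sh
# Convert a size in megabytes to bytes, since the sysctl takes bytes.
# mb_to_bytes is a hypothetical helper, not part of any FreeBSD tool.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

limit=$(mb_to_bytes 256)

# Print the command rather than running it, so this is safe on any machine.
# On a patched kernel, run the printed command as root to apply the limit.
echo "sysctl vfs.zfs.write_limit_override=${limit}"
```

Since the patch declares the sysctl with CTLFLAG_RW, it can be changed at
runtime, which makes it easy to try a few values under load.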

Regards,
Niki Denev


