Regarding regular zfs

Damien Fleuriot ml at my.gd
Fri Apr 5 11:08:15 UTC 2013


On 5 Apr 2013, at 12:17, Joar Jegleim <joar.jegleim at gmail.com> wrote:

> Hi FreeBSD !
> 
> I've already sent this one to questions at freebsd.org, but realised this list
> would be a better option.
> 
> So I've got this setup where we have a storage server delivering about
> 2 million jpeg's as a backend for a website ( it's ~1TB of data)
> The storage server is running zfs and every 15 minutes it does a zfs
> send to a 'slave', and our proxy will fail over to the slave if the
> main storage server goes down .
> I've got this script that initially zfs send's a whole zfs volume, and
> for every send after that only sends the diff . So after the initial zfs
> send, the diff's usually take less than a minute to send over.
> 
> I've had increasing problems on the 'slave', it seems to grind to a
> halt for anything between 5-20 seconds after every zfs receive . Everything
> on the server halts / hangs completely.
> 
> I've had a couple go's on trying to solve / figure out what's
> happening without luck, and this 3rd time I've invested even more time
> on the problem .
> 
> To sum it up:
> -Server was initially on 8.2-RELEASE
> -I've set some sysctl variables such as:
> 
> # 16GB arc_max ( server got 30GB of ram, but had a couple 'freeze'
> # situations, suspect zfs.arc ate too much memory)
> vfs.zfs.arc_max=17179869184
> 
> # 8.2 default to 30 here, setting it to 5 which is default from 8.3 and
> # onwards
> vfs.zfs.txg.timeout="5"
> 
> # Set TXG write limit to a lower threshold.  This helps "level out"
> # the throughput rate (see "zpool iostat").  A value of 256MB works well
> # for systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on
> # disks which have 64 MB cache.
> # NOTE: in <v28, this tunable is called 'vfs.zfs.txg.write_limit_override'.
> #vfs.zfs.txg.write_limit_override=1073741824 # for 8.2
> vfs.zfs.write_limit_override=1073741824 # for 8.3 and above
> 
> -I've implemented mbuffer for the zfs send / receive operations. With
> mbuffer the sync went a lot faster, but still got the same symptoms
> when the zfs receive is done, the hang / unresponsiveness returns for
> 5-20 seconds
> -I've upgraded to 8.3-RELEASE ( + zpool upgrade and zfs upgrade to
> V28), same symptoms
> -I've upgraded to 9.1-RELEASE, still same symptoms
> 
> The period where the server is unresponsive after a zfs receive, I
> suspected it would correlate with the amount of data being sent, but
> even if there is only a couple MB's data the hang / unresponsiveness
> is still substantial .
> 
> I suspect it may have something to do with the zfs volume being sent
> is mount'ed on the slave, and I'm also doing the backups from the
> slave, which means a lot of the time the backup server is rsyncing the
> zfs volume being updated.
> I've noticed that the unresponsiveness / hang situations occur while
> the backupserver is rsync'ing from the zfs volume being updated, when
> the backupserver is 'done' and nothing is working with files in the
> zfs volume being updated I hardly notice any of the symptoms (maybe
> just a minor lag for much less than a second, hardly noticeable) .
> 
> So my question(s) to the list would be:
> In my setup have I taken the use case for zfs send / receive too far
> (?) as in, it's not meant for this kind of syncing and this often, so
> there's actually nothing 'wrong'.
> 
> -- 
> ----------------------
> Joar Jegleim
> 

Quick and dirty reply: what's your pool usage %?

Above 75-80% full, performance takes a dive.

Let's just make sure you're not there yet.
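
A quick way to check, as a minimal sketch only: the helper name check_pool_usage and the 75% default are my own choices for illustration, not anything from your setup. It reads the "name capacity" lines that `zpool list -H -o name,capacity` prints and flags any pool at or above the threshold.

```shell
#!/bin/sh
# Hypothetical helper: flag pools whose usage is at or above a threshold.
# Input on stdin: one "name<TAB>capacity%" line per pool, i.e. the output
# of `zpool list -H -o name,capacity`.
# Argument 1: threshold in percent (assumed default: 75).
check_pool_usage() {
    threshold="${1:-75}"
    while read -r name cap; do
        pct="${cap%\%}"                # strip the trailing % sign
        case "$pct" in
            ''|*[!0-9]*) continue ;;   # skip lines without a numeric capacity
        esac
        if [ "$pct" -ge "$threshold" ]; then
            echo "WARN $name ${pct}%"
        else
            echo "OK $name ${pct}%"
        fi
    done
}

# Typical use on the storage host (run on both master and slave):
#   zpool list -H -o name,capacity | check_pool_usage 75
```

Anything reported as WARN is in the range where I'd expect performance to start suffering.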

