Regarding regular zfs

Joar Jegleim joar.jegleim at gmail.com
Fri Apr 5 10:17:29 UTC 2013


Hi FreeBSD !

I've already sent this one to questions at freebsd.org, but realised this list
would be a better option.

So I've got this setup where we have a storage server delivering about
2 million JPEGs as a backend for a website (~1TB of data).
The storage server is running zfs, and every 15 minutes it does a zfs
send to a 'slave'. Our proxy will fail over to the slave if the
main storage server goes down.
I've got a script that initially zfs sends a whole zfs volume, and
for every send after that only sends the diff. So after the initial zfs
send, the diffs usually take less than a minute to send over.
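
A minimal sketch of the full-then-incremental scheme described above,
not the actual script. The dataset name tank/images, the host name
slave and the snapshot names sync-1 / sync-2 are assumptions made up
for illustration, and the function only prints the pipeline it would
run rather than executing it:

```shell
#!/bin/sh
# Sketch only: prints the pipeline instead of running it.
# Assumed (not from the original setup): dataset "tank/images",
# receiving host "slave", snapshot names "sync-1" and "sync-2".

# build_send_cmd DATASET PREV_SNAP CURR_SNAP HOST
# An empty PREV_SNAP means there is no common snapshot yet,
# so a full stream has to be sent.
build_send_cmd() {
    if [ -z "$2" ]; then
        # Initial run: send the whole snapshot.
        printf 'zfs send %s@%s | ssh %s zfs receive -F %s\n' \
            "$1" "$3" "$4" "$1"
    else
        # Later runs: send only the changes between the two snapshots.
        printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive -F %s\n' \
            "$1" "$2" "$1" "$3" "$4" "$1"
    fi
}

build_send_cmd tank/images ""     sync-1 slave
build_send_cmd tank/images sync-1 sync-2 slave
```

The incremental form needs the previous snapshot to still exist on both
sides, which is why a real script would have to keep track of the last
snapshot that was received successfully.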

I've had increasing problems on the 'slave': it seems to grind to a
halt for anywhere between 5 and 20 seconds after every zfs receive.
Everything on the server halts / hangs completely.

I've had a couple of goes at trying to solve / figure out what's
happening, without luck, and this 3rd time I've invested even more time
in the problem.

To sum it up:
-Server was initially on 8.2-RELEASE
-I've set some sysctl variables such as:

# 16GB arc_max (server has 30GB of RAM, but had a couple of 'freeze'
# situations; I suspect the zfs ARC ate too much memory)
vfs.zfs.arc_max=17179869184

# 8.2 defaults to 30 here; setting it to 5, which is the default from
# 8.3 onwards
vfs.zfs.txg.timeout="5"

# Set TXG write limit to a lower threshold.  This helps "level out"
# the throughput rate (see "zpool iostat").  A value of 256MB works well
# for systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on
# disks which have 64 MB cache.
# NOTE: in <v28, this tunable is called 'vfs.zfs.txg.write_limit_override'.
#vfs.zfs.txg.write_limit_override=1073741824 # for 8.2
vfs.zfs.write_limit_override=1073741824 # for 8.3 and above

-I've implemented mbuffer for the zfs send / receive operations. With
mbuffer the sync went a lot faster, but I still get the same symptoms:
when the zfs receive is done, the hang / unresponsiveness returns for
5-20 seconds.
-I've upgraded to 8.3-RELEASE (+ zpool upgrade and zfs upgrade to
V28), same symptoms.
-I've upgraded to 9.1-RELEASE, still the same symptoms.
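
For reference, one common way to put mbuffer between zfs send and zfs
receive looks roughly like the sketch below. This is an illustration
only, not necessarily how it is wired up here: the port 9090, the 128k
block size, the 1G buffer, the dataset tank/images, the host slave and
the snapshot names are all assumptions. The functions only print the
commands instead of running them:

```shell
#!/bin/sh
# Sketch only: prints the two commands instead of running them.
# Assumed (not from the original setup): port 9090, 128k blocks,
# a 1G buffer, dataset "tank/images", host "slave".

# receiver_cmd PORT DATASET
# Run on the slave first; mbuffer listens on PORT and feeds zfs receive.
receiver_cmd() {
    printf 'mbuffer -q -s 128k -m 1G -I %s | zfs receive -F %s\n' \
        "$1" "$2"
}

# sender_cmd DATASET PREV_SNAP CURR_SNAP HOST PORT
# Run on the main server; the incremental stream goes through mbuffer
# to HOST:PORT.
sender_cmd() {
    printf 'zfs send -i %s@%s %s@%s | mbuffer -q -s 128k -m 1G -O %s:%s\n' \
        "$1" "$2" "$1" "$3" "$4" "$5"
}

receiver_cmd 9090 tank/images
sender_cmd tank/images sync-1 sync-2 slave 9090
```

The buffer on each side smooths out the bursty zfs send stream, which
would explain a faster sync, but it does not change what zfs receive
does once the stream has arrived.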

I suspected that the period where the server is unresponsive after a
zfs receive would correlate with the amount of data being sent, but
even if there are only a couple of MB of data the hang /
unresponsiveness is still substantial.

I suspect it may have something to do with the zfs volume being sent
being mounted on the slave. I'm also doing the backups from the
slave, which means that a lot of the time the backup server is rsyncing
the zfs volume being updated.
I've noticed that the unresponsiveness / hang situations occur while
the backup server is rsyncing from the zfs volume being updated. When
the backup server is 'done' and nothing is working with files in the
zfs volume being updated, I hardly notice any of the symptoms (maybe
just a minor lag of much less than a second, hardly noticeable).

So my question to the list would be:
In my setup, have I taken the use case for zfs send / receive too far?
As in, it's not meant for this kind of syncing, this often, so
there's actually nothing 'wrong'.

-- 
----------------------
Joar Jegleim
Homepage: http://cosmicb.no
Linkedin: http://no.linkedin.com/in/joarjegleim
fb: http://www.facebook.com/joar.jegleim
AKA: CosmicB @Freenode

----------------------


More information about the freebsd-fs mailing list