"zfs receive" lock time
Pawel Jakub Dawidek
pjd at FreeBSD.org
Fri Dec 4 20:21:43 UTC 2009
On Wed, Dec 02, 2009 at 02:55:23PM -0600, Kevin wrote:
>
> I have two very very fast systems (12-disk 15krpm raid array, 16
> cores, etc). I'm using zfs send/receive to replicate a zfs volume from
> the "master" box to the "slave" box.
>
> Every minute, the master takes a new snapshot, then uses "send -i" to
> send an incremental snapshot to the slave. Normally, no files are
> changed during the minute so the operation is very fast (<1 second,
> and most of that is ssh negotiation time).
>
> If the slave is completely idle, "zfs receive" takes a fraction of a
> second. If the slave has been very busy (lots of read activity, no
> writes - the slave has everything mounted read only), suddenly "zfs
> receive" can take 30 seconds or more to complete, the whole time it
> has the filesystem locked. For example, I'd see:
>
> 49345 root     1  76  0 13600K  1956K zio->i  9  0:01  1.37% zfs
> 48910 www      1  46  0 36700K 21932K rrl->r  3  0:24  0.00% lighttpd
> 48913 www      1  46  0 41820K 26108K rrl->r  2  0:24  0.00% lighttpd
> 48912 www      1  46  0 37724K 23484K rrl->r  0  0:24  0.00% lighttpd
> 48911 www      1  46  0 41820K 26460K rrl->r 10  0:23  0.00% lighttpd
> 48909 www      1  46  0 39772K 24488K rrl->r  5  0:22  0.00% lighttpd
> 48908 www      1  46  0 36700K 21460K rrl->r 14  0:19  0.00% lighttpd
> 48907 www      1  45  0 30556K 16216K rrl->r 13  0:14  0.00% lighttpd
> 48906 www      1  44  0 26460K 11452K rrl->r  6  0:06  0.00% lighttpd
>
> At first, I thought it was possibly cache pressure... when the system
> was busy, whatever data necessary to create a new snapshot was getting
> pushed out of the cache so it had to be re-read. I increased arc_max
> and arc_meta_limit to very high values, and it seemed to have no
> effect, even when arc_meta_used was far below arc_meta_limit.
>
> Disabling cache flushes had no impact. Disabling zil cut the time in
> half, but it's still too long for this application.
>
> ktrace on the "zfs receive" shows:
>
> 1062 zfs 0.000024 CALL ioctl(0x3,0xcc285a11 ,0x7fffffffa320)
> 1062 zfs 0.000081 RET ioctl 0
> 1062 zfs 0.000058 CALL ioctl(0x3,0xcc285a05 ,0x7fffffffa2f0)
> 1062 zfs 0.000037 RET ioctl 0
> 1062 zfs 0.000019 CALL ioctl(0x3,0xcc285a11 ,0x7fffffffa320)
> 1062 zfs 0.000055 RET ioctl 0
> 1062 zfs 0.000031 CALL ioctl(0x3,0xcc285a11 ,0x7fffffff9f00)
> 1062 zfs 0.000053 RET ioctl 0
> 1062 zfs 0.000020 CALL ioctl(0x3,0xcc285a1c ,0x7fffffffc930)
> 1062 zfs 24.837084 RET ioctl 0
> 1062 zfs 0.000028 CALL ioctl(0x3,0xcc285a11 ,0x7fffffff9f00)
> 1062 zfs 0.000074 RET ioctl 0
> 1062 zfs 0.000037 CALL close(0x6)
> 1062 zfs 0.000006 RET close 0
> 1062 zfs 0.000007 CALL close(0x3)
> 1062 zfs 0.000005 RET close 0
>
> The 24 second call to 0xcc285a1c is ZFS_IOC_RECV, so whatever is going
> on is in the kernel, not a delay in getting the kernel any data.
> "systat" is showing that the drives are 100% busy during the
> operation, so it's obviously doing something. :)
>
> Does anyone know what "zfs receive" is doing while it has everything
> locked like this, and why a lot of read activity beforehand would
> drastically affect the performance of doing this?
The read activity is on the same dataset on the slave that is being
received, is that right?

There are two operations that can suspend your file system this way:
rollback and receive. The suspension is done by acquiring a write lock
on the given file system, where every other operation acquires a read
lock. So before receive can take the write lock, it has to wait for all
in-flight read operations to finish.
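As an illustration only (a toy shell model, not ZFS code), the blocking
pattern looks like this: the writer cannot make progress until every
reader has drained:

```shell
#!/bin/sh
# Toy model of the behaviour described above: the "readers" are
# background jobs holding the lock, and the "writer" (think
# zfs receive) must wait for all of them before it can proceed.
for i in 1 2 3; do
    sleep 1 &           # a reader holding its read lock for a while
done
echo "writer: waiting for readers to drain"
wait                    # the writer blocks until every reader exits
echo "writer: write lock acquired, receive can proceed"
```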
I'm not sure how your applications use it, but if files are only open
for a short period of time and then closed, you could do something like
this:
master# curtime=`date "+%Y%m%d%H%M%S"`
master# zfs snapshot pool/fs@${curtime}
master# zfs send -i pool/fs@${oldtime} pool/fs@${curtime} | \
            ssh slave zfs recv pool/fs
slave# zfs clone pool/fs@${curtime} pool/fs_${curtime}
slave# ln -fs /pool/fs_${curtime} /pool/usethis
Then point your application at the /pool/usethis/ directory (the clone,
instead of the received file system), and clean up old clones as you
wish. Read activity on the clones shouldn't affect the received file
system.
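The clone cleanup can be scripted; the following is only a sketch under
the naming convention used above (pool/fs_YYYYmmddHHMMSS), and the KEEP
count is an arbitrary assumption:

```shell
#!/bin/sh
# Keep the newest $KEEP timestamped clones and destroy the rest.
# Because the names embed YYYYmmddHHMMSS, a reverse lexical sort
# puts the newest clones first.
KEEP=5
zfs list -H -o name -t filesystem | grep '^pool/fs_' | \
    sort -r | awk -v keep="$KEEP" 'NR > keep' | \
    while read -r clone; do
        zfs destroy "$clone"
    done
```

Since /pool/usethis always points at the newest clone, keeping at least
one clone means the symlink target is never destroyed.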
--
Pawel Jakub Dawidek http://www.wheel.pl
pjd at FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
More information about the freebsd-fs mailing list