8.1R possible zfs snapshot livelock?

Borja Marcos borjam at sarenet.es
Wed May 18 12:24:14 UTC 2011


On May 17, 2011, at 1:29 PM, Jeremy Chadwick wrote:

> * ZFS send | ssh zfs recv results in ZFS subsystem hanging; 8.1-RELEASE;
>  February 2011:
>  http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010602.html

I actually found a reproducible deadlock condition. If there is I/O activity on a dataset while it is receiving an incremental ZFS snapshot, it can deadlock.

Imagine this situation: Two servers, A and B. A dataset on server A is replicated at regular intervals to B, so that you keep a reasonably up to date copy.

Something like:

(Running on server A):

zfs snapshot thepool/thedataset@thistime
zfs send -Ri thepool/thedataset@previoustime thepool/thedataset@thistime | ssh serverB zfs receive -d thepool
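
For anyone who wants to script one replication round, here is a minimal sketch of the two steps above. It only prints the commands (a dry run) instead of executing them. The function name replication_commands and the variables DATASET, TARGET_HOST and TARGET_POOL are placeholders of mine, not part of any real tool; the dataset, host and snapshot names are the ones used in this example.

```sh
#!/bin/sh
# Dry-run sketch of one replication round from server A to server B.
# All names below are placeholders taken from the example in this mail.
DATASET="thepool/thedataset"   # source dataset on server A
TARGET_HOST="serverB"          # receiving server
TARGET_POOL="thepool"          # pool on server B that receives the stream

# Print the commands for one round.
#   $1 = name of the previous snapshot (already present on both sides)
#   $2 = name of the new snapshot to create and send
replication_commands() {
    prev="$1"
    cur="$2"
    echo "zfs snapshot ${DATASET}@${cur}"
    echo "zfs send -Ri ${DATASET}@${prev} ${DATASET}@${cur} | ssh ${TARGET_HOST} zfs receive -d ${TARGET_POOL}"
}

# Show what a round from "previoustime" to "thistime" would run.
replication_commands previoustime thistime
```

To run it at regular intervals you would call this from cron or a sleep loop with fresh snapshot names and feed the printed lines to sh once you are happy with them.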

It works, but I hit a deadlock while one of the periodic "daily" scripts was running. After some tests, I saw that ZFS can deadlock if you do a zfs receive onto a dataset that has some read activity. Disabling atime didn't help either.

But if you make sure *not* to access the replicated dataset, it works; I haven't seen it fail otherwise.

If you wish to reproduce it, try creating a dataset for /usr/obj, running make buildworld on it, replicating at, say, 30 or 60 second intervals, and keeping several scripts (or rsync) reading the files of the target dataset and just copying them to another place in the usual, "classic" way (example: tar cf - . | ( cd /destination && tar xf - )).
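
A rough sketch of the reader side, to be run on the receiving server while the replication is going on. The mountpoint /thepool/thedataset is an assumption (adjust it to wherever the replicated dataset is mounted), and copy_tree and RUN_LOOP are names I made up for this example. It just does the classic tar copy over and over to generate read activity.

```sh
#!/bin/sh
# Reader-side sketch: keep reading the replicated dataset with tar.
SRC="/thepool/thedataset"   # assumed mountpoint of the replicated dataset
DEST="/destination"         # scratch directory outside that dataset

# Copy the tree under $1 into $2 with the classic tar pipe.
copy_tree() {
    ( cd "$1" && tar cf - . ) | ( cd "$2" && tar xf - )
}

# Loop forever only when explicitly asked to, e.g.:
#   RUN_LOOP=1 sh reader.sh
if [ "${RUN_LOOP:-0}" = "1" ]; then
    while :; do
        copy_tree "$SRC" "$DEST"
    done
fi
```

Starting several of these in parallel (each with its own DEST) gives the kind of read load described above.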





Borja



More information about the freebsd-stable mailing list