ZFS v28 issues with "zfs" command

Wed Aug 3 11:38:42 UTC 2011

Hello,

I've been doing tests with FreeBSD 8-STABLE, cvsupped yesterday. 

First, I haven't been able to reproduce the deadlock I observed several times when receiving a snapshot on a dataset on which there was some reading activity. So this bug seems to be solved.

However, I've seen something worrisome.

I'm using a small, simple replication program. At fixed intervals (currently every 20 seconds), it sends an incremental snapshot to a secondary machine.

The algorithm is this:

(time to replicate a new snapshot)
ssh destination zfs list -t snapshot...
zfs list -t snapshot
determine most recent snapshot in common
zfs snapshot pool/dataset@thenew (name format is pool/dataset@YYYYMMDDHHMMSS)
zfs send -i most_recent_snapshot_in_common new_snapshot > /var/tmp/temp_filename
scp /var/tmp/temp_filename destination:/var/tmp
ssh destination zfs receive -d pool < /var/tmp/temp_filename
ssh destination zfs destroy pool/most_recent_snapshot_in_common
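
The "determine most recent snapshot in common" step could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the actual replication program: the function name latest_common_snapshot and its two input files are hypothetical, and it assumes each file holds one full snapshot name per line in the pool/dataset@YYYYMMDDHHMMSS format, so that lexical order matches chronological order.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original program).
# $1: file with the local snapshot names, one per line
# $2: file with the destination snapshot names, one per line
# Prints the newest snapshot name found in both files, or nothing
# if the two lists have no snapshot in common.
latest_common_snapshot() {
    # -F: fixed strings, -x: match whole lines only, -f: read patterns from $1.
    # Because the names end in YYYYMMDDHHMMSS, a plain sort is chronological.
    grep -Fx -f "$1" "$2" | sort | tail -n 1
}
```

The two input files would typically be filled from the output of "zfs list -H -o name -t snapshot" on each machine, restricted to the dataset being replicated.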

The program works, it's pretty simple. 

However, I've found a problem. While it was working, I ran "zfs list -t snapshot" several times on the destination machine. I can't recall if it was during the zfs receive or the zfs destroy command, but after that something went wrong. I noticed that destroying a snapshot returned an error message, even though the snapshots really were destroyed.

Inspecting the pool with zdb -d (which I found through some Google searches), I noticed that I had developed a "hidden clone" problem. And I saw this snapshot which, apparently, came from nowhere:

rpool/tachan@newsrc-23608-1     1.33K      -   786M  -

It seems that there's some contention issue affecting the zfs command. In my case, it was triggered by a "zfs list -t snapshot" command during either a "zfs receive -d -F" or a "zfs destroy".

I'm wondering how to capture useful data regarding this...

Borja.