slow "zfs destroy snapshot" with predictable time pattern

Sun Feb 27 18:53:35 UTC 2011

I thought that 'zfs destroy snapshot' should be fast (on the order of
a few seconds), but mine are taking a predictably long time on a
pretty modest filesystem (details below).

I discovered this when a typo caused snapshots to be taken far more
often than I intended (every minute!); I had about 12,000 of them
before I noticed. Destroying the first snapshot took about 39
wallclock seconds on an otherwise idle system.  The next few destroys
took almost exactly the same amount of time.

I know little about ZFS under the hood, but I wanted to investigate a
little bit. I scripted a loop of 'time zfs destroy snapshot' and let
it run overnight.  Each destroy was consistently taking 37-40 seconds,
but then after hundreds of deletions in that time range, I saw a
jagged spike, followed by a consistent drop that has stayed in the
23-25s range:

[hours of 38-39s destroys snipped]

real    0m38.205s
real    0m38.455s
real    0m38.580s
real    0m37.414s
real    0m35.330s <-- small drop here
real    0m35.347s
real    0m35.380s
real    0m35.355s
real    0m35.255s
real    0m35.514s
real    0m35.422s
real    0m35.464s
real    0m46.121s   <-- small spike here
real    0m44.630s
real    0m46.021s
real    1m19.443s   <-- big spike here
real    0m40.896s
real    0m22.848s  <-- drop into the 20s range
real    0m29.039s
real    0m29.831s
real    0m26.348s
real    0m22.623s
real    0m29.314s
real    0m29.589s
real    0m26.573s
real    0m22.773s

[hours of 23-25s destroys snipped]
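
For anyone who wants to reproduce this, the timing loop can be sketched roughly as below. This is a minimal sketch, not the exact script: the ZFS and DATASET variables and the function name destroy_all_timed are placeholders, and it measures with date(1), so it only resolves whole seconds rather than the millisecond figures that time prints.

```shell
#!/bin/sh
# Hypothetical sketch: destroy every snapshot under one dataset, oldest
# first, and print how long each destroy took.
ZFS=${ZFS:-zfs}                      # overridable, e.g. for a dry run
DATASET=${DATASET:-atoz-backup/usr}  # dataset name taken from 'zfs list' below

destroy_all_timed() {
    # -H: no header, -r: recurse, -t snapshot: snapshots only,
    # -o name: print names only, -s creation: oldest first
    "$ZFS" list -H -r -t snapshot -o name -s creation "$DATASET" |
    while read -r snap; do
        start=$(date +%s)
        "$ZFS" destroy "$snap" </dev/null
        end=$(date +%s)
        echo "$snap $((end - start))s"
    done
}
```

Calling destroy_all_timed and redirecting its output to a file gives one line per snapshot, which makes it possible to tie a spike or drop back to a specific snapshot name.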

I am only guessing at the internals, but this model might fit the facts:

* Normally, 'zfs destroy snapshot' is fast (on the order of a few seconds);

* Before destroying one snapshot, 'zfs destroy snapshot' has to briefly
analyze all existing snapshots;

* A particular 'problem' snapshot can slow that full analysis by a
consistent amount of time;

* Destroying that 'problem' snapshot drops the analysis time by that amount.

If my model is correct, I'm going to see one or more spikes, followed
by corresponding drops, until the destroys return to a reasonable
rate.
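
To check the log against that prediction, something like the following could flag the jumps automatically. It is only a sketch: the function name flag_shifts and the THRESHOLD variable (default 5 seconds) are made up for illustration, and it simply compares each 'real' line with the one before it.

```shell
#!/bin/sh
# Hypothetical helper: read 'real XmY.YYYs' lines on stdin, print each
# time in seconds, and mark any change larger than THRESHOLD seconds
# relative to the previous destroy.
flag_shifts() {
    awk -v thr="${THRESHOLD:-5}" '
    $1 == "real" {
        split($2, t, "m")            # t[1] = minutes, t[2] = "38.205s"
        sub(/s$/, "", t[2])          # strip the trailing "s"
        secs = t[1] * 60 + t[2]
        note = ""
        if (seen && secs - prev > thr) note = "  <-- spike"
        if (seen && prev - secs > thr) note = "  <-- drop"
        printf "%.3f%s\n", secs, note
        prev = secs; seen = 1
    }'
}
```

Usage would be along the lines of 'flag_shifts < destroy-times.log', where destroy-times.log is a placeholder for wherever the time output was saved. If the model holds, each spike should be followed by a drop to a new, lower plateau.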

The poster in the thread linked below had a problem that might also
fit that model: particular snapshots can be very slow to destroy, and
removing them removes the delay. That thread attributes the slowness
to a low-memory condition, and OpenSolaris bug 6542681 was filed for
it.  I do not think low memory is the cause in my case.

    http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg07647.html

I have stopped the destroys in case any remaining 'problem' snapshot is
useful for debugging.

The system is FreeBSD 8.1-SECURITY, amd64, 4GB RAM, no sysctl or loader
tweaks, ZFS v3, zpool v14, single 58GB ZFS pool.

# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
atoz-backup      15.4G  58.0G  25.5K  /atoz-backup
atoz-backup/usr  15.3G  58.0G  14.8G  /atoz-backup/usr

# df -ki | egrep 'atoz|Filesystem'
Filesystem       1024-blocks     Used    Avail Capacity  iused     ifree %iused  Mounted on
atoz-backup         60789979       25 60789953       0%      6 121579907     0%  /atoz-backup
atoz-backup/usr     76281655 15491701 60789953      20% 714124 121579907     1%  /atoz-backup/usr

Royce