slow "zfs destroy snapshot" with predictable time pattern
Royce Williams
royce.williams at gmail.com
Sun Feb 27 18:53:35 UTC 2011
I expected 'zfs destroy' on a snapshot to be fast (on the order of
a few seconds), but mine are taking a predictably long time on a
fairly modest filesystem (details below).
I discovered this when a typo caused snapshots to be taken far more often
than I intended (every minute!); I had about 12,000 of them before I noticed.
Destroying the first snapshot took about 39 wallclock seconds on an
otherwise idle system. A few more destroys took almost exactly the
same amount of time.
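For scale, here is a rough back-of-the-envelope estimate in Python. It assumes every destroy stays near the initial 39 s, which the timings below show to be somewhat pessimistic. The variable names are illustrative only.

```python
# Numbers taken from the text above: one snapshot per minute,
# about 12,000 snapshots, roughly 39 s per destroy.
SNAPSHOTS = 12_000
SECONDS_PER_DESTROY = 39  # observed for the first few destroys

# How long the mistyped schedule ran before it was noticed.
days_of_snapshots = SNAPSHOTS / (60 * 24)

# How long a serial cleanup would take if every destroy stayed at ~39 s.
cleanup_hours = SNAPSHOTS * SECONDS_PER_DESTROY / 3600

print(f"{days_of_snapshots:.1f} days of snapshots")
print(f"{cleanup_hours:.0f} hours to destroy them all")
```

This prints "8.3 days of snapshots" and "130 hours to destroy them all", i.e. more than five days of back-to-back destroys.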
I know little about ZFS under the hood, but I wanted to investigate. I
scripted a loop of 'time zfs destroy snapshot' and let it run
overnight. Each destroy consistently took 37-40 seconds, but after
hundreds of deletions in that range I saw a jagged spike, followed by
a drop that has stayed in the 23-25s range:
[hours of 38-39s destroys snipped]
real 0m38.205s
real 0m38.455s
real 0m38.580s
real 0m37.414s
real 0m35.330s <-- small drop here
real 0m35.347s
real 0m35.380s
real 0m35.355s
real 0m35.255s
real 0m35.514s
real 0m35.422s
real 0m35.464s
real 0m46.121s <-- small spike here
real 0m44.630s
real 0m46.021s
real 1m19.443s <-- big spike here
real 0m40.896s
real 0m22.848s <-- drop into the 20s range
real 0m29.039s
real 0m29.831s
real 0m26.348s
real 0m22.623s
real 0m29.314s
real 0m29.589s
real 0m26.573s
real 0m22.773s
[hours of 23-25s destroys snipped]
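A loop like the one described above could be sketched in Python as follows. This is an assumption, not the script that produced these numbers (that was a shell loop around 'time zfs destroy'). `list_snapshots` and `time_destroys` are hypothetical helper names; the sketch assumes snapshots are destroyed oldest first, ordered with `zfs list -s creation`, and that `-r` on the dataset picks up only its own snapshots (true here, since atoz-backup/usr has no child datasets).

```python
import subprocess
import time


def list_snapshots(dataset, run=subprocess.run):
    """Return snapshot names under `dataset`, oldest first."""
    # -H: no header, tab-separated; -s creation: sort ascending by creation.
    # -r also includes snapshots of any child datasets.
    result = run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
         "-s", "creation", "-r", dataset],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def time_destroys(snapshots, run=subprocess.run):
    """Destroy snapshots in order; return a list of (name, seconds) pairs.

    `run` is injectable so the loop can be exercised without a real pool.
    """
    timings = []
    for snap in snapshots:
        start = time.monotonic()
        # Wall-clock time, comparable to the "real" line from `time`.
        run(["zfs", "destroy", snap], check=True)
        elapsed = time.monotonic() - start
        timings.append((snap, elapsed))
        print(f"{snap}\t{elapsed:.3f}s", flush=True)  # progress for long runs
    return timings
```

Calling `time_destroys(list_snapshots("atoz-backup/usr"))` would destroy every snapshot of that dataset, so treat this as a sketch of the measurement rather than something to run casually.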
Here is a model that might fit the facts:
* Normally, 'zfs destroy snapshot' is fast (on the order of a few seconds);
* 'zfs destroy snapshot' has to briefly analyze all snapshots prior to
destruction;
* A particular 'problem' snapshot can slow that full analysis by a
consistent amount of time;
* Destroying that 'problem' snapshot drops the analysis time by that amount.
If my model is correct, I'm going to see one or more spikes, followed
by corresponding drops, until the destroys return to a reasonable
rate.
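The additive model above can be written down as a small Python function, purely as a sketch. `predicted_times` is a hypothetical name, and the base cost and penalties in the example are made-up numbers chosen to resemble the drop from about 38 s to about 25 s, not values measured on this system. Only the drops are modeled; the spikes in the timings are not.

```python
def predicted_times(n_snapshots, base, penalties):
    """Predict per-destroy wall-clock time under the additive model.

    Snapshots are destroyed in order, indexed 0..n_snapshots-1.
    base: seconds per destroy once no 'problem' snapshot remains.
    penalties: {index: seconds} for 'problem' snapshots; each one adds its
        penalty to every destroy performed while it still exists.
    """
    times = []
    for i in range(n_snapshots):
        # Problem snapshots at index >= i have not been destroyed yet.
        extra = sum(p for idx, p in penalties.items() if idx >= i)
        times.append(base + extra)
    return times


# Hypothetical numbers: 3 s base, problem snapshots at positions 1 and 4.
print(predicted_times(6, base=3.0, penalties={1: 13.0, 4: 22.0}))
```

This prints [38.0, 38.0, 25.0, 25.0, 25.0, 3.0]: each problem snapshot's removal lowers every later destroy by exactly its penalty, and the time returns to the base rate only after the last one is gone.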
The poster in the thread below had a problem that might also fit this
model: particular snapshots can be very slow, and removing them removes
the delay. That thread attributes it to a low-memory condition, and
OpenSolaris bug 6542681 was filed for it. I do not think my problem is
caused by low memory.
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg07647.html
I have stopped the destroys in case the remaining 'problem' snapshot is useful for diagnosis.
The system is 8.1-SECURITY, amd64, 4GB RAM, no sysctl or loader
tweaks, ZFS v3, zpool v14, single 58GB ZFS pool.
# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
atoz-backup      15.4G  58.0G  25.5K  /atoz-backup
atoz-backup/usr  15.3G  58.0G  14.8G  /atoz-backup/usr
# df -ki | egrep 'atoz|Filesystem'
Filesystem       1024-blocks      Used     Avail Capacity   iused     ifree %iused  Mounted on
atoz-backup         60789979        25  60789953     0%         6 121579907    0%   /atoz-backup
atoz-backup/usr     76281655  15491701  60789953    20%    714124 121579907    1%   /atoz-backup/usr
Royce
More information about the freebsd-fs mailing list