Destroying ZFS snapshots "too quickly": xpt_scan_lun: can't allocate CCB, can't continue

Sat Feb 9 14:46:03 UTC 2013

Before the introduction of async_destroy I wrote a script to destroy
ZFS snapshots in parallel to speed up the process. It's available at:
http://www.fabiankeil.de/sourcecode/zfs-snapshot-destroyer/zsd.pl

A couple of years ago the only downsides seemed to be that it
requires more memory and file descriptors (due to multiple zfs
processes running at the same time) and that errors are ignored
(an implementation detail of the script).
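
To illustrate the general approach (this is not the code of zsd.pl,
which is a Perl script), here is a minimal Python sketch that runs
several "zfs destroy" processes at once while capping their number
and recording failures instead of ignoring them. The function names,
the worker limit of 4 and the injectable "run" parameter are
assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def destroy_snapshot(name, run=subprocess.run):
    """Run `zfs destroy` for one snapshot; return (name, exit status)."""
    result = run(["zfs", "destroy", name])
    return name, result.returncode


def destroy_parallel(names, workers=4, run=subprocess.run):
    """Destroy snapshots with at most `workers` zfs processes at a time.

    Returns the names of the snapshots whose destroy command failed.
    `workers=4` is an arbitrary illustrative limit, not a tested value.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as `names`.
        results = list(pool.map(lambda n: destroy_snapshot(n, run), names))
    return [name for name, status in results if status != 0]
```

Lowering "workers" trades speed for fewer concurrent zfs processes.
Whether any particular limit avoids the lockup described below is
unknown.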

Recently I noticed that destroying several hundred (500)
snapshots this way risks rendering the system unresponsive.
I rarely do that, so it might not actually be a regression.

When using X, the screen freezes and keyboard input is ignored,
so it's hard to tell what's going on.

When running the script on the console, Alt+Fx is often still
accepted to switch consoles, but other keyboard input, such as
entering commands or trying to log in, has no visible effect.

A running top is killed and the system frequently logs:
"xpt_scan_lun: can't allocate CCB, can't continue".

Plugging in USB devices still results in the expected messages,
but other than that the system seems to be unresponsive and
doesn't recover (I only waited a couple of minutes, though).

A "CCB" seems to be rather small:
http://fxr.watson.org/fxr/source/cam/cam_xpt.c#L4386
so failing to allocate one suggests that memory is very tight.
I therefore suspect that ZFS got greedy and didn't play nice
with the rest of the system. I have no proof that ZFS isn't
merely triggering a problem in another subsystem, though.

So far I haven't been able to reproduce the problem with snapshots
intentionally created for testing, but I also used a somewhat
simplistic approach to populate the snapshots.

Is this considered a bug or is quickly destroying snapshots just
something for the "don't do this" or "not without proper tuning"
departments?

I would also be interested to know if there's a way to somehow
roughly figure out from userland how many snapshots can be safely
destroyed in a row. Example: Look at "some" system state, destroy
a safe number of snapshots, look at "some" system state again and
interpolate.
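
The "look, destroy, look again, interpolate" idea could be sketched
roughly as follows in Python. Everything specific here is an
assumption: which metric counts as "some" system state is exactly the
open question, so the free page count read via sysctl is only a
placeholder, and the floor, start and cap values are made up. The
sketch also ignores that the effect of a destroy may show up in the
metric with a delay, and it waits forever if the metric never
recovers.

```python
import subprocess
import time


def read_free_pages():
    """Placeholder probe (assumption): free page count on FreeBSD."""
    out = subprocess.run(["sysctl", "-n", "vm.stats.vm.v_free_count"],
                         capture_output=True, text=True, check=True).stdout
    return int(out)


def destroy_one(name):
    """Destroy a single snapshot; raises if zfs reports an error."""
    subprocess.run(["zfs", "destroy", name], check=True)


def next_batch_size(before, after, destroyed, floor, cap):
    """Estimate how many more snapshots fit before the metric hits `floor`."""
    headroom = after - floor
    if headroom <= 0:
        return 0                      # already at or below the floor
    cost = (before - after) / destroyed   # metric consumed per snapshot
    if cost <= 0:
        return cap                    # no measurable cost: use the cap
    return min(cap, int(headroom / cost))


def destroy_adaptively(names, probe=read_free_pages, destroy=destroy_one,
                       floor=50000, start=5, cap=50,
                       wait=time.sleep, delay=10.0):
    """Destroy snapshots in batches sized from the observed metric change."""
    remaining = list(names)
    size = start
    while remaining:
        if size == 0:
            wait(delay)               # give the system time to recover
            size = start if probe() > floor else 0
            continue
        batch, remaining = remaining[:size], remaining[size:]
        before = probe()
        for name in batch:
            destroy(name)
        after = probe()
        size = next_batch_size(before, after, len(batch), floor, cap)
```

Passing "probe" and "destroy" in as parameters makes it possible to
try a different metric, or to dry-run the loop, without touching real
snapshots.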

Before top gets killed it usually shows that zfskern takes
more than 50% WCPU, but this can also happen when the system
doesn't become unresponsive and thus probably isn't a good
metric (the delay also doesn't help of course).

Fabian