>We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines
>but have been having trouble with short spikes in application memory
>usage resulting in huge amounts of swapping, bringing the whole machine
>to its knees and crashing it hard.  I suspect this is because when there
>is a sudden spike in memory usage the zfs arc reclaim thread is unable
>to free system memory fast enough.

A large number of fairly serious ZFS bugs have been fixed since
8.2-RELEASE, and I would suggest you look at upgrading.
That said, I haven't seen the specific problem you are reporting.

>      * is this a known problem?

I'm unaware of it specifically as it relates to ZFS.  You don't mention
how big the memory usage spike is, but unless there is enough free plus
cache memory available to absorb it, you will have problems whether
it's UFS or ZFS (though it's possibly worse with ZFS).  FreeBSD is known
not to cope well with running out of memory.

>      * what is the community's advice for production machines running
>        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
>        that there's enough actually free memory to handle a spike in
>        application memory usage) the best solution to this
>        spike-in-memory-means-crash problem?

Are you swapping onto a ZFS volume (zvol)?  If so, change back to a raw
(or geom) device - swapping to ZFS is known to be problematic.  If you
have very spiky memory requirements, increasing vm.v_cache_min and/or
vm.v_free_reserved might give you better results.
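
As a rough sketch of how you might check this and derive candidate
values: swapinfo lists the active swap devices (an entry under
/dev/zvol/ would indicate swap on ZFS), and sysctl shows the current
settings.  The helper below assumes both sysctls are counted in pages
and that the page size is 4 KiB, as on amd64.  The name mib_to_pages
and the 512/256 MiB targets are hypothetical placeholders, not
recommendations.

```shell
#!/bin/sh
# Inspect first (FreeBSD commands, shown as comments so the helper
# below stays runnable anywhere):
#   swapinfo                                   # swap devices in use
#   sysctl vm.v_cache_min vm.v_free_reserved   # current values

PAGE_SIZE=4096   # assumption: amd64 page size; see "sysctl hw.pagesize"

# Hypothetical helper: convert a target in MiB into a page count.
mib_to_pages() {
    echo $(( $1 * 1024 * 1024 / PAGE_SIZE ))
}

# Print candidate /etc/sysctl.conf lines.  The MiB figures are
# placeholders; size them to the spikes you actually observe.
echo "vm.v_cache_min=$(mib_to_pages 512)"
echo "vm.v_free_reserved=$(mib_to_pages 256)"
```

Try the values at runtime with sysctl on a test box before putting
them in /etc/sysctl.conf.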

>      * has FreeBSD 9.0 / ZFS v28 solved this problem?

The ZFS code is the same in 9.0 and 8.3.  Since 8.3 is less of a jump,
I'd recommend that you try 8.3-PRERELEASE on a test box and see how
it handles your load.  Note that there's no need to upgrade your pools
from v15 to v28 unless you want the newer ZFS features - the actual ZFS
code is independent of pool version.

>      * rather than setting a hard limit on the ARC cache size, is it
>        possible to adjust the auto-tuning variables to leave more free
>        memory for spiky memory situations?  e.g. set the auto-tuning to
>        make arc eat 80% of memory instead of ~95% like it is at
>        present?

In the first instance, memory spikes are absorbed by the memory held
back via vm.v_cache_min and vm.v_free_reserved.  The current
vfs.zfs.arc_max default may be a bit high for some workloads, but at
this point in time you will need to tune it manually.
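
A minimal sketch of the manual tuning, assuming vfs.zfs.arc_max is
set as a boot-time tunable in /boot/loader.conf with its value given
in bytes.  The helper name arc_max_line is hypothetical, and the
16 GiB figure is only a placeholder for a 24GB machine, not a
recommendation.

```shell
#!/bin/sh
# Hypothetical helper: print a /boot/loader.conf line that caps the
# ARC at the given number of GiB (the value is emitted in bytes).
arc_max_line() {
    echo "vfs.zfs.arc_max=\"$(( $1 * 1024 * 1024 * 1024 ))\""
}

# Placeholder cap; pick a figure that leaves room for your spikes.
arc_max_line 16
```

Add the printed line to /boot/loader.conf and reboot; afterwards
"sysctl vfs.zfs.arc_max" should report the new limit.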

