zfs_trim_enabled destroys zio_free() performance
Matthew Ahrens
mahrens at delphix.com
Fri Sep 11 16:07:49 UTC 2015
I discovered that when destroying a ZFS snapshot, we can end up using
several seconds of CPU via this stack trace:
kernel`spinlock_exit+0x2d
kernel`taskqueue_enqueue+0x12c
zfs.ko`zio_issue_async+0x7c
zfs.ko`zio_execute+0x162
zfs.ko`dsl_scan_free_block_cb+0x15f
zfs.ko`bpobj_iterate_impl+0x25d
zfs.ko`bpobj_iterate_impl+0x46e
zfs.ko`dsl_scan_sync+0x152
zfs.ko`spa_sync+0x5c1
zfs.ko`txg_sync_thread+0x3a6
kernel`fork_exit+0x9a
kernel`0xffffffff80d0acbe
6558 ms
This is not good for performance since, in addition to the CPU cost, it
doesn't allow the sync thread to do anything else, and this is observable
as periods where we don't do any write i/o to disk for several seconds.
The problem is that when zfs_trim_enabled is set (which it is by default),
zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the free to be
dispatched to a taskq. Since each task completes very quickly, there is a
large locking and context switching overhead -- we would be better off just
processing the free in the caller's context.
I'm not sure exactly why we need to go async when trim is enabled, but it
seems like at least we should not bother going async if trim is not
actually being used (e.g. with an all-spinning-disk pool). It would also
be worth investigating not going async even when trim is useful (e.g. on
SSD-based pools).
Here is the relevant code:
zio_free_sync():
if (zfs_trim_enabled)
stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
ZIO_STAGE_VDEV_IO_ASSESS;
/*
* GANG and DEDUP blocks can induce a read (for the gang block
header,
* or the DDT), so issue them asynchronously so that this thread is
* not tied up.
*/
else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
stage |= ZIO_STAGE_ISSUE_ASYNC;
--matt
More information about the freebsd-fs
mailing list