zfs_trim_enabled destroys zio_free() performance

Matthew Ahrens mahrens at delphix.com
Fri Sep 11 16:07:49 UTC 2015


I discovered that when destroying a ZFS snapshot, we can end up using
several seconds of CPU via this stack trace:

              kernel`spinlock_exit+0x2d
              kernel`taskqueue_enqueue+0x12c
              zfs.ko`zio_issue_async+0x7c
              zfs.ko`zio_execute+0x162
              zfs.ko`dsl_scan_free_block_cb+0x15f
              zfs.ko`bpobj_iterate_impl+0x25d
              zfs.ko`bpobj_iterate_impl+0x46e
              zfs.ko`dsl_scan_sync+0x152
              zfs.ko`spa_sync+0x5c1
              zfs.ko`txg_sync_thread+0x3a6
              kernel`fork_exit+0x9a
              kernel`0xffffffff80d0acbe
             6558 ms

This hurts performance: in addition to the CPU cost, the sync thread can't
do anything else while it is stuck here, which is observable as periods of
several seconds where we don't do any write i/o to disk.

The problem is that when zfs_trim_enabled is set (which it is by default),
zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the free to be
dispatched to a taskq.  Since each task completes very quickly, there is a
large locking and context switching overhead -- we would be better off just
processing the free in the caller's context.

I'm not sure exactly why we need to go async when trim is enabled, but it
seems like at least we should not bother going async if trim is not
actually being used (e.g. with an all-spinning-disk pool).  It would also
be worth investigating not going async even when trim is useful (e.g. on
SSD-based pools).

Here is the relevant code:

zio_free_sync():
        if (zfs_trim_enabled)
                stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
                    ZIO_STAGE_VDEV_IO_ASSESS;
        /*
         * GANG and DEDUP blocks can induce a read (for the gang block
         * header, or the DDT), so issue them asynchronously so that this
         * thread is not tied up.
         */
        else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
                stage |= ZIO_STAGE_ISSUE_ASYNC;
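
To make the first suggestion above concrete, here is a minimal standalone
sketch of the stage selection, not a patch.  The STAGE_* bit values, the
function free_extra_stages(), and the trim_in_use flag are all hypothetical
stand-ins; how the pool would actually determine that no vdev uses trim is
left open.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in bit values for illustration only; these are
 * not the real ZIO_STAGE_* definitions.
 */
#define	STAGE_ISSUE_ASYNC	(1u << 0)
#define	STAGE_VDEV_IO_START	(1u << 1)
#define	STAGE_VDEV_IO_ASSESS	(1u << 2)

/*
 * Return the extra pipeline stages for a free.  trim_in_use is an
 * assumed flag meaning "at least one vdev in the pool can use trim".
 */
uint32_t
free_extra_stages(bool trim_enabled, bool trim_in_use, bool is_gang,
    bool is_dedup)
{
	uint32_t stage = 0;

	if (trim_enabled && trim_in_use) {
		/* Trim needs the vdev stages, dispatched via the taskq. */
		stage |= STAGE_ISSUE_ASYNC | STAGE_VDEV_IO_START |
		    STAGE_VDEV_IO_ASSESS;
	} else if (is_gang || is_dedup) {
		/* May induce a read, so don't tie up the caller. */
		stage |= STAGE_ISSUE_ASYNC;
	}
	/* Otherwise the free is processed in the caller's context. */
	return (stage);
}
```

With this shape, a plain free on an all-spinning-disk pool adds no async
stage even when zfs_trim_enabled is set, so it never reaches
taskqueue_enqueue().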

--matt

