zfs_trim_enabled destroys zio_free() performance

Steven Hartland killing at multiplay.co.uk
Sun Sep 13 13:04:02 UTC 2015


Do you remember if this was this causing a deadlock or something similar 
that's easy to provoke?

     Regards
     Steve

On 11/09/2015 18:00, Alexander Motin wrote:
> Hi.
>
> The code in question was added by me at r253992. Commit message tells it
> was made to decouple locks. I don't remember much more details, but may
> be it can be redone somehow else.
>
> On 11.09.2015 19:07, Matthew Ahrens wrote:
>> I discovered that when destroying a ZFS snapshot, we can end up using
>> several seconds of CPU via this stack trace:
>>
>>                kernel`spinlock_exit+0x2d
>>                kernel`taskqueue_enqueue+0x12c
>>                zfs.ko`zio_issue_async+0x7c
>>                zfs.ko`zio_execute+0x162
>>                zfs.ko`dsl_scan_free_block_cb+0x15f
>>                zfs.ko`bpobj_iterate_impl+0x25d
>>                zfs.ko`bpobj_iterate_impl+0x46e
>>                zfs.ko`dsl_scan_sync+0x152
>>                zfs.ko`spa_sync+0x5c1
>>                zfs.ko`txg_sync_thread+0x3a6
>>                kernel`fork_exit+0x9a
>>                kernel`0xffffffff80d0acbe
>>               6558 ms
>>
>> This is not good for performance since, in addition to the CPU cost, it
>> doesn't allow the sync thread to do anything else, and this is
>> observable as periods where we don't do any write i/o to disk for
>> several seconds.
>>
>> The problem is that when zfs_trim_enabled is set (which it is by
>> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the
>> free to be dispatched to a taskq.  Since each task completes very
>> quickly, there is a large locking and context switching overhead -- we
>> would be better off just processing the free in the caller's context.
>>
>> I'm not sure exactly why we need to go async when trim is enabled, but
>> it seems like at least we should not bother going async if trim is not
>> actually being used (e.g. with an all-spinning-disk pool).  It would
>> also be worth investigating not going async even when trim is useful
>> (e.g. on SSD-based pools).
>>
>> Here is the relevant code:
>>
>> zio_free_sync():
>>          if (zfs_trim_enabled)
>>                  stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>>                      ZIO_STAGE_VDEV_IO_ASSESS;
>>          /*
>>           * GANG and DEDUP blocks can induce a read (for the gang block
>> header,
>>           * or the DDT), so issue them asynchronously so that this thread is
>>           * not tied up.
>>           */
>>          else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>>                  stage |= ZIO_STAGE_ISSUE_ASYNC;
>>
>> --matt
>



More information about the freebsd-fs mailing list