zfs_trim_enabled destroys zio_free() performance

Sean Chittenden seanc at groupon.com
Mon Sep 14 23:42:24 UTC 2015


Random industry note: we've had issues with trim-enabled hosts where
deleting a moderately sized dataset (~1TB) would cause the box to trip
over the deadman timer.  When the host comes back up it almost
immediately panics again, because the queued trim commands are reissued,
causing the box to panic in a loop.  Disabling TRIM breaks this cycle.
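
For anyone else stuck in the same import/panic loop: the knobs involved
can be set from the loader before the pool imports.  Tunable names below
are as of FreeBSD 10.x -- verify them with `sysctl -d` on your release
before relying on this sketch.

    # /boot/loader.conf
    vfs.zfs.trim.enabled=0          # don't reissue the queued TRIMs at import
    vfs.zfs.deadman_enabled=0       # or: disable the hung-I/O panic entirely
    #vfs.zfs.deadman_synctime_ms=3600000  # or: just raise the threshold
                                          # (default is 1000000, i.e. 1000s)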
At the very least, getting trim to obey a different timer would be useful.

-sc


> panic: I/O to pool 'tank' appears to be hung on vdev guid
> 1181753144268412659 at '/dev/da0p1'.
> cpuid = 13
> KDB: stack backtrace:
> #0 0xffffffff805df950 at kdb_backtrace+0x60
> #1 0xffffffff805a355d at panic+0x17d
> #2 0xffffffff81034db3 at vdev_deadman+0x123
> #3 0xffffffff81034cc0 at vdev_deadman+0x30
> #4 0xffffffff81034cc0 at vdev_deadman+0x30
> #5 0xffffffff810298e5 at spa_deadman+0x85
> #6 0xffffffff805b8ca5 at softclock_call_cc+0x165
> #7 0xffffffff805b90b4 at softclock+0x94
> #8 0xffffffff805716cb at intr_event_execute_handlers+0xab
> #9 0xffffffff80571b16 at ithread_loop+0x96
> #10 0xffffffff8056f19a at fork_exit+0x9a
> #11 0xffffffff807a817e at fork_trampoline+0xe
> Uptime: 59s
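
Frames #2-#4 in that backtrace are the deadman walking the vdev tree:
vdev_deadman() recurses over each vdev's children and, at every leaf,
panics if the oldest outstanding zio has been pending longer than the
deadman threshold.  A rough sketch of its shape -- simplified from
memory of the FreeBSD/illumos source, so field and helper names are
approximate:

    static void
    vdev_deadman(vdev_t *vd)
    {
            /* Recurse into the children first (frames #3 and #4 above). */
            for (int c = 0; c < vd->vdev_children; c++)
                    vdev_deadman(vd->vdev_child[c]);

            if (vd->vdev_ops->vdev_op_leaf) {
                    vdev_queue_t *vq = &vd->vdev_queue;

                    mutex_enter(&vq->vq_lock);
                    if (avl_numnodes(&vq->vq_active_tree) > 0) {
                            spa_t *spa = vd->vdev_spa;
                            zio_t *fio = avl_first(&vq->vq_active_tree);
                            uint64_t delta = gethrtime() - fio->io_timestamp;

                            /*
                             * No distinction is made for (reissued) TRIM
                             * zios, hence the panic loop described above.
                             */
                            if (delta > spa_deadman_synctime(spa))
                                    fm_panic("I/O to pool '%s' appears to "
                                        "be hung on vdev guid %llu at '%s'.",
                                        spa_name(spa),
                                        (unsigned long long)vd->vdev_guid,
                                        vd->vdev_path);
                    }
                    mutex_exit(&vq->vq_lock);
            }
    }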



On Sun, Sep 13, 2015 at 6:03 AM, Steven Hartland <killing at multiplay.co.uk>
wrote:
>
> Do you remember if this was causing a deadlock or something similar
> that's easy to provoke?
>
>     Regards
>     Steve
>
>
> On 11/09/2015 18:00, Alexander Motin wrote:
>>
>> Hi.
>>
>> The code in question was added by me at r253992. The commit message says
>> it was made to decouple locks. I don't remember many more details, but
>> maybe it can be redone some other way.
>>
>> On 11.09.2015 19:07, Matthew Ahrens wrote:
>>>
>>> I discovered that when destroying a ZFS snapshot, we can end up using
>>> several seconds of CPU via this stack trace:
>>>
>>>                kernel`spinlock_exit+0x2d
>>>                kernel`taskqueue_enqueue+0x12c
>>>                zfs.ko`zio_issue_async+0x7c
>>>                zfs.ko`zio_execute+0x162
>>>                zfs.ko`dsl_scan_free_block_cb+0x15f
>>>                zfs.ko`bpobj_iterate_impl+0x25d
>>>                zfs.ko`bpobj_iterate_impl+0x46e
>>>                zfs.ko`dsl_scan_sync+0x152
>>>                zfs.ko`spa_sync+0x5c1
>>>                zfs.ko`txg_sync_thread+0x3a6
>>>                kernel`fork_exit+0x9a
>>>                kernel`0xffffffff80d0acbe
>>>               6558 ms
>>>
>>> This is not good for performance since, in addition to the CPU cost, it
>>> doesn't allow the sync thread to do anything else, and this is
>>> observable as periods where we don't do any write i/o to disk for
>>> several seconds.
>>>
>>> The problem is that when zfs_trim_enabled is set (which it is by
>>> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the
>>> free to be dispatched to a taskq.  Since each task completes very
>>> quickly, there is a large locking and context switching overhead -- we
>>> would be better off just processing the free in the caller's context.
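
For context: ZIO_STAGE_ISSUE_ASYNC makes zio_execute() stop the pipeline
in the calling thread and hand the zio to a taskq; the stage itself is
tiny, roughly (as in the illumos/FreeBSD sources of that era):

    /*
     * Pipeline stage run when ZIO_STAGE_ISSUE_ASYNC is set: instead of
     * executing the next stage inline, queue the zio to a taskq and stop
     * the pipeline in this thread.  For millions of tiny frees, the
     * taskqueue_enqueue() lock and the per-zio wakeup/context switch
     * dominate -- which is what the profile above shows.
     */
    static int
    zio_issue_async(zio_t *zio)
    {
            zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_FALSE);
            return (ZIO_PIPELINE_STOP);
    }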
>>>
>>> I'm not sure exactly why we need to go async when trim is enabled, but
>>> it seems like at least we should not bother going async if trim is not
>>> actually being used (e.g. with an all-spinning-disk pool).  It would
>>> also be worth investigating not going async even when trim is useful
>>> (e.g. on SSD-based pools).
>>>
>>> Here is the relevant code:
>>>
>>> zio_free_sync():
>>>          if (zfs_trim_enabled)
>>>                  stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>>>                      ZIO_STAGE_VDEV_IO_ASSESS;
>>>          /*
>>>           * GANG and DEDUP blocks can induce a read (for the gang block
>>>           * header, or the DDT), so issue them asynchronously so that
>>>           * this thread is not tied up.
>>>           */
>>>          else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>>>                  stage |= ZIO_STAGE_ISSUE_ASYNC;
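
A sketch of the narrower guard suggested above: only force the async and
vdev stages when some leaf vdev can actually use TRIM.  Note that
spa_has_trimmable_vdev() is a hypothetical helper (FreeBSD's per-vdev
vdev_notrim flag would be one building block for it); this is an
illustration, not a tested patch:

    /* Hypothetical: B_TRUE if any leaf vdev in the pool supports TRIM. */
    boolean_t spa_has_trimmable_vdev(spa_t *spa);

            if (zfs_trim_enabled && spa_has_trimmable_vdev(spa))
                    stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
                        ZIO_STAGE_VDEV_IO_ASSESS;
            /*
             * GANG and DEDUP blocks still need to go async: they can
             * induce a read (gang header or DDT), which would tie up
             * the sync thread.
             */
            else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
                    stage |= ZIO_STAGE_ISSUE_ASYNC;

With an all-spinning-disk pool the helper returns B_FALSE, so frees run
in the sync thread's context and skip the taskq round-trip entirely.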
>>>
>>> --matt
>>
>>
>




--
Sean Chittenden

