slowdown of zfs (tx->tx)

Tue Jan 15 01:37:31 UTC 2013

On Mon, Jan 14, 2013 at 1:46 PM, Nicolas Rachinsky
<fbsd-mas-0 at ml.turing-complete.org> wrote:
> * Artem Belevich <art at freebsd.org> [2013-01-14 12:41 -0800]:
>> txg_wait_open means that ZFS is waiting for ongoing transaction group
>> sync. There should've been some write activity in this case.
>>
>> Check what zfs kernel threads are doing with procstat -kk on zfskern process.
>
> # sh -c 'for i in `jot 1000`; do procstat -kk 47 ; sleep 0.1; done' | sort | uniq -c
> 1000    47 100083 zfskern          arc_reclaim_thre mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x29d fork_exit+0x11f fork_trampoline+0xe
> 1000    47 100084 zfskern          l2arc_feed_threa mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1a8 fork_exit+0x11f fork_trampoline+0xe
> 1000    47 100224 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f fork_trampoline+0xe
>  165    47 100225 zfskern          txg_thread_enter <running>
>    1    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_findbp+0x107 dbuf_prefetch+0x8f dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x675 dnode_hold_impl+0xf2 dmu_buf_hold_array+0x38 dmu_write+0x53 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139
>    1    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_will_dirty+0x60 dmu_write+0x82 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>    1    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x189 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>   81    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x2c3 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>  719    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>    4    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 spa_sync+0x286 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>    2    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 spa_sync+0x370 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>   21    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_config_sync+0xe3 spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
>    5    47 100225 zfskern          txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_uberblock_sync_list+0xd0 vdev_config_sync+0x10f spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 1000   PID    TID COMM             TDNAME           KSTACK

OK. threads responsible for transaction sync seem to be stuck in zio_wait.
zio_wait is in turn waiting for some task thread to be done with its work.
Now you need to figure out what are those task threads doing.

'procstat -kk 0' will dump few hundreds of taskq threads. Most of them
would be zfs related. On an idle box (8.3/amd64 in my case) most of
them would have the same stack trace looking like this (modulo
offsets):

mi_switch+0x196 sleepq_wait+0x42 _sleep+0x3c0
taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe

Look for stack traces that don't match that pattern.

--Artem