Spa deadman
Dave Baukus
daveb at spectralogic.com
Thu Oct 25 19:42:02 UTC 2018
On 10/17/2018 01:59 PM, Dave Baukus wrote:
> While running something close to FreeBSD_stable11.0, I have several crashes
> where the spa_deadman() timer is firing and panicking the system.
>
> There are not any indications of IO errors. I have dug into one of the
> crash dumps and have found the following:
I have a possible explanation, but I do not understand how or why it happens.
I have multiple crash dumps showing the following:
1.) gethrtime() - zio->io_timestamp exceeds spa_deadman_synctime().
2.) gethrtime() - vd->vq.vq_io_complete_ts is in the range of 50 to 180 microseconds.
The timestamp for #1 is set when the zio is initially queued to
vq_class[ZIO_PRIORITY_ASYNC_WRITE] and vq_write_offset_tree.
The timestamp for #2 is set when a zio is completed via vdev_queue_io_done().
#1 implies that it took at least spa_deadman_synctime() for the zio to move from
the initial IO scheduling trees to the top of vq_active_tree.
#2 implies that the vdev has been completing ZIOs and is not really "hung".
I cannot explain why a ZIO would take 10 or 15 minutes (whatever spa_deadman_synctime() is by default) to
get queued to vq_active_tree, where a spa_sync() that has been running too long finds it and panics.
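To make the two observations concrete, here is a minimal sketch of the comparison, not the actual spa_deadman() code. The type hrtime_ns, the struct deadman_obs, the function classify() and the recent_ns cutoff are hypothetical names introduced only for illustration; timestamps are assumed to be nanoseconds, as returned by gethrtime().

```c
#include <stdbool.h>
#include <stdint.h>

/* Nanosecond timestamps, as returned by gethrtime(). Hypothetical alias. */
typedef int64_t hrtime_ns;

/* Hypothetical summary of observations #1 and #2; not a ZFS type. */
struct deadman_obs {
	bool zio_overdue;	/* #1: zio is older than the deadman threshold */
	bool vdev_completing;	/* #2: vdev completed some zio very recently */
};

/*
 * now            - current gethrtime() value
 * io_timestamp   - zio->io_timestamp (set when the zio was first queued)
 * io_complete_ts - vq_io_complete_ts (set in vdev_queue_io_done())
 * deadman_ns     - deadman threshold in nanoseconds
 * recent_ns      - assumed cutoff for "completed recently"
 */
static struct deadman_obs
classify(hrtime_ns now, hrtime_ns io_timestamp, hrtime_ns io_complete_ts,
    hrtime_ns deadman_ns, hrtime_ns recent_ns)
{
	struct deadman_obs o;

	o.zio_overdue = (now - io_timestamp) > deadman_ns;
	o.vdev_completing = (now - io_complete_ts) <= recent_ns;
	return (o);
}
```

When both fields come back true, the zio at the top of vq_active_tree waited a very long time in the queued trees while the vdev itself kept completing IO, which is the combination seen in these dumps.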
For the "hung vdev" the IO trees look like this
(note the 15k nodes in the vq_write_offset_tree; a cursory examination of the ZIOs in this tree
showed many that were 16 minutes old or older):
vdev_queue = {
  vq_vdev = 0xfffff80c867d0000,
  vq_class = 0xfffff80c867d03f8,
  vq_active_tree = {
    avl_root = 0xfffff80b496d7250,
    avl_compar = 0xffffffff81153810 <vdev_queue_offset_compare>,
    avl_offset = 592,
    avl_numnodes = 1,
    avl_size = 968
  },
  vq_read_offset_tree = {
    avl_root = 0x0,
    avl_compar = 0xffffffff81153810 <vdev_queue_offset_compare>,
    avl_offset = 616,
    avl_numnodes = 0,
    avl_size = 968
  },
  vq_write_offset_tree = {
    avl_root = 0xfffff801be96f268,
    avl_compar = 0xffffffff81153810 <vdev_queue_offset_compare>,
    avl_offset = 616,
    avl_numnodes = 15173,
    avl_size = 968
  },
  vq_last_offset = 7712562135040,
  vq_io_complete_ts = 334320934586123,
  vq_lock = {
    lock_object = {
      lo_name = 0xffffffff81207e91 "vq->vq_lock",
      lo_flags = 40960000,
      lo_data = 0,
      lo_witness = 0x0
    },
    sx_lock = 18446735278059358720
  },
  vq_lastoffset = 0
},
All vq_class[] trees are empty except:
vq->vq_class[3] == ZIO_PRIORITY_ASYNC_WRITE (vdev_queue_class_t *)0xfffff80c867d0488
$214 = {
  vqc_active = 1,
  vqc_queued_tree = {
    avl_root = 0xfffff801be96f250,
    avl_compar = 0xffffffff81153810 <vdev_queue_offset_compare>,
    avl_offset = 592,
    avl_numnodes = 15173,
    avl_size = 968
  }
}
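As a cross-check on the dumps above, the avl_root values can be turned back into the address of the containing object. This is a small sketch, assuming the usual AVL layout in which avl_root points at a node embedded avl_offset bytes into the object; avl_node_to_data() is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>

/*
 * Given the address of an embedded AVL node and the tree's avl_offset,
 * return the address of the object that contains the node.
 */
static uint64_t
avl_node_to_data(uint64_t node_addr, uint64_t avl_offset)
{
	return (node_addr - avl_offset);
}
```

Applied to the values printed above, the roots of vq_write_offset_tree (0xfffff801be96f268, offset 616) and of the vq_class[3] queued tree (0xfffff801be96f250, offset 592) both resolve to 0xfffff801be96f000, so the same object sits at the root of both trees. The single entry in vq_active_tree resolves to 0xfffff80b496d7000.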