nfs + zfs hangs on RELENG_9
Nikolay Denev
ndenev at gmail.com
Wed Oct 3 08:43:30 UTC 2012
On Oct 3, 2012, at 11:12 AM, Andriy Gapon <avg at freebsd.org> wrote:
> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
>
> In my past experience, threads stuck in zio_wait have always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> Not necessarily the case here, but a possibility.
>
> Perhaps try camcontrol tags <disk devname> -v to see the state of disk queues.
>
I'm using the mfi(4) driver, which does not seem to be under CAM. I'm also running it with
the loader tunable hw.mfi.max_cmds=254, which is an increase over the standard 128 tags,
and maybe this could be the problem.
I'll revert it now and retest.
> P.S.
> It would be nice if, for debugging purposes, we had some place in zio to record
> the bio that it depends upon.
> E.g. something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
>  #ifdef _KERNEL
>  	/* FreeBSD only. */
>  	struct ostask	io_task;
> +	void		*io_bio;
>  #endif
>  };
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
>  			vd->vdev_delayed_close = B_TRUE;
>  		}
>  	}
> +	zio->io_bio = NULL;
>  	g_destroy_bio(bp);
>  	zio_interrupt(zio);
>  }
> @@ -732,6 +733,7 @@ sendreq:
>  	}
>  	bp = g_alloc_bio();
>  	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
>  	switch (zio->io_type) {
>  	case ZIO_TYPE_READ:
>  	case ZIO_TYPE_WRITE:
>
> Then, in a situation like yours, you could use kgdb, switch to the thread in
> zio_wait, go to the zio_wait frame and get the bio pointer from the zio. From
> there you could try to deduce what is going on with the I/O request.
>
I'm rebuilding now with these patches and DDB/KDB enabled and will try to get this information if it happens again.
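
For illustration, the bookkeeping that the quoted patch adds can be modeled in
user space. This is only a rough sketch under stated assumptions: model_zio,
model_bio, model_submit, model_complete and model_is_stuck are hypothetical
stand-ins invented here, not the kernel's struct zio, struct bio or any
vdev_geom.c function. Only the io_bio and bio_caller1 field names come from
the patch. The idea is that a non-NULL io_bio marks a request that was handed
to the lower layer and has not completed yet.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bio. */
struct model_bio {
	void *bio_caller1;	/* back-pointer to the owning zio */
};

/* Hypothetical stand-in for the kernel's struct zio. */
struct model_zio {
	int   io_type;
	void *io_bio;		/* debugging aid: bio in flight, or NULL */
};

/* Models the "sendreq" path: allocate a bio and record it in the zio. */
static struct model_bio *
model_submit(struct model_zio *zio)
{
	struct model_bio *bp;

	/* Simplification of this model: at most one bio in flight per zio. */
	assert(zio->io_bio == NULL);
	bp = calloc(1, sizeof(*bp));
	if (bp == NULL)
		return (NULL);
	bp->bio_caller1 = zio;
	zio->io_bio = bp;
	return (bp);
}

/* Models the completion path: clear the record before freeing the bio. */
static void
model_complete(struct model_bio *bp)
{
	struct model_zio *zio = bp->bio_caller1;

	zio->io_bio = NULL;
	free(bp);
}

/* A zio that still records a bio is waiting on the layer below. */
static int
model_is_stuck(const struct model_zio *zio)
{
	return (zio->io_bio != NULL);
}
```

In a debugger, the equivalent of model_is_stuck() would be inspecting io_bio
on the zio found in the zio_wait frame: a non-NULL value points at the request
still outstanding below ZFS.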
> --
> Andriy Gapon
Thanks!
Regards,
Nikolay