nfs + zfs hangs on RELENG_9

Nikolay Denev ndenev at gmail.com
Wed Oct 3 08:43:30 UTC 2012


On Oct 3, 2012, at 11:12 AM, Andriy Gapon <avg at freebsd.org> wrote:

> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
> 
> From my past experience, threads stuck in zio_wait have always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> That is not necessarily the case here, but it is a possibility.
> 
> Perhaps try camcontrol tags <disk devname> -v to see the state of disk queues.
> 

I'm using the mfi(4) driver, which does not seem to be under CAM. I'm also running it with
the loader tunable hw.mfi.max_cmds=254, which is an increase over the default 128 tags,
and that could be the problem.
I'll revert it now and retest.

> P.S.
> It would be nice if, for debugging purposes, we had some place in zio to record
> the bio that it depends upon.
> E.g. something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
> #ifdef _KERNEL
> 	/* FreeBSD only. */
> 	struct ostask	io_task;
> +	void		*io_bio;
> #endif
> };
> 
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
> 			vd->vdev_delayed_close = B_TRUE;
> 		}
> 	}
> +	zio->io_bio = NULL;
> 	g_destroy_bio(bp);
> 	zio_interrupt(zio);
> }
> @@ -732,6 +733,7 @@ sendreq:
> 	}
> 	bp = g_alloc_bio();
> 	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
> 	switch (zio->io_type) {
> 	case ZIO_TYPE_READ:
> 	case ZIO_TYPE_WRITE:
> 
> Then, in a situation like yours, you could use kgdb, switch to the thread in
> zio_wait, go to the zio_wait frame and get the bio pointer from the zio.  From
> there you could try to deduce what is going on with the I/O request.
> 

I'm rebuilding now with these patches and DDB/KDB enabled and will try to get this information if it happens again.
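
To make sure I understand what the patch tracks, here is a small userland model of the bookkeeping. It is only a sketch: struct mock_zio, struct mock_bio, mock_io_start() and mock_io_intr() are made-up stand-ins for the kernel structures and for the two hunks in vdev_geom.c, and calloc()/free() stand in for g_alloc_bio()/g_destroy_bio(). Nothing here is actual ZFS or GEOM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bio. */
struct mock_bio {
	void	*bio_caller1;	/* back pointer to the owning zio */
};

/* Hypothetical stand-in for struct zio with the proposed io_bio field. */
struct mock_zio {
	void	*io_bio;	/* outstanding bio, NULL when none is in flight */
};

/*
 * Models the "sendreq" hunk: allocate a bio and link it to the zio
 * in both directions before the request would be handed to GEOM.
 */
static struct mock_bio *
mock_io_start(struct mock_zio *zio)
{
	struct mock_bio *bp;

	bp = calloc(1, sizeof(*bp));
	if (bp == NULL)
		return (NULL);
	bp->bio_caller1 = zio;
	zio->io_bio = bp;
	return (bp);
}

/*
 * Models the vdev_geom_io_intr() hunk: clear the zio's pointer
 * before the bio is destroyed, so it never dangles.
 */
static void
mock_io_intr(struct mock_bio *bp)
{
	struct mock_zio *zio = bp->bio_caller1;

	assert(zio->io_bio == bp);
	zio->io_bio = NULL;
	free(bp);
}
```

With this in place, a zio whose io_bio is still non-NULL while its thread sits in zio_wait has a request that was issued but never completed, and the pointer says which bio to inspect.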

> -- 
> Andriy Gapon

Thanks!

Regards,
Nikolay

