ZFS stalled after some mirror disks were lost

Ben RUBSON ben.rubson at gmail.com
Fri Oct 6 10:09:00 UTC 2017


> On 02 Oct 2017, at 20:12, Ben RUBSON <ben.rubson at gmail.com> wrote:
> 
> Hi,
> 
> On a FreeBSD 11 server, the following online/healthy zpool:
> 
> home
> mirror-0
>   label/local1
>   label/local2
>   label/iscsi1
>   label/iscsi2
> mirror-1
>   label/local3
>   label/local4
>   label/iscsi3
>   label/iscsi4
> cache
> label/local5
> label/local6
> 
> A sustained read throughput of 180 MB/s, 45 MB/s on each iSCSI disk
> according to "zpool iostat", nothing on the local disks.
> No write IOs.
> 
> Let's disconnect all iSCSI disks:
> iscsictl -Ra
> 
> Expected behavior:
> IO activity continues flawlessly on the local disks.
> 
> What happened:
> All IOs stalled; the server only answers IOs made to its zroot pool.
> Commands related to the iSCSI disks (iscsictl) or to ZFS (zfs/zpool)
> do not return.
> 
> Questions:
> Why this behavior?
> How can I find out what is happening? (/var/log/messages says almost nothing)
> 
> I have already disconnected the iSCSI disks several times in the past
> without any issue, but there were almost no IOs running at the time.
> 
> Thank you for your help!
> 
> Ben

Hello,

So first, many thanks again to Andriy; we spent almost 3 hours debugging the
stalled server to find the root cause of the issue.

Sounds like I will need help from the iSCSI dev team (Edward perhaps?), as the
issue seems to be on that side.

Here is Andriy's conclusion after the debug session; I quote him:

> So, it seems that the root cause of all evil is this outstanding zio (it might
> not be the only one).
> In other words, it looks like the iscsi stack bailed out without completing all
> outstanding i/o requests that it had.
> It should either return success or an error for every request; it cannot simply
> drop a request.
> And that appears to be what happened here.
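
For context, the contract Andriy refers to is the usual GEOM one: every struct
bio a provider receives through its start() method must eventually be answered
with g_io_deliver(bp, error), error being 0 on success or an errno such as
ENXIO when the device is gone. A minimal sketch of that pattern follows; the
example_* names and the softc layout are made up for illustration, only the
GEOM calls themselves are real:

/* Illustrative sketch only -- not the actual iscsi/CAM/vdev_geom code. */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <geom/geom.h>

struct example_softc {
	struct bio_queue_head	sc_queue;	/* accepted but not yet completed */
	int			sc_gone;	/* backing device has disappeared */
};

/* start() method: every bio received here must later reach g_io_deliver(). */
static void
example_start(struct bio *bp)
{
	struct example_softc *sc = bp->bio_to->private;

	if (sc->sc_gone) {
		/* Simply returning would "lose" the request; fail it instead. */
		g_io_deliver(bp, ENXIO);
		return;
	}
	bioq_insert_tail(&sc->sc_queue, bp);
	/* ... the I/O path later calls g_io_deliver(bp, 0) or (bp, error) ... */
}

/* Teardown (e.g. after "iscsictl -Ra"): complete everything still queued. */
static void
example_teardown(struct example_softc *sc)
{
	struct bio *bp;

	sc->sc_gone = 1;
	while ((bp = bioq_takefirst(&sc->sc_queue)) != NULL)
		g_io_deliver(bp, ENXIO);
}

A request that is neither completed nor failed is exactly the "dropped" zio
described above: ZFS keeps waiting for it forever.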

> It looks like ZFS is fragile in the face of this type of error.
> Essentially, each logical i/o request obtains a configuration lock of type 'zio'
> in shared mode to prevent certain configuration changes from happening while
> there are any outstanding zio-s.
> If a zio is lost, then this lock is leaked.
> Then, the code that deals with vdev failures tries to take this lock in
> exclusive mode while holding a few other configuration locks also in exclusive
> mode, so any other thread needing those locks would block.
> And there are code paths where a configuration lock is taken while
> spa_namespace_lock is held.
> And when spa_namespace_lock is never dropped, the system is close to toast,
> because all pool lookups would get stuck.
> I don't see how this can be fixed in ZFS.
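
To make that locking argument concrete, here is a simplified sketch of the
pattern as I understand it, written against primitives that do exist in the
ZFS sources (spa_config_enter/spa_config_exit, SCL_ZIO/SCL_ALL,
spa_namespace_lock); the example_* functions are stand-ins, not the real
code paths:

/* Simplified illustration of the deadlock, not the real ZFS functions. */

/* Issue path: each logical zio takes the SCL_ZIO config lock as a READER. */
static void
example_zio_issue(spa_t *spa, zio_t *zio)
{
	spa_config_enter(spa, SCL_ZIO, zio, RW_READER);
	/* ... hand the I/O down to the vdev (geom/iscsi) layer ... */
}

/* Completion path: the reader hold is only dropped when the zio completes. */
static void
example_zio_done(spa_t *spa, zio_t *zio)
{
	spa_config_exit(spa, SCL_ZIO, zio);
}

/*
 * Failure handling: reacting to a vdev going away takes the config locks as
 * a WRITER, in some paths while spa_namespace_lock is held.  If one zio was
 * dropped below and its reader hold never released, the WRITER acquisition
 * blocks forever, spa_namespace_lock stays held, and every pool lookup
 * (zfs/zpool commands included) gets stuck behind it.
 */
static void
example_vdev_fault(spa_t *spa)
{
	mutex_enter(&spa_namespace_lock);
	spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);	/* never returns */
	/* ... mark the vdev faulted, update the pool config ... */
	spa_config_exit(spa, SCL_ALL, FTAG);
	mutex_exit(&spa_namespace_lock);
}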

> It seems that when the initiator is being removed it doesn't properly terminate
> in-flight requests.
> It would be interesting to see what happens if you test other scenarios.

So I tested the following other scenarios:
1 - drop all iSCSI traffic using ipfw on the target
2 - ifdown the iSCSI NIC on the target
3 - ifdown the iSCSI NIC on the initiator
4 - stop ctld (on the target of course)
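
In case it helps to reproduce, the commands behind these scenarios were along
these lines (the interface name and the exact ipfw rule are illustrative, 3260
being the default iSCSI port; adjust to your setup):

ipfw add 100 deny tcp from any to any 3260    (1, on the target)
ifconfig ix0 down                             (2, target-side iSCSI NIC)
ifconfig ix0 down                             (3, initiator-side iSCSI NIC)
service ctld stop                             (4, on the target)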

I tested all of them several times, perhaps 5 or 6 times each.

I managed to trigger a kernel panic (!) 2 times:
the first time in case 2, the second time in case 4.
I am not sure the other test cases could not have triggered a panic as well, though.

Stack traces:
https://s1.postimg.org/2hfdpsvban/panic_case2.png
https://s1.postimg.org/2ac5ud9t0f/panic_case4.png

(kgdb) list *g_io_request+0x4a7
0xffffffff80a14dc7 is in g_io_request (/usr/src/sys/geom/geom_io.c:638).
633			g_bioq_unlock(&g_bio_run_down);
634			/* Pass it on down. */
635			if (first)
636				wakeup(&g_wait_down);
637		}
638	}
639	
640	void
641	g_io_deliver(struct bio *bp, int error)
642	{

I had some kernel panics on the same servers a few months ago,
losing iSCSI targets which were used in a gmirror together with local disks.
gmirror should have continued to work flawlessly on the local disks
(just as ZFS should have), but the server crashed.

Stack traces:
https://s1.postimg.org/14v4sabhv3/panic_g_destroy1.png
https://s1.postimg.org/437evsk6rz/panic_g_destroy2.png
https://s1.postimg.org/8pt1whiy5b/panic_g_destroy3.png

(kgdb) list *g_destroy_consumer+0x53
0xffffffff80a18563 is in g_destroy_consumer (geom.h:369).
364			KASSERT(g_valid_obj(ptr) == 0,
365			    ("g_free(%p) of live object, type %d", ptr,
366			    g_valid_obj(ptr)));
367		}
368	#endif
369		free(ptr, M_GEOM);
370	}
371	
372	#define g_topology_lock() 					\
373		do {							\

> I think that all problems that you have seen are different sides of the same
> underlying issue.  It looks like iscsi does not properly depart from geom and
> leaves behind some dangling pointers...
> 
> The panics you got today most likely occurred here:
> 	bp->bio_to->geom->start(bp);
> 
> And the most likely reason is that bio_to points to a destroyed geom provider.
> 
> I wonder if you'd be able to get into direct contact with a developer
> responsible for iscsi in FreeBSD.  I think that it is a relatively recent
> addition and it was under a FreeBSD Foundation project.  So, I'd expect that the
> developer should be responsive.
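
To illustrate what a dangling bio_to means here, the structures involved look
roughly like this (heavily condensed from sys/bio.h and sys/geom/geom.h; fields
not relevant here are omitted):

/* Heavily condensed, illustrative only -- see sys/bio.h and sys/geom/geom.h. */
struct bio;

typedef void g_start_t(struct bio *);

struct g_geom {
	g_start_t		*start;		/* provider's request entry point */
	/* ... */
};

struct g_provider {
	struct g_geom		*geom;		/* geom that owns this provider */
	/* ... */
};

struct bio {
	struct g_provider	*bio_to;	/* provider this request is addressed to */
	/* ... */
};

/*
 * On the way down (g_io_request() with direct dispatch, or the g_down
 * thread), GEOM hands the request over with:
 *
 *	bp->bio_to->geom->start(bp);
 *
 * If the iscsi side tears down the provider/geom while such bios are still
 * in flight, bio_to becomes a dangling pointer and this indirect call runs
 * through freed memory -- consistent with the faults in the panic traces
 * above.
 */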

Feel free to contact me if you need anything, so that we can take this further!

Thank you very much for your help,

Ben


