[Bug 211990] iscsi fails to reconnect and does not release devices

bugzilla-noreply at freebsd.org
Fri Aug 19 22:19:35 UTC 2016


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #4 from Ben RUBSON <ben.rubson at gmail.com> ---
One strange thing I noticed.
(I have included everything from my troubleshooting that could be of interest.)

As soon as I take the network interface down, I get the following message on
the target side, one per target:
17:01:00 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no ping reply (NOP-Out) after 5 seconds; dropping connection

Then, on the initiator side, I get these messages for each target:
Aug 19 17:01:07 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): freezing
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): deregistering SIM

At this point, one iscsid process per target appears on the initiator side.

About 10 seconds later, on the initiator side, I get these messages for each target:
Aug 19 17:01:18 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:hm4): login timed out after 11 seconds; reconnecting
Aug 19 17:01:18 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices

At the same time, a second iscsid process per target appears, so I end up with
2 iscsid processes per target:
# ps auxxw | grep iscsid:
root  866    0.0  0.0 16632  2144  -  I     4:58pm   0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
root  881    0.0  0.0 16632  2144  -  I     4:58pm   0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
(...)
However, there seems to be a limit of 30 processes: with 17 targets I would
have expected 34 processes, but I only get 30.
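
To count the processes per target rather than reading the ps output by eye, a
small helper can tally them. This is only a sketch. It assumes that the process
title format is exactly the one shown above ("iscsid: <address> (<target>)") and
that the output of "ps auxxw" is passed in one line at a time. The function
names are hypothetical and not part of FreeBSD.

```python
import re
from collections import Counter

# Assumed process title format, taken from the ps output above:
#   "iscsid: <target address> (<target name>)"
TITLE_RE = re.compile(r"iscsid: (\S+) \(([^)]+)\)")


def count_iscsid_per_target(ps_lines):
    """Tally iscsid processes per (address, target name) pair."""
    counts = Counter()
    for line in ps_lines:
        match = TITLE_RE.search(line)
        if match:  # the "grep iscsid:" line itself has no title and is skipped
            counts[(match.group(1), match.group(2))] += 1
    return counts


def duplicated_targets(counts):
    """Keep only targets that have more than one iscsid process."""
    return {target: n for target, n in counts.items() if n > 1}
```

You could feed it, for example, the lines from
subprocess.run(["ps", "auxxw"], capture_output=True, text=True).stdout.splitlines()
during the outage. Any target returned by duplicated_targets() would then be one
that has a second, overlapping reconnection attempt.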

If I bring the NIC back up before the second process is created, I get only
one reconnection message per target in the target logs.
If I bring it up after the second process is created, I get many more
reconnection messages in the target logs: between 40 and 50 for 17 targets.

Are these additional processes expected?
I would expect only one process and one reconnection message per target.
It seems strange to have all these "duplicated" connection retries.

Another question, related to the 30 processes observed:
is there a limit of 30 targets?
I found a maxproc option in ctl.conf (defaulting to 30), but I don't know
exactly what it means (I tested values from 1 to 50 without seeing any change).
However, I found no such option on the initiator side.

I noticed that the bug is easier to reproduce when we "stress" the devices:
disconnect the network as soon as the targets have reconnected, and reconnect
it as soon as they have disconnected.



In addition, I had 8 kernel crashes, on either the initiator or the target,
each time with the same fault address and instruction pointer:
kernel: Fatal trap 12: page fault while in kernel mode
kernel: fault virtual address   = 0x1e8
kernel: instruction pointer     = 0x20:0xffffffff80936933

I also got a stack trace, but did not capture its pointer address.
http://img4.hostingpics.net/pics/707217211990.png

I'm also trying to get a full crash dump.
However, I'm not sure this kernel crash is related to the reconnection
issue; perhaps there are two separate issues.

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC



A lot of info!
I hope we will be able to fix these issues.

Many thanks,

Ben



More information about the freebsd-bugs mailing list