[Bug 211990] iscsi fails to reconnect and does not release devices
bugzilla-noreply at freebsd.org
Fri Aug 19 22:19:35 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990
--- Comment #4 from Ben RUBSON <ben.rubson at gmail.com> ---
I noticed one strange thing.
(I have included everything from my troubleshooting that might be of interest.)
As soon as I bring the network interface down, I get the following message on
the target side, one per target:
17:01:00 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no ping reply (NOP-Out) after 5 seconds; dropping connection
Then, on the initiator side, I get these messages for each target:
Aug 19 17:01:07 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): freezing
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): deregistering SIM
At this point, one iscsid process per target appears on the initiator side.
About 10 seconds later, on the initiator side, I get these messages for each target:
Aug 19 17:01:18 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:hm4): login timed out after 11 seconds; reconnecting
Aug 19 17:01:18 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices
At the same time, a second iscsid process per target appears, so that I end up
with 2 iscsid processes per target:
# ps auxxw | grep iscsid:
root 866 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
root 881 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
(...)
However, there seems to be a limit of 30 processes: with 17 targets I would
have expected 34 processes, but I only get 30.
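To see which targets got a second iscsid process and which did not (17 targets x 2 = 34, above the observed 30), the process titles can be counted per target. This is a minimal sketch, assuming the child processes keep the `iscsid: <portal> (<target>)` title shown in the ps output above and that the parent daemon does not; the function name `count_iscsid_per_target` is hypothetical.

```shell
#!/bin/sh
# Count iscsid child processes per target by parsing process titles such as:
#   iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
# Assumption: only the per-session children carry an "iscsid:" first word,
# so the parent daemon is not counted.
# Reads process titles on stdin, prints "<count> <portal> (<target>)" lines,
# highest count first.
count_iscsid_per_target() {
    awk '$1 == "iscsid:" { n[$2 " " $3]++ }
         END { for (t in n) printf "%d %s\n", n[t], t }' | sort -rn
}
```

On FreeBSD it could be fed with `ps -axww -o command= | count_iscsid_per_target`. If some targets show a count of 2 while others show only 1, that would be consistent with a cap of 30 child processes rather than with a per-target limit.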
If I bring the NIC back up before the second process is created, I see only one
reconnection message per target in the target logs.
If I bring the NIC back up after the second process is created, I see many more
reconnection messages in the target logs: between 40 and 50 for 17 targets.
Are these additional processes expected?
I would expect only one process and one reconnection message per target.
It seems strange to have all these "duplicated" connection retries.
Another question related to the 30 processes found: is there a limit of 30
targets?
I found a maxproc option in ctl.conf (defaulting to 30), but I don't know
exactly what it means (I tested values from 1 to 50 without seeing any change).
However, I found no such option on the initiator side.
I noticed that the bug is easier to reproduce when we "stress" the devices:
disconnect the network as soon as the targets have reconnected, and reconnect it
as soon as they have disconnected.
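The stress procedure above could be scripted on the initiator roughly as follows. This is a hedged sketch, not the procedure actually used in this report: the interface name `ix0`, the variables `IFACE`, `TARGETS` and `ROUNDS`, the helper functions, and the assumption that `iscsictl -L` prints "Connected" in its State column only for established sessions are all hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction loop for the "stress" scenario.
# Assumptions (not from the original report):
#   - IFACE is the interface carrying the iSCSI traffic ("ix0" is a placeholder);
#   - a session is up when its `iscsictl -L` line contains "Connected".

# Print how many sessions iscsictl currently reports as connected.
connected_count() {
    iscsictl -L | grep -c 'Connected'
}

# Block until exactly $1 sessions are connected, polling once per second.
wait_for_connected() {
    while [ "$(connected_count)" -ne "$1" ]; do
        sleep 1
    done
}

# Toggle the link ROUNDS times: down as soon as all TARGETS sessions are up,
# back up as soon as none of them is.
stress_loop() {
    iface=${IFACE:-ix0}
    targets=${TARGETS:-17}
    rounds=${ROUNDS:-10}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        wait_for_connected "$targets"   # all targets have reconnected
        ifconfig "$iface" down
        wait_for_connected 0            # all sessions have dropped
        ifconfig "$iface" up
        i=$((i + 1))
    done
}
```

It would be run as root, for example by sourcing the file and calling `stress_loop` with `IFACE`, `TARGETS` and `ROUNDS` set. If some sessions never come back, the loop waits forever, which would itself point at the reconnection problem.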
In addition, I had 8 kernel crashes, on either the initiator or the target,
each time with the same address/pointer:
kernel: Fatal trap 12: page fault while in kernel mode
kernel: fault virtual address = 0x1e8
kernel: instruction pointer = 0x20:0xffffffff80936933
I also got a stack trace, but did not capture its pointer address.
http://img4.hostingpics.net/pics/707217211990.png
I'm also trying to get a full dump.
However, I'm not sure this kernel crash is related to the reconnection issue;
perhaps there are two separate issues.
# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
That's a lot of info!
I hope we will be able to fix these issues.
Many thanks,
Ben
More information about the freebsd-bugs mailing list