iSCSI issues after upgrading to 11.2 x64 RELEASE

Mon Sep 3 21:06:54 UTC 2018

Hi,

I'm not sure what's going on, but after upgrading from 11.1 to 11.2 I'm 
getting many iSCSI errors. They get so severe that the system freezes.

Dmesg is flooded with lines like this:

(da25:iscsi2:0:0:1): WRITE(10). CDB: 2a 00 28 76 02 50 00 00 08 00
(da25:iscsi2:0:0:1): CAM status: SCSI Status Error
(da25:iscsi2:0:0:1): SCSI status: Check Condition
(da25:iscsi2:0:0:1): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)
(da25:iscsi2:0:0:1): Retrying command (per sense data)
(da27:iscsi1:0:0:0): READ(16). CDB: 88 00 00 00 00 04 41 df 82 78 00 00 01 00 00 00
(da27:iscsi1:0:0:0): CAM status: SCSI Status Error
(da27:iscsi1:0:0:0): SCSI status: Check Condition
(da27:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)
(da27:iscsi1:0:0:0): Retrying command (per sense data)
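
To see which devices and sessions are affected most, the messages can be tallied per device. The following is only a rough sketch: the function name count_nexus_loss and the sample list are made up for illustration, and it assumes each message sits on a single line, as it does in real dmesg output.

```python
import re
from collections import Counter

# Matches the "(da25:iscsi2:0:0:1):" prefix seen in the dmesg lines above.
PREFIX = re.compile(r"^\((da\d+):(iscsi\d+):\d+:\d+:\d+\):")

def count_nexus_loss(lines):
    """Count 'asc:29,7' UNIT ATTENTION messages per (device, session)."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line)
        if m and "UNIT ATTENTION asc:29,7" in line:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Hypothetical sample built from the lines quoted above.
sample = [
    "(da25:iscsi2:0:0:1): WRITE(10). CDB: 2a 00 28 76 02 50 00 00 08 00",
    "(da25:iscsi2:0:0:1): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)",
    "(da27:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)",
]

for (dev, session), n in sorted(count_nexus_loss(sample).items()):
    print(dev, session, n)
```

On a live system the same function could be given sys.stdin (after an import sys) with the dmesg output piped in.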

The upgrade actually turned into a fresh install since my zroot got 
corrupted after one of my mirrored disks died and I replaced the 
remaining original with a new one. The system originally was on 11.1 so 
the 'clean' reinstall went to 11.2.

I'm also seeing quite high CPU interrupt usage in 'top' output: a steady 
25%, rising to as much as 75%, with system load averages climbing above 
40.

After a little digging around I came across this:
https://lists.freebsd.org/pipermail/freebsd-net/2017-June/048293.html

However, as described below, enabling the 'loader' variables causes a 
complete system lockup.

In 11.1 I had these variables set in /boot/loader.conf:

#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.bindthreads=1
#net.isr.dispatch=deferred

#hw.igb.max_interrupt_rate=64000
#net.inet.tcp.tcbhashsize=32000

#kern.ipc.nmbjumbo9=1280000
#kern.ipc.nmbjumbo16=1280000

#kern.ipc.nmbclusters=2000000
#kern.ipc.nmbjumbop=128000

After going to 11.2, the system freezes and becomes totally unusable a 
few hours after boot if I enable these values.
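
For a sense of scale, here is a rough sketch of the memory ceilings that the kern.ipc limits listed above would allow. The buffer sizes are an assumption (the usual FreeBSD amd64 values: 2 KiB clusters, page-size 4 KiB jumbo, 9 KiB and 16 KiB jumbo) and were not read from this system. As far as I understand, these settings are upper limits rather than preallocations, so the figures only show how much the pools could grow to.

```python
# Assumed buffer sizes in bytes (typical FreeBSD amd64 values; not verified
# on this machine).
BUF_SIZE = {
    "kern.ipc.nmbclusters": 2048,
    "kern.ipc.nmbjumbop": 4096,
    "kern.ipc.nmbjumbo9": 9 * 1024,
    "kern.ipc.nmbjumbo16": 16 * 1024,
}

# Limits taken from the loader.conf lines above.
LIMITS = {
    "kern.ipc.nmbclusters": 2000000,
    "kern.ipc.nmbjumbop": 128000,
    "kern.ipc.nmbjumbo9": 1280000,
    "kern.ipc.nmbjumbo16": 1280000,
}

def ceiling_gib(name):
    """Maximum pool size in GiB if the limit were fully used."""
    return LIMITS[name] * BUF_SIZE[name] / 2**30

for name in LIMITS:
    print(f"{name}: up to {ceiling_gib(name):.2f} GiB")

total = sum(ceiling_gib(name) for name in LIMITS)
print(f"total: up to {total:.2f} GiB")
```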

Another thing I noticed in the dmesg a few times was this error:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219866

"ctl_datamove: tag 0x174e97 on (28:34:0) aborted"

It has been reported, though I am not sure whether a fix has been pushed 
yet.

My setup is as follows:

routed network (LAN) -> lagg0 (lacp 4gb) -> server -> lagg1 (lacp 2gb) -> iscsi target 1
                                                                      |-> iscsi target 2

The NICs are Intel-based and use the igb kernel driver:

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

The iscsi network is running on a separate isolated vlan which is not 
routed.

Can anyone suggest anything to stop my system from completely locking up 
and becoming unresponsive?

At the moment I'm not sure whether switching to the 'Stable' or 
'Current' branch would be a good solution.

Regards,

Kaya