iSCSI issues after upgrading to 11.2 x64 RELEASE
Kaya Saman
kayasaman at gmail.com
Mon Sep 3 21:06:54 UTC 2018
Hi,
I'm not sure what's going on, but after upgrading from 11.1 to 11.2 I'm
getting many iSCSI errors. It gets so severe that it freezes the
system.
Dmesg is flooded with lines like this:
(da25:iscsi2:0:0:1): WRITE(10). CDB: 2a 00 28 76 02 50 00 00 08 00
(da25:iscsi2:0:0:1): CAM status: SCSI Status Error
(da25:iscsi2:0:0:1): SCSI status: Check Condition
(da25:iscsi2:0:0:1): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)
(da25:iscsi2:0:0:1): Retrying command (per sense data)
(da27:iscsi1:0:0:0): READ(16). CDB: 88 00 00 00 00 04 41 df 82 78 00 00 01 00 00 00
(da27:iscsi1:0:0:0): CAM status: SCSI Status Error
(da27:iscsi1:0:0:0): SCSI status: Check Condition
(da27:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)
(da27:iscsi1:0:0:0): Retrying command (per sense data)
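To see which devices and sessions are hit most often, the sense lines
above could be tallied with a short script. This is only a minimal
sketch: it assumes dmesg prints each message on a single line in the
format shown above, and the function name count_nexus_loss and the
sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches the CAM prefix, e.g. "(da25:iscsi2:0:0:1):", followed by the
# UNIT ATTENTION sense line with asc:29,7 (I_T nexus loss occurred).
SENSE_RE = re.compile(
    r"^\((da\d+):(iscsi\d+):[\d:]+\): SCSI sense: UNIT ATTENTION asc:29,7"
)


def count_nexus_loss(lines):
    """Count asc:29,7 sense lines per (device, iSCSI session) pair."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample input taken from the dmesg excerpt above (illustration only).
sample = [
    "(da25:iscsi2:0:0:1): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)",
    "(da25:iscsi2:0:0:1): Retrying command (per sense data)",
    "(da27:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)",
]

for (device, session), hits in count_nexus_loss(sample).most_common():
    print(device, session, hits)
```

For live data, the same function can be given sys.stdin instead of the
sample list, with the output of dmesg piped into the script.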
The upgrade actually turned into a fresh install since my zroot got
corrupted after one of my mirrored disks died and I replaced the
remaining original with a new one. The system originally was on 11.1 so
the 'clean' reinstall went to 11.2.
I'm also seeing quite high CPU interrupt usage in 'top' output: a steady
25%, though it can go up to 75%, with system load averages climbing
above 40.
After a little bit of digging around I came across this:
https://lists.freebsd.org/pipermail/freebsd-net/2017-June/048293.html
However, as described below, the 'loader' variables cause a complete
system lockup.
In 11.1 I had these variables set in /boot/loader.conf (they are
commented out now):
#net.isr.numthreads=4
#net.isr.maxthreads=4
#net.isr.bindthreads=1
#net.isr.dispatch=deferred
#hw.igb.max_interrupt_rate=64000
#net.inet.tcp.tcbhashsize=32000
#kern.ipc.nmbjumbo9=1280000
#kern.ipc.nmbjumbo16=1280000
#kern.ipc.nmbclusters=2000000
#kern.ipc.nmbjumbop=128000
After going to 11.2, the system freezes and becomes totally unusable a
few hours after boot if I enable these values.
Another thing I noticed in the dmesg a few times was this error:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219866
"ctl_datamove: tag 0x174e97 on (28:34:0) aborted"
It has been reported, though I am not sure whether a fix has been pushed yet.
My setup is as follows:
routed network (LAN) -> lagg0 (lacp 4gb) -> server -> lagg1 (lacp 2gb)
     -> iscsi target 1
    |-> iscsi target 2
The NICs are Intel-based, using the igb kernel driver:
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
The iSCSI network runs on a separate, isolated VLAN which is not
routed.
Can anyone suggest anything to stop my system from completely locking up
and becoming unresponsive?
At the moment I'm not sure whether switching to the 'Stable' or
'Current' branch would be a good solution.
Regards,
Kaya
More information about the freebsd-net mailing list