Date: Sun, 16 May 2021 18:30:53 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255930

            Bug ID: 255930
           Summary: ocs_fc Lost all connected devices after some use.
           Product: Base System
           Version: Unspecified
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: arne@Steinkamm.COM

Created attachment 225001
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225001&action=edit
Message file with all described problems. See Bug reports for time stamps

I connected an HP ProLiant 380 Gen9 server with Emulex FC HBAs to two
simple FC setups and attached a NetApp FlashFiler EF550 unit.
To get the most out of ZFS I assigned all 24 flash modules to the ProLiant
without using the EF550 RAID features.
I use geom_multipath to handle the redundant connections to the flash filer
and made a ZFS pool with 3 x 7-disk raidz-1, one spare, one log and one
cache disk.

The read/write speed is good (2.5 GB/s according to zpool iostat), but after
minutes of heavy use I got

  kernel: ocs_fc0: ocs_initiator_io: device LOST 0

messages and all FC-connected disks are gone.
I found no way to recover from this error situation other than reboot,
panic (ZFS is not happy about the situation) or hardware reset.

Further observations:

- Reported topologies and link speeds are correct.
- EF550 replaced with an identical spare unit: no change.
- Changed FC ports: no effect.
- Used different Emulex cards (alone, mixed): no effect, the problem happens
  with any combination of installed Emulex cards.
- Tried QLogic cards (driver: isp(4)): no problems, works 100% stable but
  with slightly slower I/O performance.
- Tried 12.1-RELEASE, 12.2-RELEASE and 13.0-RELEASE, the last one with the
  GENERIC kernel without any changes. Every time all FC devices were lost.
- Boot with disabled switch FC ports: after portenable of the Brocades'
  ports the FC links went up, but there was no automatic attachment of the
  disks.
  A camcontrol rescan all was not successful; thousands of
  "device not ready" messages flooded the console. The only way to get the
  flash modules online is to boot the server with a working FC setup.
- Bumping the Emulex cards to the newest available firmware had no visible
  effect.
- Playing with the HBA-related BIOS settings "HP Shared Memory Feature",
  "Brocade FA-PWWN" and "PLOGI Retry Timer" had no visible effect.

More details of the last try with 13.0-RELEASE GENERIC:

uname -a:

FreeBSD vwcnctd00fs003.dev.kpdm01.group.vwg 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr 9 04:24:09 UTC 2021 firstname.lastname@example.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

pciconf -lv:

ocs_fc0@pci0:8:0:0:     class=0x0c0400 rev=0x01 hdr=0x00 vendor=0x10df device=0xe300 subvendor=0x1590 subdevice=0x0214
    vendor     = 'Emulex Corporation'
    device     = 'LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter'
    class      = serial bus
    subclass   = Fibre Channel
ocs_fc1@pci0:8:0:1:     class=0x0c0400 rev=0x01 hdr=0x00 vendor=0x10df device=0xe300 subvendor=0x1590 subdevice=0x0214
    vendor     = 'Emulex Corporation'
    device     = 'LPe31000/LPe32000 Series 16Gb/32Gb Fibre Channel Adapter'
    class      = serial bus
    subclass   = Fibre Channel
ocs_fc2@pci0:129:0:0:   class=0x0c0400 rev=0x30 hdr=0x00 vendor=0x10df device=0xe200 subvendor=0x103c subdevice=0x197f
    vendor     = 'Emulex Corporation'
    device     = 'LPe15000/LPe16000 Series 8Gb/16Gb Fibre Channel Adapter'
    class      = serial bus
    subclass   = Fibre Channel
ocs_fc3@pci0:129:0:1:   class=0x0c0400 rev=0x30 hdr=0x00 vendor=0x10df device=0xe200 subvendor=0x103c subdevice=0x197f
    vendor     = 'Emulex Corporation'
    device     = 'LPe15000/LPe16000 Series 8Gb/16Gb Fibre Channel Adapter'
    class      = serial bus
    subclass   = Fibre Channel

HP device names:

HPE SN1200E 16Gb 2p FC HBA
    Product Part Number: Q0L14-63001
    Assembly Number: 870002-001
HP SN1100E 16Gb 2P FC HBA
    Product Part Number: C8R39-60001
    Assembly Number: 719212-001

The EF550 has two independent controllers, both connected to
all flash module bays. Each controller has two FC ports. These ports are
connected to two independent Brocade FC switches (no interlink fibre).
One port of each Emulex card is connected to one of the FC switches. The
other port of each Emulex card is not in use (connected to an enterprise
fabric network independent from my laboratory setup, but the ports are
disabled on the switch side).
Using only one of the Emulex cards does not change the effect. I tried all
possible permutations.

To get valid data for this bug report I installed 13.0-RELEASE with a
minimal setup:

/boot/device.hints:

hint.ocs_fc.0.initiator="1"
hint.ocs_fc.2.initiator="1"
hint.ocs_fc.0.topology="1"
hint.ocs_fc.2.topology="1"
hint.ocs_fc.0.speed="16000"
hint.ocs_fc.2.speed="16000"

/etc/sysctl.conf:

dev.ocs_fc.1.port_state=offline
dev.ocs_fc.3.port_state=offline

In the attached messages file you will find this:

May 15 19:21:43 - 19:29:22  First boot and configuring network connectivity
                            on the shell.
May 15 19:44:24  Enabling FC ports on both Brocades
May 15 19:47:21  camcontrol rescan all
                 (all rescans successful according to camcontrol)
May 15 19:59:15  reboot --- now with enabled FC links.
                 It will find the flash modules
May 15 20:06:36  kldload geom_multipath.ko
                 geom_multipath finds four preconfigured links to each
                 flash module. This is correct.
                 Now I did a zpool import zone and started a couple of
                 test tools

Output of zpool iostat zone 1:

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zone        14.7T   486G  40.1K      0  2.61G      0
zone        14.7T   486G  39.0K    436  2.58G  1.94M
zone        14.7T   486G  41.6K      0  2.60G      0
zone        14.7T   486G  39.4K      0  2.60G      0
zone        14.7T   486G  39.4K      0  2.62G      0
zone        14.7T   486G  40.7K      0  2.57G      0
zone        14.7T   486G  39.9K    420  2.54G  1.94M
zone        14.7T   486G  39.5K      0  2.58G      0
zone        14.7T   486G  39.6K      0  2.64G      0
zone        14.7T   486G  39.3K      0  2.57G      0
zone        14.7T   486G  39.4K      0  2.62G      0
...
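
As a cross-check of the throughput figure, the read bandwidth column of the
zpool iostat output above can be averaged with a short script. This is only
a sketch: the helper names parse_size and mean_read_bandwidth are made up
for illustration, and it assumes the seven-column row layout shown above and
that the K/M/G/T suffixes are 1024-based.

```python
# Sketch: average the read-bandwidth column of `zpool iostat` output.
# Assumptions (not from the bug report): seven whitespace-separated columns
# per data row, and binary (1024-based) K/M/G/T suffixes.

SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a value such as '2.61G' or '0' to a float number of bytes."""
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def mean_read_bandwidth(lines, pool):
    """Return the mean read bandwidth in bytes/s over all rows of `pool`."""
    values = []
    for line in lines:
        fields = line.split()
        # Data rows: pool, alloc, free, read ops, write ops, read bw, write bw
        if len(fields) == 7 and fields[0] == pool:
            values.append(parse_size(fields[5]))
    if not values:
        raise ValueError("no rows found for pool %r" % pool)
    return sum(values) / len(values)


if __name__ == "__main__":
    # First three data rows copied from the output above.
    sample = [
        "zone 14.7T 486G 40.1K 0 2.61G 0",
        "zone 14.7T 486G 39.0K 436 2.58G 1.94M",
        "zone 14.7T 486G 41.6K 0 2.60G 0",
    ]
    print("%.2f GiB/s" % (mean_read_bandwidth(sample, "zone") / 1024**3))
```

Header and separator rows are skipped because their first field is not the
pool name, so the complete output can be passed in unchanged.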
May 15 20:15:15  The problem starts
May 15 20:16:18  Attempt of a camcontrol rescan with no success

My short-term solution is to use QLogic cards with the isp driver, which
works 100% stable without any changes.

-- 
You are receiving this mail because:
You are the assignee for the bug.