From bugmaster at FreeBSD.org Mon Aug 4 11:07:02 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 4 11:08:47 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200808041107.m74B716V082192@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi

15 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [dpt] [request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/119668  scsi       [cam] [patch] certain errors are too verbose comparing
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/123666  scsi       [aac] attach fails with Adaptec SAS RAID 3805 controll
o kern/123674  scsi       [ahc] ahc driver dumping

10 problems total.

From ragtop63 at hotmail.com Wed Aug 6 22:18:23 2008 From: ragtop63 at hotmail.com (Rag `) Date: Wed Aug 6 22:18:29 2008 Subject: A tiny amr(4) monitoring tool Message-ID: Hello, I saw this posted on the FreeBSD lists. I was wondering if it's still available somewhere as the link in the post does not work. Also, can it be compiled for FreeBSD 6.x & 7.x? Thanks, -David From jkim at FreeBSD.org Wed Aug 6 22:33:35 2008 From: jkim at FreeBSD.org (Jung-uk Kim) Date: Wed Aug 6 22:33:41 2008 Subject: A tiny amr(4) monitoring tool In-Reply-To: References: Message-ID: <200808061833.05523.jkim@FreeBSD.org> On Wednesday 06 August 2008 06:06 pm, Rag ` wrote: > Hello, > I saw this posted on the FreeBSD lists. I was wondering if it's > still available somewhere as the link in the post does not work. > Also, can it be compiled for FreeBSD 6.x & 7.x? It is in the ports tree, i.e., sysutils/amrstat. http://www.freshports.org/sysutils/amrstat Jung-uk Kim From jkim at FreeBSD.org Wed Aug 6 23:27:02 2008 From: jkim at FreeBSD.org (Jung-uk Kim) Date: Wed Aug 6 23:27:09 2008 Subject: [RFC] SCSI opcode and ASC update Message-ID: <200808061926.55111.jkim@FreeBSD.org> I found that we have very very old opcodes and ASC numbers in sys/cam/scsi/scsi_all.c.
ken@ touched a few ASCs five years ago, but they have stayed pretty much the same since the beginning of the file (almost ten years now). The latest op-num.txt and asc-num.txt are available from here: http://www.t10.org/lists/op-num.txt http://www.t10.org/lists/asc-num.txt I made a patch to merge the changes from these files: http://people.freebsd.org/~jkim/scsi_all.diff Now the problem is that there are almost 300 new ASCs, and I am not sure what to do with them. For now, they all do SS_RDEF and are marked with 'XXX TBD'. Is anyone interested in setting them correctly? Don't we want to separate them into header files, e.g., scsi_opcode.h and scsi_asc.h? Do we really care, or did I just waste my time? Thanks, Jung-uk Kim From scottl at samsco.org Thu Aug 7 00:21:42 2008 From: scottl at samsco.org (Scott Long) Date: Thu Aug 7 00:21:48 2008 Subject: [RFC] SCSI opcode and ASC update In-Reply-To: <200808061926.55111.jkim@FreeBSD.org> References: <200808061926.55111.jkim@FreeBSD.org> Message-ID: <489A4012.3030509@samsco.org> Jung-uk Kim wrote: > I found that we have very very old opcodes and ASC numbers in > sys/cam/scsi/scsi_all.c. ken@ touched few ASCs five years ago but > they are pretty much the same since the beginning of the file (almost > ten years now). The latest op-num.txt and asc-num.txt are available > from here: > > http://www.t10.org/lists/op-num.txt > http://www.t10.org/lists/asc-num.txt > > I made a patch to merge the changes from these files: > > http://people.freebsd.org/~jkim/scsi_all.diff > > Now the problem is there are almost 300 new ASCs and I am not sure > what to do with them. For now, they do SS_RDEF and all are marked > with 'XXX TBD' for now. Is there anyone interested in setting them > correctly? Don't we want to separate them into header files, e.g., > scsi_opcode.h and scsi_asc.h? Do we really care or did I just waste > my time? > Wow, nice work. I think it's fine to commit as-is with the XXX markers for the new SS_RDEF entries.
Over time we can refine those as needed. Scott From ken at kdm.org Thu Aug 7 16:00:07 2008 From: ken at kdm.org (Kenneth D. Merry) Date: Thu Aug 7 16:00:17 2008 Subject: [RFC] SCSI opcode and ASC update In-Reply-To: <200808061926.55111.jkim@FreeBSD.org> References: <200808061926.55111.jkim@FreeBSD.org> Message-ID: <20080807152118.GA99233@nargothrond.kdm.org> On Wed, Aug 06, 2008 at 19:26:53 -0400, Jung-uk Kim wrote: > I found that we have very very old opcodes and ASC numbers in > sys/cam/scsi/scsi_all.c. ken@ touched few ASCs five years ago but > they are pretty much the same since the beginning of the file (almost > ten years now). The latest op-num.txt and asc-num.txt are available > from here: > > http://www.t10.org/lists/op-num.txt > http://www.t10.org/lists/asc-num.txt > > I made a patch to merge the changes from these files: > > http://people.freebsd.org/~jkim/scsi_all.diff Good work! Thanks for doing that! > Now the problem is there are almost 300 new ASCs and I am not sure > what to do with them. For now, they do SS_RDEF and all are marked > with 'XXX TBD' for now. Is there anyone interested in setting them > correctly? Don't we want to separate them into header files, e.g., > scsi_opcode.h and scsi_asc.h? Do we really care or did I just waste > my time? In general, SS_RDEF (i.e. retry) should work fine for most things. I think most of the ASC/ASCQ combinations that need special error recovery actions already have them. We can change some of the new ones to specific error recovery actions (e.g. SS_TUR) when we run into specific cases that show a need for it. (There is also SS_FATAL, which would mainly prevent additional retries for commands that won't work if retried.) I think it makes more sense to keep the opcodes and ASC/ASCQs in a .c file, since they are structures with values in them, and not just structure definitions. 
I think it's fine to keep them in scsi_all.c, but if you would like to separate them out into separate .c files, feel free to send a patch along for review. This patch looks good, though. Ken -- Kenneth Merry ken@kdm.org From sbruno at miralink.com Fri Aug 8 03:34:12 2008 From: sbruno at miralink.com (Sean Bruno) Date: Fri Aug 8 03:34:19 2008 Subject: [mpt] Panic with LSI SAS controller Message-ID: <489BBEB2.9010408@miralink.com> I have at my disposal several IBM x3250s with the new and quite fancy SAS drives. Unfortunately, I get this panic with RELENG_6 and CAM_NEW_TRAN_CODE defined. I _assume_ that RELENG_7 would have the same failure. pcib4: irq 17 at device 28.5 on pci0 pcib4: secondary bus 1 pcib4: subordinate bus 1 pcib4: I/O decode 0x3000-0x3fff pcib4: memory decode 0xe8400000-0xe84fffff pcib4: prefetched decode 0xfff00000-0xfffff pci1: on pcib4 pci1: physical bus=1 found-> vendor=0x1000, dev=0x0056, revid=0x02 bus=1, slot=0, func=0 class=01-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0147, statreg=0x0010, cachelnsz=8 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=10 powerspec 2 supports D0 D1 D2 D3 current D0 MSI supports 1 message, 64 bit MSI-X supports 1 message in map 0x14 map[10]: type 4, range 32, base 00003000, size 8, enabled pcib4: requested I/O range 0x3000-0x30ff: in range map[14]: type 1, range 64, base e8410000, size 14, enabled pcib4: requested memory range 0xe8410000-0xe8413fff: good map[1c]: type 1, range 64, base e8400000, size 16, enabled pcib4: requested memory range 0xe8400000-0xe840ffff: good pcib4: matched entry for 1.0.INTA pcib4: slot 0 INTA hardwired to IRQ 17 mpt0: port 0x3000-0x30ff mem 0xe8410000-0xe8413fff,0xe8400000-0xe840ffff irq 17 at device 0.0 on pci1 mpt0: Reserved 0x100 bytes for rid 0x10 type 4 at 0x3000 mpt0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xe8410000 mpt0: [GIANT-LOCKED] mpt0: MPI Version=1.5.16.0 panic: recursive lock acquire at ../../../dev/mpt/mpt_user.c:111 KDB: enter:
panic [thread pid 0 tid 0 ] Stopped at kdb_enter+0x2b: nop db> trace Tracing pid 0 tid 0 td 0xc0922be0 kdb_enter(c0899514) at kdb_enter+0x2b panic(c0888766,c088cd9d,6f,c08eb1a0,2,...) at panic+0xbb mpt_user_attach(c6592000) at mpt_user_attach+0x1f mpt_attach(c6592000) at mpt_attach+0x5a mpt_pci_attach(c6557680) at mpt_pci_attach+0x6ad device_attach(c6557680,c0c20a08,c6557680,c6546e80,c6546e80,...) at device_attach+0x58 device_probe_and_attach(c6557680) at device_probe_and_attach+0xe0 bus_generic_attach(c6546e80,6,c6496128,1,c04ab158,...) at bus_generic_attach+0x16 acpi_pci_attach(c6546e80) at acpi_pci_attach+0xec device_attach(c6546e80,c6556dc8,c6546e80,0,c655c280,...) at device_attach+0x58 device_probe_and_attach(c6546e80) at device_probe_and_attach+0xe0 bus_generic_attach(c655c280,c65628b0,c04acd04,c655c280,c088de41,...) at bus_generic_attach+0x16 acpi_pcib_attach(c655c280,c65628b0,1,c655c280,c6496128,...) at acpi_pcib_attach+0x1a8 acpi_pcib_pci_attach(c655c280) at acpi_pcib_pci_attach+0xac device_attach(c655c280,c0c20b38,c655c280,c655c500,c655c500,...) at device_attach+0x58 device_probe_and_attach(c655c280) at device_probe_and_attach+0xe0 bus_generic_attach(c655c500,6,c64922e8,1,c04ab158,...) at bus_generic_attach+0x16 acpi_pci_attach(c655c500) at acpi_pci_attach+0xec device_attach(c655c500,c648f568,c655c500,0,c6491100,...) at device_attach+0x58 device_probe_and_attach(c655c500) at device_probe_and_attach+0xe0 bus_generic_attach(c6491100,c654b234,c04acd04,c6491100,c0c20be4,...) at bus_generic_attach+0x16 acpi_pcib_attach(c6491100,c654b234,0,c08c8840,c08f25cc,...) at acpi_pcib_attach+0x1a8 acpi_pcib_acpi_attach(c6491100) at acpi_pcib_acpi_attach+0x25e device_attach(c6491100,c6491380,c6491100,c0c20c94,0,...) at device_attach+0x58 device_probe_and_attach(c6491100) at device_probe_and_attach+0xe0 bus_generic_attach(c64b4d80,40000,c07850b2,c0871321,c0c20cd8,...) 
at bus_generic_attach+0x16 acpi_probe_children(c64b4d80) at acpi_probe_children+0x99 acpi_attach(c64b4d80) at acpi_attach+0x598 device_attach(c64b4d80,0,c64b4d80,c64ba500,0,...) at device_attach+0x58 device_probe_and_attach(c64b4d80) at device_probe_and_attach+0xe0 bus_generic_attach(c64ba500,c64ba500,c64ba500,c0c20d40,c061c91c,...) at bus_generic_attach+0x16 nexus_attach(c64ba500) at nexus_attach+0x13 device_attach(c64ba500,c05fe6da,c64ba500,c090a430,c25000,...) at device_attach+0x58 device_probe_and_attach(c64ba500) at device_probe_and_attach+0xe0 root_bus_configure(c0c20d88,c05dce52,0,c1ec00,c1e000,...) at root_bus_configure+0x16 configure(0,c1ec00,c1e000,0,c0449ef5,...) at configure+0x9 mi_startup() at mi_startup+0x96 begin() at begin+0x2c -- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Cell 503-358-6832 Phone 503-621-5143 Fax 503-621-5199 MSN: sbruno@miralink.com Google: seanwbruno@gmail.com

From bugmaster at FreeBSD.org Mon Aug 11 11:07:04 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 11 11:08:45 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200808111107.m7BB73HO047319@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi

15 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [dpt] [request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/119668  scsi       [cam] [patch] certain errors are too verbose comparing
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/123666  scsi       [aac] attach fails with Adaptec SAS RAID 3805 controll
o kern/123674  scsi       [ahc] ahc driver dumping

10 problems total.
From Carole.Macheret at ch.meggitt.com Thu Aug 14 15:25:16 2008 From: Carole.Macheret at ch.meggitt.com (Carole Macheret) Date: Thu Aug 14 15:25:23 2008 Subject: g_vfs_done References: <4874F53A0200001300130DE3@gw.vibro-meter.com> <48A465B10200001300132295@gw.vibro-meter.com> Message-ID: <48A46586.1F16.0013.0@ch.meggitt.com> Hello, We are running FreeBSD 7.0-RELEASE #1 with Squid and Zabbix on VMware ESX 3.0.2, and our VMware ESX servers access our SAN through an IpStor cluster (storage virtualization and mirroring). We have two storage arrays (EVA 6100), and the IpStor solution allows us to mirror disks across both EVAs. We have a problem with both the Zabbix and Squid FreeBSD virtual machines: when a virtual machine loses its disks (EVA controller reboot or IpStor cluster failover), we get several "g_vfs_done() : da1s1d[WRITE(offset=2312431234, length=12453)] error= 5" errors, and then the host stays frozen for good. The disk loss lasts 1-5 seconds. Windows virtual machines freeze during the outage and then continue working; on Windows we had to specify a longer timeout for the local disk in the registry. Does anybody have an idea what could be tuned to avoid this problem? Attached you can find the dmesg and a screenshot of the g_vfs_done error... Thanks in advance for your help. Best regards Carole Carole Macheret System and Network Administrator Vibro-Meter SA Switzerland Phone: +41264071591 Email : Carole.Macheret@ch.meggitt.com This e-mail may contain confidential information and/or copyright material. This e-mail is intended for the use of the addressee only. Any unauthorized use may be unlawful. If you receive this e-mail by mistake, please advise the sender immediately by using the reply facility in your e-mail software. Information contained in and/or attached to this document may be subject to Export Control Regulations of the European Community, USA or other countries.
Each recipient of this document is responsible to ensure that usage and/or transfer of any information contained in this document complies with all relevant Export Control regulations. If you are in any doubt about the Export Control restrictions that apply to this information, please contact the sender immediately. You should be aware that the contents of this e-mail may be monitored to ensure compliance with the Meggitt IT User policy. -------------- next part -------------- Copyright (c) 1992-2008 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-RELEASE #1: Wed Mar 26 12:28:11 CET 2008 roland@squidproxy.vm.lan:/usr/obj/usr/src/sys/SQUID Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2600.07-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x40f13 Stepping = 3 Features=0x78bfbff Features2=0x2001 AMD Features=0xea500800 AMD Features2=0x1 real memory = 1073741824 (1024 MB) avail memory = 1037078528 (989 MB) kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Mar 26 2008 12:27:54) acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: on acpi0 powernow0: on cpu0 device_attach: powernow0 attach returned 6 acpi_throttle0: on cpu0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: at device 1.0 on pci0 pci1: on pcib1 isab0: at device 7.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1050-0x105f at device 7.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] pci0: at device 7.3 (no driver attached) vgapci0: port 0x1060-0x106f mem 
0xf8000000-0xfbffffff,0xf4000000-0xf47fffff at device 15.0 on pci0 mpt0: port 0x1080-0x10ff mem 0xf4860000-0xf4860fff irq 9 at device 16.0 on pci0 mpt0: [ITHREAD] mpt0: MPI Version=1.2.0.0 em0: port 0x1070-0x1077 mem 0xf4820000-0xf483ffff,0xf4800000-0xf480ffff irq 11 at device 17.0 on pci0 em0: Memory Access and/or Bus Master bits were not set! em0: Ethernet address: 00:50:56:81:20:59 em0: [FILTER] em1: port 0x1078-0x107f mem 0xf4840000-0xf485ffff,0xf4810000-0xf481ffff irq 10 at device 18.0 on pci0 em1: Memory Access and/or Bus Master bits were not set! em1: Ethernet address: 00:50:56:81:38:ac em1: [FILTER] acpi_acad0: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model IntelliMouse, device ID 3 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A sio1: [FILTER] fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 pmtimer0 on isa0 orm0: at iomem 0xc0000-0xc7fff,0xca000-0xcafff,0xcb000-0xcbfff,0xdc000-0xdffff,0xe0000-0xe3fff pnpid ORM0000 on isa0 ppc0: at port 0x378-0x37f irq 7 on isa0 ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode ppbus0: on ppc0 ppbus0: [ITHREAD] plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 ppc0: [GIANT-LOCKED] ppc0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounter "TSC" frequency 2600070771 Hz quality 800 Timecounters tick every 10.000 msec hptrr: no controller detected. 
Waiting 5 seconds for SCSI devices to settle acd0: CDROM at ata0-master UDMA33 da0 at mpt0 bus 0 target 0 lun 0 da0: Fixed Direct Access SCSI-2 device da0: 3.300MB/s transfers da0: Command Queueing Enabled da0: 5632MB (11534336 512 byte sectors: 255H 63S/T 717C) da1 at mpt0 bus 0 target 1 lun 0 da1: Fixed Direct Access SCSI-2 device da1: 3.300MB/s transfers da1: Command Queueing Enabled da1: 20480MB (41943040 512 byte sectors: 255H 63S/T 2610C) Trying to mount root from ufs:/dev/da0s1a WARNING: / was not properly dismounted WARNING: /tmp was not properly dismounted WARNING: /usr was not properly dismounted WARNING: /var was not properly dismounted /var: optimization changed from SPACE to TIME From scottl at samsco.org Thu Aug 14 17:09:27 2008 From: scottl at samsco.org (Scott Long) Date: Thu Aug 14 17:09:34 2008 Subject: g_vfs_done In-Reply-To: <48A46586.1F16.0013.0@ch.meggitt.com> References: <4874F53A0200001300130DE3@gw.vibro-meter.com> <48A465B10200001300132295@gw.vibro-meter.com> <48A46586.1F16.0013.0@ch.meggitt.com> Message-ID: <48A4666C.6080008@samsco.org> Carole Macheret wrote: > Hello, > > We are using FreeBSD 7.0-RELEASE #1 running Squid and Zabbix on vmware ESX 3.0.2 and our vmware ESX servers access our SAN through IpStor cluster (Storage virtualization and mirroring). > > We have 2 storages (EVA 6100) and the IpStor solution allows us to mirror disks on both EVAs. > > We have a problem with both the Zabbix and Squid FreeBSD virtual machines, when the virtual machine is loosing its disks (EVA controller reboot or ipstor cluster failover), we have several "g_vfs_done() : da1s1d[WRITE(offset=2312431234, length=12453)] error= 5" errors then the host is definitively frozen. The disk loss lasts 1-5 seconds. Windows virtual machines do freeze during the loss then continue working. On Windows we had to specify a longer timeout for local disk in registry. > > Does anybody has an idea what could be tuned to avoid this problem ? 
> > Attached you can find the dmesg and a screenshot of the g_vfs_done error... > > Thanks in advance for your help > So the virtual disks that the FreeBSD images are using in VMWare are on an IpStor, and those periodically go away, yes? What's probably happening is that the VMWare host is triggering an event in the FreeBSD client VM that essentially is making the virtual disks go away. Inside the FreeBSD VM, the SCSI layer tries to talk to the disk and gets a selection timeout since the disk is no longer there. It doesn't know that this is a temporary state, and it declares the I/O as failed. At that point, the BSD VM gets upset and everything gets bad. There is a property called kern.cam.da.default_timeout. It's set to 60 seconds, but I don't think that it will help you in this case, since it's likely that the i/o is failing because of a selection timeout, not because the virtual disk is slow in completing the i/o. The kern.cam.da.retry_count property is set to 5, and changing it might help since it might be able to force enough retries to give time for the virtual disk to come back. Try the following command on a running system: sysctl kern.cam.da.retry_count=100 This will allow for about 25 seconds worth of retries (a selection attempt takes 250ms, so you'll get about 4 retries per second). If this doesn't work, try configuring VMWare to give you a serial console that you can capture on the host, then set bootverbose during boot and send me the log once the problem happens. Scott From westr at connection.ca Fri Aug 15 16:30:19 2008 From: westr at connection.ca (Ross) Date: Fri Aug 15 16:30:48 2008 Subject: isp(4) - setting debug mode flags Message-ID: <1051060505.20080815123014@connection.ca> Has anyone successfully set the debug mode flags in the isp(4) drivers on boot for Freebsd 7.0? The man page states that setting "hint.isp.0.debug" should do the trick, but setting it to any value doesn't cause anything to happen. 
Checking the code shows that isp_pci.c calls getenv_int("isp_debug", &bitmap), which suggests that setting that kernel environment variable should do it, but still no go. So I'm wondering whether anyone else has gotten it working in the past, and if so, what did you set for maximum debugging? (The reason is that I'm trying to track down a bug that occasionally crashes the kernel on boot in a boot-from-SAN configuration.) Cheers, Ross. -- From scottl at samsco.org Fri Aug 15 18:43:46 2008 From: scottl at samsco.org (Scott Long) Date: Fri Aug 15 18:43:52 2008 Subject: isp(4) - setting debug mode flags In-Reply-To: <1051060505.20080815123014@connection.ca> References: <1051060505.20080815123014@connection.ca> Message-ID: <48A5CE50.4050404@samsco.org> Ross wrote: > Has anyone successfully set the debug mode flags in the isp(4) drivers > on boot for Freebsd 7.0? > > The man page states that setting "hint.isp.0.debug" should do the > trick, but setting it to any value doesn't cause anything to happen. > > Checking the code shows isp_pci.c is doing a getenv_int("isp_debug", > &bitmap), which looks like setting that kernel variable should do it, > but still no go. > > So I'm wondering if anyone else got it working in the past, and if so, > what did you set for maximum debugging? > > (Reason is that I'm trying to track down a bug that causes the kernel > to crash occasionally on boot, in a boot-from-san configuration.) > > Cheers, > Ross. > First thing to check is whether what you're setting is actually making it into the kernel environment and is what you expect it to be. Run 'kenv' on a running system and look for the hint.isp.0.debug string. Second, it looks like ISP_LOGDEBUG1 is the only flag that has any special meaning; make sure you set it as well as ISP_LOGWARN and ISP_LOGERR (the default flags).
Scott From westr at connection.ca Fri Aug 15 18:53:41 2008 From: westr at connection.ca (Ross) Date: Fri Aug 15 18:53:47 2008 Subject: isp(4) - setting debug mode flags In-Reply-To: <48A5CE50.4050404@samsco.org> References: <1051060505.20080815123014@connection.ca> <48A5CE50.4050404@samsco.org> Message-ID: <1911451167.20080815145339@connection.ca> SL> First thing to check is whether what you're setting is actually making SL> it into the kernel environment and is what you expect it to be. Run SL> 'kenv' on a running system and look for the hint.isp.0.debug string. SL> Second, it looks like ISP_LOGDEBUG1 is the only flag that has any SL> special meaning, make sure you set that as well as ISP_LOGWARN and SL> ISP_LOGERR (the default flags). Yes, it's getting there (kenv is correctly displaying it). I'm currently attempting to use 0xFFF as the value so that everything gets activated, but still nothing. Of course, knowing my luck, it's probably being activated and nothing is being displayed. :-) Cheers, Ross. -- From Carole.Macheret at ch.meggitt.com Mon Aug 18 07:45:00 2008 From: Carole.Macheret at ch.meggitt.com (Carole Macheret) Date: Mon Aug 18 07:45:06 2008 Subject: g_vfs_done In-Reply-To: <48A4666C.6080008@samsco.org> References: <4874F53A0200001300130DE3@gw.vibro-meter.com> <48A465B10200001300132295@gw.vibro-meter.com> <48A46586.1F16.0013.0@ch.meggitt.com><48A46586.1F16.0013.0@ch.meggitt.com> <48A4666C.6080008@samsco.org> Message-ID: <48A9445E.1F16.0013.0@ch.meggitt.com> Thanks for your answer. We will do some tests as soon as we have the opportunity, since the system is in production! Carole >>> Scott Long 14.08.2008 19:07 >>> Carole Macheret wrote: > Hello, > > We are using FreeBSD 7.0-RELEASE #1 running Squid and Zabbix on vmware ESX 3.0.2 and our vmware ESX servers access our SAN through IpStor cluster (Storage virtualization and mirroring). > > We have 2 storages (EVA 6100) and the IpStor solution allows us to mirror disks on both EVAs.
> > We have a problem with both the Zabbix and Squid FreeBSD virtual machines, when the virtual machine is loosing its disks (EVA controller reboot or ipstor cluster failover), we have several "g_vfs_done() : da1s1d[WRITE(offset=2312431234, length=12453)] error= 5" errors then the host is definitively frozen. The disk loss lasts 1-5 seconds. Windows virtual machines do freeze during the loss then continue working. On Windows we had to specify a longer timeout for local disk in registry. > > Does anybody has an idea what could be tuned to avoid this problem ? > > Attached you can find the dmesg and a screenshot of the g_vfs_done error... > > Thanks in advance for your help > So the virtual disks that the FreeBSD images are using in VMWare are on an IpStor, and those periodically go away, yes? What's probably happening is that the VMWare host is triggering an event in the FreeBSD client VM that essentially is making the virtual disks go away. Inside the FreeBSD VM, the SCSI layer tries to talk to the disk and gets a selection timeout since the disk is no longer there. It doesn't know that this is a temporary state, and it declares the I/O as failed. At that point, the BSD VM gets upset and everything gets bad. There is a property called kern.cam.da.default_timeout. It's set to 60 seconds, but I don't think that it will help you in this case, since it's likely that the i/o is failing because of a selection timeout, not because the virtual disk is slow in completing the i/o. The kern.cam.da.retry_count property is set to 5, and changing it might help since it might be able to force enough retries to give time for the virtual disk to come back. Try the following command on a running system: sysctl kern.cam.da.retry_count=100 This will allow for about 25 seconds worth of retries (a selection attempt takes 250ms, so you'll get about 4 retries per second). 
If this doesn't work, try configuring VMWare to give you a serial console that you can capture on the host, then set bootverbose during boot and send me the log once the problem happens. Scott

From bugmaster at FreeBSD.org Mon Aug 18 11:06:57 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Aug 18 11:08:40 2008 Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org Message-ID: <200808181106.m7IB6u4p079935@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi

15 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [dpt] [request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/119668  scsi       [cam] [patch] certain errors are too verbose comparing
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/123666  scsi       [aac] attach fails with Adaptec SAS RAID 3805 controll
o kern/123674  scsi       [ahc] ahc driver dumping

10 problems total.

From sbruno at miralink.com Wed Aug 20 22:14:36 2008 From: sbruno at miralink.com (Sean Bruno) Date: Wed Aug 20 22:14:42 2008 Subject: RELENG_6 patch for MPT Message-ID: <48AC974B.2020203@miralink.com> Hmmm...RELENG_6 looks like it has a locking issue at this time. The patch below removes the panic caused by the recursive lock acquisition, but I'm not sure how safe it is.

Index: /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c
===================================================================
--- /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c	(revision 5657)
+++ /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c	(revision 5761)
@@ -106,15 +106,13 @@
 mpt_user_attach(struct mpt_softc *mpt)
 {
 	mpt_handler_t handler;
 	int error, unit;
 
-	MPT_LOCK(mpt);
 	handler.reply_handler = mpt_user_reply_handler;
 	error = mpt_register_handler(mpt, MPT_HANDLER_REPLY, handler,
 	    &user_handler_id);
-	MPT_UNLOCK(mpt);
 	if (error != 0) {
 		mpt_prt(mpt, "Unable to register user handler!\n");
 		return (error);
 	}
 	unit = device_get_unit(mpt->dev);

-- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Phone 503-621-5143 Fax 503-621-5199 MSN: sbruno@miralink.com Google: seanwbruno@gmail.com Yahoo: sean_bruno@yahoo.com From scottl at samsco.org Thu Aug 21
00:24:05 2008 Subject: RELENG_6 patch for MPT In-Reply-To: <48AC974B.2020203@miralink.com> References: <48AC974B.2020203@miralink.com> Message-ID: <48ACB599.8040603@samsco.org> Sean Bruno wrote: > Hmmm...RELENG_6 looks like it has a locking issue at this time. > > The patch below removes the panic due to dead lock, but I'm not sure > how safe it is. > > > Index: /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c > =================================================================== > --- /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c > (revision 5657) > +++ /trunk/src/ankeny/src/FreeBSD_RELENG6/sys/dev/mpt/mpt_user.c > (revision 5761) > @@ -106,15 +106,13 @@ > mpt_user_attach(struct mpt_softc *mpt) > { > mpt_handler_t handler; > int error, unit; > > - MPT_LOCK(mpt); > handler.reply_handler = mpt_user_reply_handler; > error = mpt_register_handler(mpt, MPT_HANDLER_REPLY, handler, > &user_handler_id); > - MPT_UNLOCK(mpt); > if (error != 0) { > mpt_prt(mpt, "Unable to register user handler!\n"); > return (error); > } > unit = device_get_unit(mpt->dev); > I think this is fine to commit. I fixed it differently in FreeBSD7, but there's real locking there and enough differences to make it hard to directly compare. Go ahead and check it in, if you want. Scott From bob at immure.com Thu Aug 21 00:26:26 2008 From: bob at immure.com (Bob Willcox) Date: Thu Aug 21 00:26:33 2008 Subject: areca-cli: updating firmware Message-ID: <20080821001124.GA76080@rancor.immure.com> I want to update the firmware, bios, and boot code in my areca ARC-1210 raid controller and am a bit hesitant to do it for lack of any experience updating this card. I want to use the areca-cli program to do this and I see that the sys command appears to support updating the firmware (via the "sys updatefw" command), but I don't know if I can use this to update the bios and boot code as well or not. Can someone help me out on this? 
Also, is it ok to do this update while the system is up and using the card?

Thanks,
Bob

-- 
Bob Willcox         There are three ways to get something done:
bob@immure.com          1: Do it yourself.
Austin, TX              2: Hire someone to do it for you.
                        3: Forbid your kids to do it.

From nick-freebsd-scsi at triantos.com Fri Aug 22 04:25:51 2008
From: nick-freebsd-scsi at triantos.com (Nick Triantos)
Date: Fri Aug 22 04:25:58 2008
Subject: Patch to fix support for StorCase InfoStation
Message-ID: <432CA218-138B-4F45-83BF-DA973B5D9F63@triantos.com>

Hi,

I've got a StorCase InfoStation 12-bay SATA-to-FC SAN attached to my FreeBSD server. It turns out that this device does not support the SCSI cmds to sync its cache.

The patch below can be applied to /usr/src/sys/cam/scsi/scsi_da.c to fix this issue, by adding this storage system to the list of quirks.

I don't know the correct procedure to submit this patch, but hopefully someone on this list can help.

Please let me know if I should do anything else to get this checked in.

best,
-Nick

*** scsi_da.c.orig	Tue Aug 19 00:03:43 2008
--- scsi_da.c	Tue Aug 19 22:54:41 2008
***************
*** 535,540 ****
--- 535,547 ----
  		{T_DIRECT, SIP_MEDIA_REMOVABLE, "ChipsBnk", "USB*",
  		 "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE
  	},
+ 	{
+ 		/*
+ 		 * StorCase (Kingston) InfoStation IFS FC2/SATA-R 201A
+ 		 */
+ 		{T_DIRECT, SIP_MEDIA_FIXED, "IFS", "FC2/SATA-R*", "*"},
+ 		/*quirks*/ DA_Q_NO_SYNC_CACHE
+ 	},
  };
  
  static disk_strategy_t dastrategy;

From sbruno at miralink.com Fri Aug 22 15:59:21 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Fri Aug 22 15:59:29 2008
Subject: Patch to fix support for StorCase InfoStation
In-Reply-To: <432CA218-138B-4F45-83BF-DA973B5D9F63@triantos.com>
References: <432CA218-138B-4F45-83BF-DA973B5D9F63@triantos.com>
Message-ID: <48AEE252.1060007@miralink.com>

Nick Triantos wrote:
> Hi,
>
> I've got a StorCase InfoStation 12-bay SATA-to-FC SAN attached to my
> FreeBSD server.
> It turns out that this device does not support the
> SCSI cmds to sync its cache.
>
> The patch below can be applied to /usr/src/sys/cam/scsi/scsi_da.c to
> fix this issue, by adding this storage system to the list of quirks.
>
> I don't know the correct procedure to submit this patch, but hopefully
> someone on this list can help.
>
> Please let me know if I should do anything else to get this checked in.
>
> best,
> -Nick
>
> *** scsi_da.c.orig	Tue Aug 19 00:03:43 2008
> --- scsi_da.c	Tue Aug 19 22:54:41 2008
> ***************
> *** 535,540 ****
> --- 535,547 ----
>   		{T_DIRECT, SIP_MEDIA_REMOVABLE, "ChipsBnk", "USB*",
>   		 "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE
>   	},
> + 	{
> + 		/*
> + 		 * StorCase (Kingston) InfoStation IFS FC2/SATA-R 201A
> + 		 */
> + 		{T_DIRECT, SIP_MEDIA_FIXED, "IFS", "FC2/SATA-R*", "*"},
> + 		/*quirks*/ DA_Q_NO_SYNC_CACHE
> + 	},
>   };
>   
>   static disk_strategy_t dastrategy;

Was this for 6 or 7?

-- 
Sean Bruno
MiraLink Corporation
6015 NE 80th Ave, Ste 100
Portland, OR 97218
Cell 503-358-6832
Phone 503-621-5143
Fax 503-621-5199
MSN: sbruno@miralink.com
Google: seanwbruno@gmail.com

From bugmaster at FreeBSD.org Mon Aug 25 11:06:57 2008
From: bugmaster at FreeBSD.org (FreeBSD bugmaster)
Date: Mon Aug 25 11:08:52 2008
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
Message-ID: <200808251106.m7PB6up0027883@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/40895   scsi       wierd kernel / device driver bug
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/60598   scsi       wire down of scsi devices conflicts with config
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/94838   scsi       Kernel panic while mounting SD card with lock switch o
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi

15 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce
o kern/38828   scsi       [dpt] [request] DPT PM2012B/90 doesn't work
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/119668  scsi       [cam] [patch] certain errors are too verbose comparing
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/123666  scsi       [aac] attach fails with Adaptec SAS RAID 3805 controll
o kern/123674  scsi       [ahc] ahc driver dumping

10 problems total.
From westr at connection.ca Mon Aug 25 16:53:09 2008
From: westr at connection.ca (Ross)
Date: Mon Aug 25 16:53:15 2008
Subject: isp(4) - setting debug mode flags
In-Reply-To: <48A5CE50.4050404@samsco.org>
References: <1051060505.20080815123014@connection.ca> <48A5CE50.4050404@samsco.org>
Message-ID: <1465985957.20080825125307@connection.ca>

To close my own thread with an answer: I forgot that I was statically compiling my hints file into the kernel, so any overrides in /boot/device.hints would not work.

SL> First thing to check is whether what you're setting is actually making
SL> it into the kernel environment and is what you expect it to be. Run
SL> 'kenv' on a running system and look for the hint.isp.0.debug string.

SL> Second, it looks like ISP_LOGDEBUG1 is the only flag that has any
SL> special meaning, make sure you set that as well as ISP_LOGWARN and
SL> ISP_LOGERR (the default flags).

ISP_LOGDEBUG0 is the best so far; LOGDEBUG1 provides a huge amount of additional output that is probably too much. A debug value of "0x11F" provides the majority of the output I need.

Ross.
-- 

From westr at connection.ca Tue Aug 26 20:42:00 2008
From: westr at connection.ca (Ross)
Date: Tue Aug 26 20:42:11 2008
Subject: isp(4) - kernel panic on initialization of driver
Message-ID: <13710393234.20080826164158@connection.ca>

I've been tracking down a problem that is sometimes causing a kernel panic to occur when initializing the isp driver in the system.
The system in question is an HP Blade (BL460c) w/ QHM 6432 FC dual port card, reporting as the following:

isp0: port 0x4000-0x40ff mem 0xfdff0000-0xfdff3fff irq 18 at device 0.0 on pci16
isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
isp1: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90

We're doing a boot-via-SAN setup, and the issue looks to be that the card is receiving an ISPASYNC_CHANGE_PDB command on isp1 before it's ready for it. I'm guessing it's due to the fact the card already has the firmware loaded and active (due to the boot).

Console debug (hint.isp.[01].debug=0x11f) output looks like the following on a crash:

-=
kernel: isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
kernel: isp1: set PCI latency to 64
kernel: isp1: [ITHREAD]
kernel: isp1: line 5345: markportdb
kernel: isp1: Port Database Changed
kernel: isp1: Port Database Changed: freeze simq (loopdown)
[crash]
-=

Further debugging shows that the isp_freeze_loopdown() function that is called at the above point never returns. My quick guess is that the xpt_freeze_simq() function it calls [line 290 in isp_freebsd.c] is the culprit, but that's about the limit of my ability to track this down.

If anyone has any pointers to fixing this, that would be appreciated!

Thanks,
Ross.

(Also filed http://www.freebsd.org/cgi/query-pr.cgi?pr=126866 with basically the same notes above)

-- 

From scottl at samsco.org Tue Aug 26 21:00:29 2008
From: scottl at samsco.org (Scott Long)
Date: Tue Aug 26 21:00:38 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <13710393234.20080826164158@connection.ca>
References: <13710393234.20080826164158@connection.ca>
Message-ID: <48B46EE1.8060408@samsco.org>

Ross wrote:
> I've been tracking down a problem that is sometimes causing a kernel
> panic to occur when initializing the isp driver in the system.
> The system in question is an HP Blade (BL460c) w/ QHM 6432 FC dual port
> card, reporting as the following:
>
> isp0: port 0x4000-0x40ff mem 0xfdff0000-0xfdff3fff irq 18 at device 0.0 on pci16
> isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
> isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> isp1: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
>
> We're doing a boot-via-SAN setup, and the issue looks to be that
> the card is receiving an ISPASYNC_CHANGE_PDB command on isp1 before
> it's ready for it. I'm guessing it's due to the fact the card already
> has the firmware loaded and active (due to the boot).
>
> Console debug (hint.isp.[01].debug=0x11f) output looks like the following on a crash:
>
> -=
> kernel: isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> kernel: isp1: set PCI latency to 64
> kernel: isp1: [ITHREAD]
> kernel: isp1: line 5345: markportdb
> kernel: isp1: Port Database Changed
> kernel: isp1: Port Database Changed: freeze simq (loopdown)
> [crash]
> -=
>
> Further debugging shows that the isp_freeze_loopdown() function that is
> called at the above point never returns. My quick guess is that the
> xpt_freeze_simq() function it calls [line 290 in isp_freebsd.c] is the
> culprit, but that's about the limit of my ability to track this down.
>
>
> If anyone has any pointers to fixing this, that would be appreciated!
>
> Thanks,
> Ross.
>
> (Also filed http://www.freebsd.org/cgi/query-pr.cgi?pr=126866 with
> basically the same notes above)
>

Please post the panic trace and messages.

Scott

From erich at fuujinnetworks.com Wed Aug 27 03:13:50 2008
From: erich at fuujinnetworks.com (Fuujin Networks LLC)
Date: Wed Aug 27 03:13:56 2008
Subject: Qlogic FC scsi_target ISP2310
Message-ID: <48B4CF57.30603@fuujinnetworks.com>

I've run into a snag with our SAN and I'm hoping someone out there can shed some light on the glass, as it were.
We're trying to use scsi_target mode with a pair of QLogic ISP2310 2Gb fibre-channel cards in a Point-to-Point topology. These cards will NOT be part of a switch fabric. I started out with a quad port card, and when I rescanned the SCSI bus on the initiating end of the loop, the target machine tanked. The filer dumps core, reboots, and reproduces the result faithfully.

Thinking I might be doing something wrong with port selection, I pulled the quad port card out of the SAN filer and installed a single port card. After several days of jiggery pokery with all manner of knobs, AIO, etc., I've ended up with the same result: a 64-bit space heater. :)

I've Googled this to death, crawled the lists, and I've not found a solution other than the occasional "it might be buggy" comment. Full kernel traces are available to anyone interested, possibly even ssh access to a test box if requested. I've been running FreeBSD for about 7 years and I've never seen an operating system as stable, so I'd much rather make this work than go to Linux.

The filer is running FreeBSD 7.0 amd64, dual Opteron 2216s, with 8 GB of RAM. I've tested this on a pair of HP DL580 G3s to see if there was a difference from 32-bit to 64-bit hardware, but none was apparent.

Any help is greatly appreciated!

-- 
Erich M. Jenkins
Fuujin Networks, LLC
PO Box 792
Brainerd, MN 56401
(p) 218-824-5038
(f) 218-824-7516

"You should never, never doubt what no one is sure about."
--Gene Wilder

From linimon at FreeBSD.org Wed Aug 27 09:05:45 2008
From: linimon at FreeBSD.org (linimon@FreeBSD.org)
Date: Wed Aug 27 09:05:51 2008
Subject: kern/126866: [isp] [panic] kernel panic on card initialization
Message-ID: <200808270905.m7R95iY0005859@freefall.freebsd.org>

Synopsis: [isp] [panic] kernel panic on card initialization

Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Aug 27 09:05:34 UTC 2008
Responsible-Changed-Why:
Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=126866

From westr at connection.ca Wed Aug 27 14:42:24 2008
From: westr at connection.ca (Ross)
Date: Wed Aug 27 14:42:30 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <48B46EE1.8060408@samsco.org>
References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org>
Message-ID: <302438113.20080827104209@connection.ca>

SL> Please post the panic trace and messages.

If the formatting/spelling is off, it's due to the cut-n-paste done by hand/eye (due to the virtual ILOM screen, so no easy save).

-= start
isp1: set PCI latency to 64
isp1: [ITHREAD]
isp1: line 5345: markportdb
isp1: Port Database Changed

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 0
fault virtual address   = 0x58
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0243a46
stack pointer           = 0x20:0xc0af63cc
frame pointer           = 0x20:0xc0af63cc
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
...
-= end

The values are the same across different reboots.

R.
-- 

From pisymbol at gmail.com Wed Aug 27 15:12:14 2008
From: pisymbol at gmail.com (Alexander Sack)
Date: Wed Aug 27 15:12:21 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <48B46EE1.8060408@samsco.org>
References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org>
Message-ID: <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com>

On Tue, Aug 26, 2008 at 4:41 PM, Ross wrote:
> I've been tracking down a problem that is sometimes causing a kernel
> panic to occur when initializing the isp driver in the system.
> The system in question is an HP Blade (BL460c) w/ QHM 6432 FC dual port
> card, reporting as the following:
>
> isp0: port 0x4000-0x40ff mem 0xfdff0000-0xfdff3fff irq 18 at device 0.0 on pci16
> isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
> isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> isp1: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90

So one question I have is why doesn't the isp driver load the firmware in ispfw? This is 7.x, so firmware_get() should have returned the isp_2400 registered firmware image for a 2422 card and loaded it in isp_reset() unless dodnld was set to zero from a hint flag. By any chance do you have "hint.isp.0.fwload_disable" set? I'm not saying that will fix your problem but I just noticed this.

> We're doing a boot-via-SAN setup, and the issue looks to be that
> the card is receiving an ISPASYNC_CHANGE_PDB command on isp1 before
> it's ready for it. I'm guessing it's due to the fact the card already
> has the firmware loaded and active (due to the boot).

Nah, I don't think that's it exactly: whether or not isp/ispfw loads the firmware on boot, I think you can still see this issue.

It looks like after isp_reset() is performed (which grabs information from the ISP and normally attempts to reset it and do further hardware initialization), we get into isp_init(). The isp_init() function attempts to issue the MBOX_SET_FIRMWARE_OPTIONS command, which will generate an asynchronous event when a LIP is received. At this point the ISP_LOCK is held (which blocks any ISR). However, I see in isp_attach() we drop it, which I will bet is when chaos ensues (the ISR proceeds, obtains the lock and goes through the async event path).

WARNING: this is pure speculation on my part from quickly looking at it. I'm just wondering if this is a bug in isp allowing the ASYNC events to occur before a complete attach has been performed.
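Alexander's hypothesis is an ordering race: an interrupt delivers an async event (here, the port-database change) after the lock is dropped but before attach has finished setting up what the event handler relies on. The crash report's "fault virtual address = 0x58" would be consistent with a NULL structure pointer dereferenced at a small member offset, though nothing in the thread confirms which pointer that is. As a reading aid only, here is a single-threaded C sketch of the kind of guard such a race calls for. All names (demo_softc, demo_sim, demo_async_pdb_changed, demo_finish_attach) are hypothetical stand-ins, not isp(4) or CAM code, and the locking a real driver would need is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for a controller softc and
 * its CAM SIM. These are NOT the real isp(4) or CAM structures, and the
 * locking a real driver needs around these fields is omitted.
 */
struct demo_sim {
	int freeze_count;	/* times the SIM queue was "frozen" */
};

struct demo_softc {
	int attach_done;	/* set only at the very end of attach */
	struct demo_sim *sim;	/* NULL until attach has allocated it */
	int deferred;		/* async events seen before attach finished */
};

/*
 * Async "port database changed" handler. Without the guard, an event
 * arriving before attach completes would dereference sc->sim while it is
 * still NULL.
 */
static void
demo_async_pdb_changed(struct demo_softc *sc)
{
	assert(sc != NULL);
	if (!sc->attach_done || sc->sim == NULL) {
		sc->deferred++;		/* remember it; rescan once attached */
		return;
	}
	sc->sim->freeze_count++;	/* stands in for freezing the SIM queue */
}

/* Final step of attach: publish the SIM, then replay deferred events once. */
static void
demo_finish_attach(struct demo_softc *sc, struct demo_sim *sim)
{
	sc->sim = sim;
	sc->attach_done = 1;
	if (sc->deferred > 0) {
		sc->deferred = 0;
		demo_async_pdb_changed(sc);	/* one rescan covers all early events */
	}
}
```

Early events are only counted here; once attach publishes the SIM, a single replay stands in for all of them. That assumes the handler re-reads the port state rather than depending on the details of each individual event.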
> Console debug (hint.isp.[01].debug=0x11f) output looks like the following on a crash:
>
> -=
> kernel: isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> kernel: isp1: set PCI latency to 64
> kernel: isp1: [ITHREAD]
> kernel: isp1: line 5345: markportdb
> kernel: isp1: Port Database Changed
> kernel: isp1: Port Database Changed: freeze simq (loopdown)
> [crash]
> -=
>
> Further debugging shows that the isp_freeze_loopdown() function that is
> called at the above point never returns. My quick guess is that the
> xpt_freeze_simq() function it calls [line 290 in isp_freebsd.c] is the
> culprit, but that's about the limit of my ability to track this down.

Yes, but it's not clear to me the bug is REALLY in CAM...yet. It might be ISP letting asynchronous events in before it's really ready, which means it's a driver problem. I have a 24xx card in the lab; maybe I can test this if I get a chance to.

I'm assuming all you are doing is booting 7.0-RELEASE off the SAN? As Scott asked, can you get a dump or a trace of the crash?

Thanks!

-aps

From westr at connection.ca Wed Aug 27 15:27:53 2008
From: westr at connection.ca (Ross)
Date: Wed Aug 27 15:27:59 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com>
References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com>
Message-ID: <86689256.20080827112751@connection.ca>

AS> So one question I have is why doesn't the isp driver load the
AS> firmware in ispfw? [...clip...] By any chance do you have
AS> "hint.isp.0.fwload_disable" set? I'm not saying that will fix your
AS> problem but I just noticed this.

The card came with 4.00.90, which is newer than the firmware included in ispfw (4.00.20), but to answer your question, yes, fwload_disable is set for the cards.
I believe that got set since there was some kind of problem that we ran into at one point. I'll do some testing on that.

AS> WARNING: this is pure speculation on my part from quickly looking at
AS> it. I'm just wondering if this is a bug in isp allowing the ASYNC
AS> events to occur before a complete attach has been performed.

Apologies if it didn't come across in my previous message; that's basically what I'm saying too. I have about a 50/50 chance of the card booting successfully, and when it does, the output from the debug lines is different, and it looks like the ASYNC event is executing before its time. (See the end of this message for the full output from isp[01]: for a clean boot, since it's a long output.)

AS> Yes, but it's not clear to me the bug is REALLY in CAM...yet. It might
AS> be ISP letting asynchronous events in before it's really ready, which
AS> means it's a driver problem. I have a 24xx card in the lab; maybe I
AS> can test this if I get a chance to.

That's what I'm assuming as well.

AS> I'm assuming all you are doing is booting 7.0-RELEASE off the SAN?

That is correct. We're just using the i386 image at the current time (eventually going to amd64, but not now). It's been trimmed of the fat, but nothing special/custom added in.

AS> As Scott asked, can you get a dump or a trace of the crash?

See my other message to freebsd-scsi/Scott for that.

Thanks for looking at the issue!

Ross.

-= start of clean boot output.
isp0: port 0x4000-0x40ff mem 0xfdff0000-0xfdff3fff irq 18 at device 0.0 on pci16
isp0: set PCI latency to 64
isp0: [ITHREAD]
isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
isp0: 2K Logins Supported
isp0: Last F/W revision was 4.0.90
isp0: 4096 max I/O command limit set
isp0: line 1207: markportdb
isp0: Starting Initial Loop Down Timer
isp1: port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
isp1: set PCI latency to 64
isp1: [ITHREAD]
isp1: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
isp1: 2K Logins Supported
isp1: Last F/W revision was 4.0.90
isp1: 4096 max I/O command limit set
isp1: line 1207: markportdb
isp1: Starting Initial Loop Down Timer
...
[other driver startup and then eventually we get]
...
isp0: line 5271: markportdb
isp0: LIP Received
isp1: line 5271: markportdb
isp1: LIP Received
isp0: line 5333: markportdb
isp0: LOOP Reset
isp1: line 5333: markportdb
isp1: LOOP Reset
isp0: line 5271: markportdb
isp0: LIP Received
isp1: line 5271: markportdb
isp1: LIP Received
isp0: line 5271: markportdb
isp0: LIP Received
isp1: line 5271: markportdb
isp1: LIP Received
isp0: line 5333: markportdb
isp0: LOOP Reset
isp1: line 5333: markportdb
isp1: LOOP Reset
isp0: line 5360: markportdb
isp0: Stopping Loop Down Timer
isp0: Other Change Notify
isp0: Point-to-Point mode
isp1: line 5360: markportdb
isp1: Stopping Loop Down Timer
isp1: Other Change Notify
isp1: Point-to-Point mode
isp0: line 5307: markportdb
isp0: Loop UP
isp1: line 5307: markportdb
isp1: Loop UP
isp0: line 5345: markportdb
isp0: Port Database Changed
isp1: line 5345: markportdb
isp1: Port Database Changed
isp0: line 5345: markportdb
isp0: Port Database Changed
isp1: line 5345: markportdb
isp1: Port Database Changed
isp0: line 5345: markportdb
isp0: Port Database Changed
isp1: line 5345: markportdb
isp1: Port Database Changed
isp0: isp_kthread: checking FC state
isp0: FC Link Test Entry
isp0: line 2460: markportdb
isp1: isp_kthread: checking FC state
isp1: FC Link Test Entry
isp1: line 2460: markportdb
isp0: Firmware State Ready>
isp1: Firmware State Ready>
isp1: Register FC4 Type accepted
isp0: Register FC4 Type accepted
isp1: 4Gb link speed/s
isp1: HBA PortID 0x030100 N-Port Handle 0, Connection Topology 'F Port'
isp1: HBA WWNN 0x5001438002211547 HBA WWPN 0x5001438002211546
isp1: FC Link Test Complete
isp1: FC Scan Fabric
isp0: 4Gb link speed/s
isp0: HBA PortID 0x020100 N-Port Handle 0, Connection Topology 'F Port'
isp0: HBA WWNN 0x5001438002211545 HBA WWPN 0x5001438002211544
isp0: FC Link Test Complete
isp0: FC Scan Fabric
isp1: got 12 ports back from name server
isp1: skip ourselves @ PortID 0x030100
isp1: Checking Fabric Port 0x030200
isp0: got 13 ports back from name server
isp0: skip ourselves @ PortID 0x020100
isp0: Checking Fabric Port 0x020200
isp1: Fabric Port 0x030200 is New Entry
isp1: Checking Fabric Port 0x030300
isp0: Fabric Port 0x020200 is New Entry
isp0: Checking Fabric Port 0x020300
isp1: Fabric Port 0x030300 is New Entry
isp1: Checking Fabric Port 0x030d00
isp0: Fabric Port 0x020300 is New Entry
isp0: Checking Fabric Port 0x020d00
isp1: Fabric Port 0x030d00 is New Entry
isp1: Checking Fabric Port 0x030e00
isp0: Fabric Port 0x020d00 is New Entry
isp0: Checking Fabric Port 0x020e00
isp1: Fabric Port 0x030e00 is New Entry
isp1: Checking Fabric Port 0x031000
isp1: Fabric Port 0x031100 is New Entry
isp1: Checking Fabric Port 0x050100
isp0: Fabric Port 0x021100 is New Entry
isp0: Checking Fabric Port 0x040100
isp1: Fabric Port 0x050100 is New Entry
isp1: Checking Fabric Port 0x050200
isp0: Fabric Port 0x040100 is New Entry
isp0: Checking Fabric Port 0x040200
isp1: Fabric Port 0x050200 is New Entry
isp1: Checking Fabric Port 0x050800
isp0: Fabric Port 0x040200 is New Entry
isp0: Checking Fabric Port 0x040800
isp1: Fabric Port 0x050800 is New Entry
isp1: Checking Fabric Port 0x050e00
isp0: Fabric Port 0x040800 is New Entry
isp0: Checking Fabric Port 0x040e00
isp1: Fabric Port 0x050e00 is New Entry
isp1: Checking Fabric Port 0x051100
isp0: Fabric Port 0x040e00 is New Entry
isp0: Checking Fabric Port 0x041100
isp1: Fabric Port 0x051100 is New Entry
isp1: FC Scan Fabric Done
isp1: Synchronizing PDBs
isp1: PortID 0x030200 handle 0x1 role Initiator arrived WWNN 0x5001438001fff1df WWPN 0x5001438001fff1de
isp1: PortID 0x030300 handle 0x2 role Initiator arrived WWNN 0x5001438002211a1f WWPN 0x5001438002211a1e
isp1: PortID 0x030d00 handle 0x3 role Initiator arrived WWNN 0x5001438001fff22f WWPN 0x5001438001fff22e
isp1: PortID 0x030e00 handle 0x4 role Initiator arrived WWNN 0x50014380022110c7 WWPN 0x50014380022110c6
isp1: PortID 0x031000 handle 0x5 role Initiator arrived WWNN 0x5001438001fff22b WWPN 0x5001438001fff22a
isp1: PortID 0x031100 handle 0x6 role Target arrived at tgt 0 WWNN 0x50001fe150103510 WWPN 0x50001fe15010351d
isp1: PortID 0x050100 handle 0x7 role Initiator arrived WWNN 0x5001438002211a6f WWPN 0x5001438002211a6e
isp1: PortID 0x050200 handle 0x8 role Initiator arrived WWNN 0x5001438002211aab WWPN 0x5001438002211aaa
isp1: PortID 0x050800 handle 0x9 role Initiator arrived WWNN 0x5001438002211a9b WWPN 0x5001438002211a9a
isp1: PortID 0x050e00 handle 0xa role Initiator arrived WWNN 0x5001438001fff267 WWPN 0x5001438001fff266
isp1: PortID 0x051100 handle 0xb role Target arrived at tgt 1 WWNN 0x50001fe150103510 WWPN 0x50001fe150103518
isp1: PortID 0xfffffe handle 0x7fe role (none) stayed WWNN 0x100000051e092a5e WWPN 0x200100051e092a5e
isp1: isp_kthread: FC state OK
isp1: isp_kthread: releasing simq
isp1: isp_kthread: sleep time 0
isp0: Fabric Port 0x041100 is New Entry
isp0: Checking Fabric Port 0x041300
isp0: Fabric Port 0x041300 is New Entry
isp0: FC Scan Fabric Done
isp0: Synchronizing PDBs
isp0: PortID 0x020200 handle 0x1 role Initiator arrived WWNN 0x5001438001fff1dd WWPN 0x5001438001fff1dc
isp0: PortID 0x020300 handle 0x2 role Initiator arrived WWNN 0x5001438002211a1d WWPN 0x5001438002211a1c
isp0: PortID 0x020d00 handle 0x3 role Initiator arrived WWNN 0x5001438001fff22d WWPN 0x5001438001fff22c
isp0: PortID 0x020e00 handle 0x4 role Initiator arrived WWNN 0x50014380022110c5 WWPN 0x50014380022110c4
isp0: PortID 0x021000 handle 0x5 role Initiator arrived WWNN 0x5001438001fff229 WWPN 0x5001438001fff228
isp0: PortID 0x021100 handle 0x6 role Target arrived at tgt 0 WWNN 0x50001fe150103510 WWPN 0x50001fe150103519
isp0: PortID 0x040100 handle 0x7 role Initiator arrived WWNN 0x5001438002211a6d WWPN 0x5001438002211a6c
isp0: PortID 0x040200 handle 0x8 role Initiator arrived WWNN 0x5001438002211aa9 WWPN 0x5001438002211aa8
isp0: PortID 0x040800 handle 0x9 role Initiator arrived WWNN 0x5001438002211a99 WWPN 0x5001438002211a98
isp0: PortID 0x040e00 handle 0xa role Initiator arrived WWNN 0x5001438001fff265 WWPN 0x5001438001fff264
isp0: PortID 0x041100 handle 0xb role Target arrived at tgt 1 WWNN 0x50001fe150103510 WWPN 0x50001fe15010351c
isp0: PortID 0x041300 handle 0xc role Initiator arrived WWNN 0x200000e08b8b962d WWPN 0x210000e08b8b962d
isp0: PortID 0xfffffe handle 0x7fe role (none) stayed WWNN 0x100000051e092a7c WWPN 0x200100051e092a7c
isp0: isp_kthread: FC state OK
isp0: isp_kthread: releasing simq
isp0: isp_kthread: sleep time 0
da0 at isp0 bus 0 target 0 lun 1
da0: Fixed Direct Access SCSI-5 device
da0: 400.000MB/s transfers
da0: Command Queueing Enabled
da0: 51200MB (104857600 512 byte sectors: 255H 63S/T 6527C)
da1 at isp0 bus 0 target 1 lun 1
da1: Fixed Direct Access SCSI-5 device
da1: 400.000MB/s transfers WWNN 0x5001438002211a1d WWPN 0x5001438002211a1c PortID 0x20300
da1: Command Queueing Enabled
da1: 51200MB (104857600 512 byte sectors: 255H 63S/T 6527C)
da2 at isp1 bus 0 target 0 lun 1
da2: Fixed Direct Access SCSI-5 device
da2: 400.000MB/s transfers
da2: Command Queueing Enabled
da2: 51200MB (104857600 512 byte sectors: 255H 63S/T 6527C)
da3 at isp1 bus 0 target 1 lun 1
da3: Fixed Direct Access SCSI-5 device
da3: 400.000MB/s transfers WWNN 0x5001438002211a1f WWPN 0x5001438002211a1e PortID 0x30300
da3: Command Queueing Enabled
da3: 51200MB (104857600 512 byte sectors: 255H 63S/T 6527C)
GEOM_MULTIPATH: adding da0 to bootsys/cdbc7869-2d76-11dd-8fe6-001e0bc7e1d0
GEOM_MULTIPATH: da0 now active path in bootsys
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
GEOM_MULTIPATH: adding da1 to bootsys/cdbc7869-2d76-11dd-8fe6-001e0bc7e1d0
GEOM_MULTIPATH: adding da2 to bootsys/cdbc7869-2d76-11dd-8fe6-001e0bc7e1d0
GEOM_MULTIPATH: adding da3 to bootsys/cdbc7869-2d76-11dd-8fe6-001e0bc7e1d0
Trying to mount root from ufs:/dev/multipath/bootsyss1a
[... successful boot at this point ...]
-= end

-- 

From pisymbol at gmail.com Wed Aug 27 20:33:08 2008
From: pisymbol at gmail.com (Alexander Sack)
Date: Wed Aug 27 20:33:15 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <86689256.20080827112751@connection.ca>
References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca>
Message-ID: <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com>

On Wed, Aug 27, 2008 at 11:27 AM, Ross wrote:
>
> AS> So one question I have is why doesn't the isp driver load the
> AS> firmware in ispfw? [...clip...] By any chance do you have
> AS> "hint.isp.0.fwload_disable" set? I'm not saying that will fix your
> AS> problem but I just noticed this.

Ok, I just wanted to make sure I wasn't going crazy! I've just patched the 6.x driver to load firmware again (without the use of the generic firmware driver which I believe came later). I still have to thoroughly test it.

> The card came with 4.00.90 which is newer than the firmware included in
> ispfw (4.00.20) - but to answer your question, yes, fwload_disable is
> set for the cards. I believe that got set since there was some kind
> of problem that we ran into at one point. I'll do some testing on
> that.

Please do but I think what you did is very reasonable.
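Ross's reason for leaving fwload_disable set (the resident 4.00.90 firmware is newer than the 4.00.20 image bundled in ispfw) comes down to a numeric version comparison. Because the driver prints the resident version as "4.0.90" while the bundled one is written "4.00.20", a plain strcmp() gets this wrong: it ranks "4.0.90" below "4.00.20". Below is a minimal C sketch of a field-by-field numeric comparison, assuming versions always have the form major.minor.subminor. The helpers parse_fw_version(), compare_fw_versions() and should_download() are hypothetical and not part of isp(4) or ispfw. should_download() is an illustrative policy, not what the driver actually does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a "major.minor.subminor" firmware version such as "4.0.90" or
 * "4.00.20" into three integers. Returns 1 on success, 0 otherwise.
 */
static int
parse_fw_version(const char *s, int v[3])
{
	assert(s != NULL && v != NULL);
	return (sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3);
}

/*
 * Compare two version strings numerically, field by field. On success,
 * returns 1 and stores <0, 0 or >0 in *result (a is older than, equal to,
 * or newer than b). Returns 0 if either string cannot be parsed.
 */
static int
compare_fw_versions(const char *a, const char *b, int *result)
{
	int va[3], vb[3];
	int i;

	if (!parse_fw_version(a, va) || !parse_fw_version(b, vb))
		return (0);
	*result = 0;
	for (i = 0; i < 3; i++) {
		if (va[i] != vb[i]) {
			*result = (va[i] > vb[i]) ? 1 : -1;
			break;
		}
	}
	return (1);
}

/*
 * Hypothetical policy: download the bundled image only if it is strictly
 * newer than the firmware already resident on the card. Malformed version
 * strings leave the resident firmware alone.
 */
static int
should_download(const char *resident, const char *bundled)
{
	int cmp;

	if (!compare_fw_versions(bundled, resident, &cmp))
		return (0);
	return (cmp > 0);
}
```

With these helpers, should_download("4.0.90", "4.00.20") returns 0, so the newer resident firmware would be kept. That matches the choice Ross made by setting the hint.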
> AS> WARNING: this is pure speculation on my part from quickly looking at
> AS> it. I'm just wondering if this is a bug in isp allowing the ASYNC
> AS> events to occur before a complete attach has been performed.
>
> Apologies if it didn't come across in my previous message; that's
> basically what I'm saying too. I have about a 50/50 chance of the
> card booting successfully, and when it does, the output from the debug
> lines is different, and it looks like the ASYNC event is executing
> before its time. (See the end of this message for the full output
> from isp[01]: for a clean boot, since it's a long output.)

Yea, it looks like the loop down timer kthread gets started further down in isp_attach(), which I believe may be the issue. I've got to look at it some more.

> AS> I'm assuming all you are doing is booting 7.0-RELEASE off the SAN?
>
> That is correct. We're just using the i386 image at the current time
> (eventually going to amd64, but not now). It's been trimmed of the
> fat, but nothing special/custom added in.

Sure, thanks.

> AS> As Scott asked, can you get a dump or a trace of the crash?
>
> See my other message to freebsd-scsi/Scott for that.

Ummm, can you get more than what you posted? i.e. can you rebuild the kernel with

options KDB
options DDB

which will enable the debugger. Then boot with a "-v" and when you get into a panic you should fall into the kernel debugger. At the prompt do a "t" and copy the output (you don't have to copy the addresses, just the stack trace). I think I can imagine what it looks like but it would be better if you provided it.

Another possibility is to obtain a dump:

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#KERNELDEBUG-OBTAIN

and post it.

Thanks!
-aps From pisymbol at gmail.com Wed Aug 27 22:20:56 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Wed Aug 27 22:21:03 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48B4CF57.30603@fuujinnetworks.com> References: <48B4CF57.30603@fuujinnetworks.com> Message-ID: <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> On Tue, Aug 26, 2008 at 11:51 PM, Fuujin Networks LLC wrote: > I've run into a snag with our SAN and I'm hoping someone out there can shed > some light on the glass, as it were. We're trying to use scsi_target mode > with a pair of QLogic ISP2310 2GB fibre-channel cards in a Point-to-Point > topology. These cards will NOT be part of a switch fabric. I started out > with a quad port card, and when I rescanned the SCSI bus on the initiating > end of the loop, the target machine tanked. > The filer dumps core, reboots, and reproduces the result faithfully. How about posting some stack traces, dmesg output, etc. etc. about how it tanks? Thanks! -aps From up at 3.am Thu Aug 28 15:10:13 2008 From: up at 3.am (up@3.am) Date: Thu Aug 28 15:10:20 2008 Subject: Urgent: older server disk failed, need mirror replacement Message-ID: Hi: Sorry for the hectic nature of this, but I just got a kernel notification (and confirmed using aaccli's disk list) that one of my RAID level 1 disks on one of my servers has died. The problem is, I'm not entirely sure exactly what replacement I should be ordering. Is there a command line utility that gives you the manufacturer's info (model no, part no) of the disk itself? I looked at some of the disk utilities in ports/sysutils and I'm not seeing anything. Here is what I think I know: IBM/Hitachi 72GB or 73GB 10k or 15k RPM U320 SCSI I built this server 3 or 4 years ago...the RAID adapter is a low profile Adaptec U320 RAID using the aac driver. Are the newer SAS drives backward compatible with U320 adapters? Is it ok to mix and match, as long as the drives are close in storage capacity? 
What about mix and matching 10k and 15k rpm? Thanks in Advance! James Smallacombe PlantageNet, Inc. CEO and Janitor up@3.am http://3.am ========================================================================= From mcdouga9 at egr.msu.edu Thu Aug 28 16:23:44 2008 From: mcdouga9 at egr.msu.edu (Adam McDougall) Date: Thu Aug 28 16:23:50 2008 Subject: Urgent: older server disk failed, need mirror replacement In-Reply-To: References: Message-ID: <48B6CD05.40805@egr.msu.edu> up@3.am wrote: > > Hi: > > Sorry for the hectic nature of this, but I just got a kernel > notification (and confirmed using aaccli's disk list) that one of my > RAID level 1 disks on one of my servers has died. The problem is, > I'm not entirely sure exactly what replacement I should be ordering. > Is there a command line utility that gives you the manufacturer's info > (model no, part no) of the disk itself? I looked at some of the disk > utilities in ports/systutilities and I'm not seeing anything. Here is > what I think I know: > > IMB/Hitachi 72GB or 73GB 10k or 15k RPM U320 SCSI > > I built this server 3 or 4 years ago...the RAID adapter is a low > profile Adaptec U320 RAID using the aac driver. Are the newer, SAS > drives downward compatible with U320 adapters? Is it ok to mix and > match, as long as the drives are close in storage capacity? What > about mix and matching 10k and 15k rpm? > > Thanks in Advance! > > James Smallacombe PlantageNet, Inc. CEO and Janitor arcconf GETCONFIG 1 PD or probably in your case: aaccli GETCONFIG 1 PD should return output that includes the model number:
Device #6
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,6
Reported Location : Connector 1, Device 2
Vendor : SEAGATE
Model : ST914602SSUN146G
Firmware : 0603
World-wide name : 5000C500092888A0
Size : 140009 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T.
: No From up at 3.am Thu Aug 28 16:53:53 2008 From: up at 3.am (up@3.am) Date: Thu Aug 28 16:53:59 2008 Subject: Urgent: older server disk failed, need mirror replacement In-Reply-To: References: Message-ID: That did the trick...turns out it's a 10k RPM, I just ordered a replacement for $44, plus $45 to ship it overnight, I must have spent nearly 10 times that much on the drive back in the day... On Thu, 28 Aug 2008, jason kawaja wrote: > try 'camcontrol devlist' > > good luck. > > On Aug 28, 2008, at 10:50 AM, up@3.am wrote: > >> >> Hi: >> >> Sorry for the hectic nature of this, but I just got a kernel notification >> (and confirmed using aaccli's disk list) that one of my RAID level 1 disks >> on one of my servers has died. The problem is, I'm not entirely sure >> exactly what replacement I should be ordering. Is there a command line >> utility that gives you the manufacturer's info (model no, part no) of the >> disk itself? I looked at some of the disk utilities in ports/systutilities >> and I'm not seeing anything. Here is what I think I know: >> >> IMB/Hitachi 72GB or 73GB 10k or 15k RPM U320 SCSI >> >> I built this server 3 or 4 years ago...the RAID adapter is a low profile >> Adaptec U320 RAID using the aac driver. Are the newer, SAS drives downward >> compatible with U320 adapters? Is it ok to mix and match, as long as the >> drives are close in storage capacity? What about mix and matching 10k and >> 15k rpm? >> >> Thanks in Advance! >> >> James Smallacombe PlantageNet, Inc. CEO and Janitor >> up@3.am >> http://3.am >> ========================================================================= >> _______________________________________________ >> freebsd-scsi@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi >> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > > > > -- > Jason Kawaja, 2-4568 > IT Expert, UF Dept of ECE > > > > James Smallacombe PlantageNet, Inc. 
CEO and Janitor up@3.am http://3.am ========================================================================= From ivoras at freebsd.org Thu Aug 28 17:05:03 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Aug 28 17:05:55 2008 Subject: Urgent: older server disk failed, need mirror replacement In-Reply-To: References: Message-ID: up@3.am wrote: > > Hi: > > Sorry for the hectic nature of this, but I just got a kernel > notification (and confirmed using aaccli's disk list) that one of my > RAID level 1 disks on one of my servers has died. The problem is, I'm > not entirely sure exactly what replacement I should be ordering. Is > there a command line utility that gives you the manufacturer's info > (model no, part no) of the disk itself? I looked at some of the disk > utilities in ports/systutilities and I'm not seeing anything. Here is > what I think I know: > > IMB/Hitachi 72GB or 73GB 10k or 15k RPM U320 SCSI I think you should use arcconf (from ports) to find out your drive models and then look them up in Google or IBM's support pages. See this for example: http://www.clarkconnect.com/forums/showthreaded.php?Number=104029 > I built this server 3 or 4 years ago...the RAID adapter is a low profile > Adaptec U320 RAID using the aac driver. Are the newer, SAS drives > downward compatible with U320 adapters? No, SCSI and SAS connectors are not compatible. You can still buy SCSI drives. > Is it ok to mix and match, as > long as the drives are close in storage capacity? What about mix and > matching 10k and 15k rpm? As long as your new drive is larger or faster than the old one, it should be fine.
From sbruno at miralink.com Thu Aug 28 17:34:19 2008 From: sbruno at miralink.com (Sean Bruno) Date: Thu Aug 28 17:34:25 2008 Subject: [ISP] QLA2432 Target Mode Broken Message-ID: <48B6E19A.7050603@miralink.com> I tried putting a 2432 into target mode this week and noted that the system threw a pretty nice panic and thought I would post the output here. Reviewing the 4G documentation from Qlogic, it looks like they've substantially changed the target mode interface, so I'm not surprised that there's some work to do. If anyone has any patches they'd like me to test, I'm open to integration: isp0: port 0x2000-0x20ff mem 0xe8300000-0xe8303fff irq 16 at device 0.0 on pci4 isp0: setting role to 0x0 for unit 0 isp0: [GIANT-LOCKED] isp0: Polled Mailbox Command (0x8) Timeout (100000us) isp0: Board Type 2422, Chip Revision 0x2, loaded F/W Revision 4.0.20 isp1: port 0x2400-0x24ff mem 0xe8304000-0xe8307fff irq 17 at device 0.1 on pci4 isp1: setting role to 0x2 for unit 1 isp1: [GIANT-LOCKED] isp1: Polled Mailbox Command (0x8) Timeout (100000us) isp1: Board Type 2422, Chip Revision 0x2, loaded F/W Revision 4.0.20 ... isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.20 isp0: Board Type 2422, Chip Revision 0x2, loaded F/W Revision 4.0.20 (odbp0:isp0:0:4:0): Target Mode Enabled (odbp0:isp0:0:4:0): ENABLE LUN returned 0x0 (lun 0) (odbp0:isp0:0:4:0): enable lun CCB rejected, status 0x4 enable lun failed, status 0x4 targinit: targenlun failed with status 0x4 Aug 28 02:06:46 kernel: B-Srch failed to find head/tail Aug 28 02:06:46 kernel: Loop counter at max, aborting.
Aug 28 02:06:46 kernel: Targinit was not successfull, TheSoftc == NULL isp0: target notify code 0x1007 Fatal trap 12: page fault while in kernel mode fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x20:0xc0546709 stack pointer = 0x28:0xe7a2fb90 frame pointer = 0x28:0xe7a2fb90 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 21 (irq16: bge0 isp0++) [db> trace Tracing pid 21 tid 100023 td 0xc651e780 isp_get_hdr(c6544000,0,e7a2fbe4) at isp_get_hdr+0x9 isp_intr(c6544000,801d,0,0) at isp_intr+0x264 isp_pci_intr(c6544000) at isp_pci_intr+0x6f ithread_execute_handlers(c651d430,c644e500) at ithread_execute_handlers+0xe6 ithread_loop(c6542070,e7a2fd38,c6542070,c05e1f68,0,...) at ithread_loop+0x66 fork_exit(c05e1f68,c6542070,e7a2fd38) at fork_exit+0xa0 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe7a2fd6c, ebp = 0 --- -- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Phone 503-621-5143 Fax 503-621-5199 MSN: sbruno@miralink.com Google: seanwbruno@gmail.com Yahoo: sean_bruno@yahoo.com From pisymbol at gmail.com Thu Aug 28 17:59:11 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Thu Aug 28 17:59:21 2008 Subject: [ISP] QLA2432 Target Mode Broken In-Reply-To: <48B6E19A.7050603@miralink.com> References: <48B6E19A.7050603@miralink.com> Message-ID: <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> On Thu, Aug 28, 2008 at 1:34 PM, Sean Bruno wrote: > I tried putting a 2432 into target mode this week and noted that the system > threw a pretty nice panic and thought I would post the output here. > Reviewing the 4G documentation from Qlogic, it looks like they've > substantially changed the target mode interface, so I'm not surprised that > there's some work to do. 
If anyone has any patches they'd like me to test, > I'm open to integration: Did you rebuild the isp driver with -DISP_TARGET_MODE defined? I only mention this because the output below seems like you twiddled the "role" hint instead of actually recompiling the driver? -aps From sbruno at miralink.com Thu Aug 28 18:31:57 2008 From: sbruno at miralink.com (Sean Bruno) Date: Thu Aug 28 18:32:03 2008 Subject: [ISP] QLA2432 Target Mode Broken In-Reply-To: <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> References: <48B6E19A.7050603@miralink.com> <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> Message-ID: <48B6EF1A.1040805@miralink.com> Alexander Sack wrote: > On Thu, Aug 28, 2008 at 1:34 PM, Sean Bruno wrote: > >> I tried putting a 2432 into target mode this week and noted that the system >> threw a pretty nice panic and thought I would post the output here. >> Reviewing the 4G documentation from Qlogic, it looks like they've >> substantially changed the target mode interface, so I'm not surprised that >> there's some work to do. If anyone has any patches they'd like me to test, >> I'm open to integration: >> > > Did you rebuild the isp driver with -DISP_TARGET_MODE defined? I only > mention this because the output below seems like you twiddled the > "role" hint instead of actually recompile the driver? > > -aps > Ah, yes, that's a little magic I was trying ... sorry about that. Yes, I definitely compiled with ISP_TARGET_MODE defined.
:) -- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Phone 503-621-5143 Fax 503-621-5199 MSN: sbruno@miralink.com Google: seanwbruno@gmail.com Yahoo: sean_bruno@yahoo.com From pisymbol at gmail.com Thu Aug 28 21:59:55 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Thu Aug 28 22:00:11 2008 Subject: isp(4) - kernel panic on initialization of driver In-Reply-To: <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca> <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> Message-ID: <3c0b01820808281459r4990ec4exdb2b4906b4b711f6@mail.gmail.com> On Wed, Aug 27, 2008 at 4:33 PM, Alexander Sack wrote: > On Wed, Aug 27, 2008 at 11:27 AM, Ross wrote: >> Apologies if it didn't come across in my previous message, that's >> basically what I'm saying too. I have about a 50/50 chance of the >> card booting successfully, and when it does, the output from the debug >> lines is different, and it looks like the ASYNC event is executing >> before it's time. (See the end of this message for the full output >> from isp[01]: for a clean boot, since it's a long output.) > > Yea it looks like the loop down timer kthread gets started further > down isp_attach() which I believe maybe the issue. I got to look at > it some more. Ross, can you please try this patch? I don't have access to the external SAN right now, so if this blows up, I apologize in advance. I'm not 100% sure exactly what is going on without my hardware in front of me. However, it would seem that the LIP events that are enabled in isp_init() are coming in before the isp_ldt (loop down timer) and other things are initialized. I believe interrupts should stay disabled until the intrhook gets called, just before root is mounted. That's what I *think* the author intended.
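The ordering proposed above (leave interrupts off at attach time and let the deferred intrhook be the only place that turns them on) can be illustrated with a small userland model. This is a sketch under stated assumptions, not isp(4) code: `struct softc_model`, `struct sim_model` and the `model_*` functions are hypothetical stand-ins, and the real driver uses `ISP_ENABLE_INTS()` and `isp_intr_enable()` rather than a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a driver softc and its SIM queue; these are
 * NOT the real isp(4)/CAM structures, just enough state to show ordering. */
struct sim_model {
    int freeze_count;            /* how many times the queue was frozen */
};

struct softc_model {
    bool ints_enabled;           /* flipped only by the deferred hook */
    struct sim_model *simq;      /* NULL until attach creates it */
    int ignored_events;          /* async events that arrived too early */
};

/* Async "loop down" handler: freezes the SIM queue, but only once the
 * driver is fully attached and interrupts were deliberately enabled. */
static void model_async_loopdown(struct softc_model *sc)
{
    assert(sc != NULL);
    if (!sc->ints_enabled || sc->simq == NULL) {
        sc->ignored_events++;    /* too early: drop it instead of crashing */
        return;
    }
    sc->simq->freeze_count++;
}

/* Attach: set up the SIM queue but leave interrupts off. */
static void model_attach(struct softc_model *sc, struct sim_model *simq)
{
    assert(sc != NULL && simq != NULL);
    sc->simq = simq;
}

/* Deferred hook: the only place interrupts get turned on (the role a
 * config_intrhook plays before root is mounted). */
static void model_intr_enable_hook(struct softc_model *sc)
{
    assert(sc != NULL);
    sc->ints_enabled = true;
}
```

A caller runs model_attach() first and model_intr_enable_hook() later; a loop-down event arriving before the hook is counted in ignored_events instead of touching the queue.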
Can you try the following small patch:

--- isp_freebsd.c.0	2008-08-28 13:54:27.000000000 -0400
+++ isp_freebsd.c	2008-08-28 13:48:45.000000000 -0400
@@ -231,7 +231,6 @@
 
 	if (isp->isp_role != ISP_ROLE_NONE) {
 		isp->isp_state = ISP_RUNSTATE;
-		ISP_ENABLE_INTS(isp);
 	}
 	if (isplist == NULL) {
 		isplist = isp;

I just don't see why ISP_ENABLE_INTS should be called, since isp_intr_enable() has been established as the config_intrhook to be called before root mounts. One of the things I noticed is that the isp driver actually handles LIP events as they come in INSTEAD of first doing all the loop/fabric enumeration up front (at attach time, which I believe is what other OSes do). This means that when the card finally does come up and we are ready to go, it gets very noisy very fast until the fabric settles down. Hence all those portdb change messages during normal bootup! Give this a shot and let me know (either apply the patch or edit the file, it's one line; then rebuild, reboot, etc.). I tried it on my card with no devices attached, so I know it works on my system! j/k Again, I'm stabbing in the dark a little, but I'm curious whether this prevents the panic. -aps From erich at fuujinnetworks.com Thu Aug 28 22:25:56 2008 From: erich at fuujinnetworks.com (Fuujin Networks LLC) Date: Thu Aug 28 22:26:02 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> Message-ID: <48B733CF.5000105@fuujinnetworks.com> Alex: Thanks for your interest! Hope the following is of use to you. If you would like, I can put the dump file (~60MB) in a folder accessible via the web.
Here's the page fault: [snip] (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc48a4a00 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x2825c7d0) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc48a4900 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x28259e80) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc48a4800 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x2825c860) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc48a4700 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x28259f20) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc48a4600 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x2825c8f0) (targ0:isp0:0:0:0): targdone 0xc48a4700 (targ0:isp0:0:0:0): targread (targ0:isp0:0:0:0): targread ccb 0xc48a4700 (0x28259f20) (targ0:isp0:0:0:0): targreturnccb 0xc48a4700 cam_debug: targfreeccb descr 0xc48a2680 and cam_debug: freeing ccb 0xc48a4700 (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): Sending queued ccb 0x933 (0x2825e0c0) (targ0:isp0:0:0:0): targstart 0xc4947c00 (targ0:isp0:0:0:0): sendccb 0xc4947c00 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x4 fault code = supervisor read, page not present instruction pointer = 0x20:0xc05d3286 stack pointer = 0x28:0xe68e690c frame pointer = 0x28:0xe68e695c code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 847 (scsi_target) trap number = 12 panic: page fault cpuid = 1 Uptime: 12m52s Physical memory: 1011 MB Dumping 57 MB:ATIO/INOT (0x28243670) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc4878700 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x28244c00) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc4878600 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x28243700) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc487c900 (targ0:isp0:0:0:0): Sent ATIO/INOT 
(0x28244ca0) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc487c800 (targ0:isp0:0:0:0): Sent ATIO/INOT (0x28243790) (targ0:isp0:0:0:0): write - uio_resid 4 (targ0:isp0:0:0:0): getccb 0xc487c700 [snip] Here are the relevant lines in the kernel config. Everything else is stock.

[snip]
device     isp                # Qlogic family
device     ispfw              # Firmware for QLogic HBAs
options    ISP_TARGET_MODE    # for ISP cards in target mode
device     targ               # SCSI Target device
device     targbh             # SCSI Target Black Hole
options    CAMDEBUG
options    VFS_AIO
[snip]

Here is the relevant output from dmesg:

[snip]
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
isp0: port 0xc000-0xc0ff mem 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0
firmware_get: failed to load firmware image isp_2300_it
isp0: [ITHREAD]
isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19
isp0: target notify code 0x1007
isp0: target notify code 0x1007
isp0: target notify code 0x1006
isp0: target notify code 0x1007
isp0: target notify code 0x1008
(targbh0:isp0:0:-1:-1): Target Mode Enabled
isp0: target notify code 0x1007
isp0: target notify code 0x1007
isp0: target notify code 0x1006
isp0: target notify code 0x1007
isp0: target notify code 0x1006
isp0: target notify code 0x1007
[snip]

I'm a bit puzzled by the firmware_get failed line above. I suspect this may be the problem, but I have not been able to resolve it. I've tried disabling the BIOS on the FC cards, as well as messing with almost every other conceivable option, but the same error appears. Thoughts? Erich M.
Jenkins Fuujin Networks, LLC PO Box 792 Brainerd, MN 56401 (p) 218-824-5038 (f) 218-824-7516 "You should never, never doubt what no one is sure about." -- Gene Wilder Alexander Sack wrote: > On Tue, Aug 26, 2008 at 11:51 PM, Fuujin Networks LLC > wrote: >> I've run into a snag with our SAN and I'm hoping someone out there can shed >> some light on the glass, as it were. We're trying to use scsi_target mode >> with a pair of QLogic ISP2310 2GB fibre-channel cards in a Point-to-Point >> topology. These cards will NOT be part of a switch fabric. I started out >> with a quad port card, and when I rescanned the SCSI bus on the initiating >> end of the loop, the target machine tanked. >> The filer dumps core, reboots, and reproduces the result faithfully. > > How about posting some stack traces, dmesg output, etc. etc. about how it tanks? > > Thanks! > > -aps From westr at connection.ca Fri Aug 29 14:36:22 2008 From: westr at connection.ca (Ross) Date: Fri Aug 29 14:36:30 2008 Subject: isp(4) - kernel panic on initialization of driver In-Reply-To: <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca> <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> Message-ID: <34442830.20080829103621@connection.ca> AS> which will enable the debugger. Then boot with a "-v" and when you AS> get into a panic you should fall into the kernel debugger. At the AS> prompt do a "t" and copy the output (you don't have to copy the AS> addresses just the stack trace). I think I can imagine what it looks AS> like but it would be better if you provided it. Here you go - a rather large pain in the butt, since HP's ilom doesn't allow for cut'n'paste from the virtual console (even in serial mode). Ugh. Hope that it helps some more. 
This kernel has been built using the 1 line patch you gave me (removing the ISP_ENABLE_INTS call). Cheers, Ross.
-=
...[clipped]...
stack pointer = 0x20:0xc0af63cc
frame pointer = 0x20:0xc0af63cc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (swapper)
[thread pid 0 tid 0 ]
Stopped at xpt_freeze_simq+0x6: movl 0x58(%ecx),%eax
db> t
Tracing pid 0 tid 0 td 0xc079fd30
xpt_freeze_simq(0,1,c06f7a16,c06f800f,c06f800f,...) at xpt_freeze_simq+0x6
isp_freeze_loopdown(c81d0e00,2,c06f800f,0,c0cf1f6c,...) at isp_freeze_loopdown+0x42
isp_async(c81d0e00,6,0,14e1,2001d5c3,...) at isp_async+0xa72
isp_intr(c81d0e00,8012,1,8014,c0af6718,...) at isp_intr+0xbc7
isp_mbox_wait_complete(c81d0e00,c0af67e8,50000000,0,8,...) at isp_mbox_wait_complete+0x120
isp_mboxcmd(c0af67e8,24,40000000,c827c380,c0af67cc,...) at isp_mboxcmd+0x1ef
isp_reset(c81d0e00,c827c380,e,c0af6880,c02d6ce0,...) at isp_reset+0xe9
isp_pci_attach(c827c380,c827c380,ffffffff,c0708a0a,80000000,...) at isp_pci_attach+0x1899
device_attach(c827c380,c827c380,1,c827c380,c8263980,...) at device_attach+0x36f
device_probe_and_attach(c827c380,c8266800,c0af6944,c08b13a1,c8263980,...) at device_probe_and_attach+0xdd
bus_generic_attach(c8263980,c816c600,1,c08b0e80,c8263980,...) at bus_generic_attach+0x19
...[much more clipped]...
-=
If any number seems 'off', let me know, as it could be my typing skills.
-- From pisymbol at gmail.com Fri Aug 29 15:22:26 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Fri Aug 29 15:22:32 2008 Subject: isp(4) - kernel panic on initialization of driver In-Reply-To: <34442830.20080829103621@connection.ca> References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca> <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> <34442830.20080829103621@connection.ca> Message-ID: <3c0b01820808290822tce5619bie11b8e97fe9a9062@mail.gmail.com> On Fri, Aug 29, 2008 at 10:36 AM, Ross wrote: > > AS> which will enable the debugger. Then boot with a "-v" and when you > AS> get into a panic you should fall into the kernel debugger. At the > AS> prompt do a "t" and copy the output (you don't have to copy the > AS> addresses just the stack trace). I think I can imagine what it looks > AS> like but it would be better if you provided it. > > Here you go - a rather large pain in the butt, since HP's ilom doesn't > allow for cut'n'paste from the virtual console (even in serial mode). > Ugh. Hope that it helps some more. > > This kernel has been built using the 1 line patch you gave me > (removing the ISP_ENABLE_INTS call). Thanks, Ross. Unfortunately, it seems your problem happens even before we get to isp_attach()! > -= > ...[clipped]... > stack pointer = 0x20:0xc0af63cc > frame pointer = 0x20:0xc0af63cc > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 0 (swapper) > [thread pid 0 tid 0 ] > Stopped at xpt_freeze_simq+0x6: movl 0x58(%ecx),%eax > db> t > Tracing pid 0 tid 0 td 0xc079fd30 > xpt_freeze_simq(0,1,c06f7a16,c06f800f,c06f800f,...) at xpt_freeze_simq+0x6 > isp_freeze_loopdown(c81d0e00,2,c06f800f,0,c0cf1f6c,...) at isp_freeze_loopdown+0x42 > isp_async(c81d0e00,6,0,14e1,2001d5c3,...)
at isp_async+0xa72 > isp_intr(c81d0e00,8012,1,8014,c0af6718,...) at isp_intr+0xbc7 > isp_mbox_wait_complete(c81d0e00,c0af67e8,50000000,0,8,...) at isp_mbox_wait_complete+0x120 > isp_mboxcmd(c0af67e8,24,40000000,c827c380,c0af67cc,...) at isp_mboxcmd+0x1ef > isp_reset(c81d0e00,c827c380,e,c0af6880,c02d6ce0,...) at isp_reset+0xe9 > isp_pci_attach(c827c380,c827c380,ffffffff,c0708a0a,80000000,...) at isp_pci_attach+0x1899 > device_attach(c827c380,c827c380,1,c827c380,c8263980,...) at device_attach+0x36f > device_probe_and_attach(c827c380,c8266800,c0af6944,c08b13a1,c8263980,...) at device_probe_and_attach+0xdd > bus_generic_attach(c8263980,c816c600,1,c08b0e80,c8263980,...) at bus_generic_attach+0x19 > ...[much more clipped]... > -= > > If any number seems 'off', let me know, as it could be my typing > skills. Don't worry about that, Ross. Just the stack trace of functions is very enlightening! Looks like you got to isp_reset() before things screwed up. The issue, I believe, is that the CAM side is not initialized until isp_attach(), which is why things are screwy (I don't think there is even a simq allocated at this point, let alone one to freeze). Give me a sec to look at this some more... bottom line is isp should not be servicing async interrupts until it's absolutely ready!
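The trace above shows xpt_freeze_simq() being handed 0 (NULL) as its first argument from isp_freeze_loopdown(), with isp_intr() reached via isp_mbox_wait_complete() inside isp_reset(), which, as suggested above, appears to run before the SIM queue exists. One defensive pattern is to let the loop-down path tolerate a missing queue and remember the event for later. The sketch below is a hypothetical userland model (`struct isp_model`, `struct simq_model` and the `model_*` helpers are invented for illustration), not a patch for isp(4); what the driver should actually do with a remembered event is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real CAM/isp(4) types. */
struct simq_model {
    int frozen;                   /* freeze count */
};

struct isp_model {
    struct simq_model *simq;      /* NULL until the CAM side is attached */
    bool loopdown_pending;        /* loop went down before simq existed */
};

/* Loop-down handling that tolerates being reached from a polled mailbox
 * wait during reset, i.e. before the SIM queue has been allocated.
 * Returns true if the queue was actually frozen. */
static bool model_freeze_loopdown(struct isp_model *isp)
{
    assert(isp != NULL);
    if (isp->simq == NULL) {
        /* Nothing to freeze yet: remember the event instead of
         * dereferencing a NULL queue. */
        isp->loopdown_pending = true;
        return false;
    }
    isp->simq->frozen++;
    return true;
}

/* Called once attach has created the SIM queue. Returns true if a
 * loop-down arrived earlier, so the caller can start its normal
 * loop-down handling now; the remembered flag is cleared. */
static bool model_attach_simq(struct isp_model *isp, struct simq_model *simq)
{
    assert(isp != NULL && simq != NULL);
    isp->simq = simq;
    bool was_pending = isp->loopdown_pending;
    isp->loopdown_pending = false;
    return was_pending;
}
```

The design choice here is to record rather than replay the freeze, since a freeze without a matching release could leave the queue stuck.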
-aps From pisymbol at gmail.com Fri Aug 29 16:14:03 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Fri Aug 29 16:14:10 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48B733CF.5000105@fuujinnetworks.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> Message-ID: <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> On Thu, Aug 28, 2008 at 7:25 PM, Fuujin Networks LLC wrote: > > [snip] > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 1 > ioapic0: Changing APIC ID to 2 > ioapic0 irqs 0-23 on motherboard > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > isp0: port 0xc000-0xc0ff mem > 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0 > firmware_get: failed to load firmware image isp_2300_it > isp0: [ITHREAD] > isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 > isp0: target notify code 0x1007 > isp0: target notify code 0x1007 > isp0: target notify code 0x1006 > isp0: target notify code 0x1007 > isp0: target notify code 0x1008 > (targbh0:isp0:0:-1:-1): Target Mode Enabled > isp0: target notify code 0x1007 > isp0: target notify code 0x1007 > isp0: target notify code 0x1006 > isp0: target notify code 0x1007 > isp0: target notify code 0x1006 > isp0: target notify code 0x1007 > [snip] > > I'm a bit puzzled by the firmware_get failed line above. I suspect this may > be the problem, but I have not been able to resolve it. I've tried disabling > the bios on the FC cards, as well as messing with almost every other > conceivable option, but the same error appears. Thoughts? Yes, it's a bug in the ISP driver.
If you are in target mode, it tries to load the isp_XXX_it version of the RISC code. I *think* the old SCSI cards had two separate firmwares for target and initiator modes (currently, if you look at ispfw, there are 1040, 1080, and 12160 _it firmwares). Try this patch:

--- isp_pci.c	2008-08-29 07:58:08.000000000 -0400
+++ isp_pci.c.0	2008-08-29 08:03:24.000000000 -0400
@@ -1039,7 +1039,7 @@
 	}
 
 	isp->isp_osinfo.fw = NULL;
-	if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) {
+	if (isp->isp_role & ISP_ROLE_TARGET) {
 		snprintf(fwname, sizeof (fwname), "isp_%04x_it", did);
 		isp->isp_osinfo.fw = firmware_get(fwname);
 	}

That will fix the above error. The bad news is that this won't fix your problem: you DID load the 3.3.19 firmware, since the next line will get the isp_2300 firmware and things will proceed normally down in isp_reset() (where the load actually happens!). So you really need to enable:

options DDB
options KDB

and get a stack trace, so that when the machine panics you can do a "bt" and print the output (forget about the addresses, just the function calls). Also make sure the BIOS is configured to enable target mode (I forget whether the 2300 has a separate BIOS tunable for that).
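The firmware lookup touched by this patch can be modelled on its own. The sketch below lists image names in the order the driver code shown tries them, using the check that includes `IS_SCSI()` (the form the follow-up correction keeps): the `_it` image is requested only for parallel SCSI cards in the target role, and the plain `isp_%04x` name, which the message above says the next line loads, is the fallback. `fw_candidates()` and `FW_NAME_LEN` are hypothetical names, not part of isp_pci.c; `did` is the product ID as passed to the driver's `snprintf()` call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FW_NAME_LEN 32   /* hypothetical buffer size for an image name */

/*
 * Hypothetical helper: write the firmware image names a driver would try,
 * in order, into names[0..1] and return how many were written. The "_it"
 * image is only wanted for parallel SCSI cards in the target role; the
 * plain "isp_%04x" image is always the fallback.
 */
static int fw_candidates(unsigned int did, bool target_role, bool is_scsi,
                         char names[2][FW_NAME_LEN])
{
    int n = 0;

    assert(names != NULL);
    /* Clear both slots so an unused one reads as an empty string. */
    memset(names, 0, 2 * sizeof names[0]);

    if (target_role && is_scsi)
        snprintf(names[n++], FW_NAME_LEN, "isp_%04x_it", did);
    snprintf(names[n++], FW_NAME_LEN, "isp_%04x", did);
    return n;
}
```

For example, fw_candidates(0x2300, true, false, names) yields only "isp_2300", so an FC 2300 card in target mode never asks for an isp_2300_it image, while a 1080 SCSI card in target mode tries "isp_1080_it" first and then "isp_1080".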
Let us know, -aps From pisymbol at gmail.com Fri Aug 29 16:15:27 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Fri Aug 29 16:15:33 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> Message-ID: <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> On Fri, Aug 29, 2008 at 12:14 PM, Alexander Sack wrote: > On Thu, Aug 28, 2008 at 7:25 PM, Fuujin Networks LLC > wrote: >> >> [snip] >> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs >> cpu0 (BSP): APIC ID: 0 >> cpu1 (AP): APIC ID: 1 >> ioapic0: Changing APIC ID to 2 >> ioapic0 irqs 0-23 on motherboard >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> isp0: port 0xc000-0xc0ff mem >> 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0 >> firmware_get: failed to load firmware image isp_2300_it >> isp0: [ITHREAD] >> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1006 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1008 >> (targbh0:isp0:0:-1:-1): Target Mode Enabled >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1006 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1006 >> isp0: target notify code 0x1007 >> [snip] >> >> I'm a bit puzzled by the firmware_get failed line above. I suspect this may >> be the problem, but I have not been able to resolve it. 
I've tried disabling
>> the bios on the FC cards, as well as messing with almost every other
>> conceivable option, but the same error appears. Thoughts?
>
> Yes, its a bug in the ISP driver. If you are in target mode, it tries
> to load the isp_XXX_it version of the RISC code. I *think* the old
> SCSI cards had two separate firmwares for target and initiator modes
> (currently if you look at ispfw, there is the 1040, 1080, and 12160_it
> firmwares).
>
> Try this patch:
>
> --- isp_pci.c 2008-08-29 07:58:08.000000000 -0400
> +++ isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400
> @@ -1039,7 +1039,7 @@
> }
>
> isp->isp_osinfo.fw = NULL;
> - if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) {
> + if (isp->isp_role & ISP_ROLE_TARGET) {
> snprintf(fwname, sizeof (fwname), "isp_%04x_it", did);
> isp->isp_osinfo.fw = firmware_get(fwname);
> }

Whoops! It's reversed!

--- isp_pci.c.0	2008-08-29 08:03:24.000000000 -0400
+++ isp_pci.c	2008-08-29 07:58:08.000000000 -0400
@@ -1039,7 +1039,7 @@
 	}
 
 	isp->isp_osinfo.fw = NULL;
-	if (isp->isp_role & ISP_ROLE_TARGET) {
+	if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) {
 		snprintf(fwname, sizeof (fwname), "isp_%04x_it", did);
 		isp->isp_osinfo.fw = firmware_get(fwname);
 	}

Sorry about that!

-aps

From pisymbol at gmail.com Fri Aug 29 16:51:55 2008
From: pisymbol at gmail.com (Alexander Sack)
Date: Fri Aug 29 16:52:01 2008
Subject: [ISP] QLA2432 Target Mode Broken
In-Reply-To: <48B6EF1A.1040805@miralink.com>
References: <48B6E19A.7050603@miralink.com> <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> <48B6EF1A.1040805@miralink.com>
Message-ID: <3c0b01820808290951s6a3a8ebuf6ea501308ed91c3@mail.gmail.com>

On Thu, Aug 28, 2008 at 2:31 PM, Sean Bruno wrote:
> Alexander Sack wrote:
>>
>> On Thu, Aug 28, 2008 at 1:34 PM, Sean Bruno wrote:
>>
>>>
>>> I tried putting a 2432 into target mode this week and noted that the
>>> system
>>> threw a pretty nice panic and thought I would post the output here.
>>> Reviewing the 4G documentation from Qlogic, it looks like they've
>>> substantially changed the target mode interface, so I'm not surprised
>>> that
>>> there's some work to do. If anyone has any patches they'd like me to
>>> test,
>>> I'm open to integration:
>>>
>>
>> Did you rebuild the isp driver with -DISP_TARGET_MODE defined? I only
>> mention this because the output below seems like you twiddled the
>> "role" hint instead of actually recompile the driver?
>>
>> -aps
>>
>
> Ah, yes, that's a little magic I was trying ... sorry about that.
>
> Yes, I definitely compiled with ISP_TARGET_MODE defined. :)

Yeah, sorry, I was just checking. I don't have a clue right now why you are dying, but someone else is seeing similar nastiness in target mode. At a minimum, you should file a bug.

How did you set up your box, how do you reproduce it, etc.?

thanks!

-aps

From sbruno at miralink.com Fri Aug 29 17:54:36 2008
From: sbruno at miralink.com (Sean Bruno)
Date: Fri Aug 29 17:54:41 2008
Subject: [ISP] QLA2432 Target Mode Broken
In-Reply-To: <3c0b01820808290951s6a3a8ebuf6ea501308ed91c3@mail.gmail.com>
References: <48B6E19A.7050603@miralink.com> <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> <48B6EF1A.1040805@miralink.com> <3c0b01820808290951s6a3a8ebuf6ea501308ed91c3@mail.gmail.com>
Message-ID: <48B837DB.1000903@miralink.com>

Alexander Sack wrote:
> On Thu, Aug 28, 2008 at 2:31 PM, Sean Bruno wrote:
>
>> Alexander Sack wrote:
>>
>>> On Thu, Aug 28, 2008 at 1:34 PM, Sean Bruno wrote:
>>>
>>>
>>>> I tried putting a 2432 into target mode this week and noted that the
>>>> system
>>>> threw a pretty nice panic and thought I would post the output here.
>>>> Reviewing the 4G documentation from Qlogic, it looks like they've
>>>> substantially changed the target mode interface, so I'm not surprised
>>>> that
>>>> there's some work to do.
If anyone has any patches they'd like me to
>>>> test,
>>>> I'm open to integration:
>>>>
>>>>
>>> Did you rebuild the isp driver with -DISP_TARGET_MODE defined? I only
>>> mention this because the output below seems like you twiddled the
>>> "role" hint instead of actually recompile the driver?
>>>
>>> -aps
>>>
>>>
>> Ah, yes, that's a little magic I was trying ... sorry about that.
>>
>> Yes, I definitely compiled with ISP_TARGET_MODE defined. :)
>>
>
> Yea sorry, I just was checking. I don't have a clue right now why you
> are dying but some else is seeing similar nastiness in target mode.
> Minimally you should file a bug.
>
> How did you setup your box, how do you reproduce etc. etc.?
>
> thanks!
>
> -aps
>
Well, I put a 2432 into my box and recompiled with target mode enabled. Nothing fancy.

:)

A 23XX card in its place works just fine. I believe that, due to the architecture difference between the 4G/8G and 2G cards, it just doesn't work right now. In target mode, the 4G/8G architecture does nothing except DMA the incoming data; the 2G cards do some things for the driver.

I was hoping that someone has some patches lying around in their trees that I could look over and test for 4G target mode support.
:)

--
Sean Bruno
MiraLink Corporation
6015 NE 80th Ave, Ste 100
Portland, OR 97218
Phone 503-621-5143
Fax 503-621-5199
MSN: sbruno@miralink.com
Google: seanwbruno@gmail.com
Yahoo: sean_bruno@yahoo.com

From westr at connection.ca Fri Aug 29 19:17:51 2008
From: westr at connection.ca (Ross)
Date: Fri Aug 29 19:17:57 2008
Subject: isp(4) - kernel panic on initialization of driver
In-Reply-To: <3c0b01820808290822tce5619bie11b8e97fe9a9062@mail.gmail.com>
References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca> <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> <34442830.20080829103621@connection.ca> <3c0b01820808290822tce5619bie11b8e97fe9a9062@mail.gmail.com>
Message-ID: <08661720.20080829151750@connection.ca>

AS> Give me a sec to look a this some more....bottom line is isp
AS> should not be servicing async interrupts until its absolutely
AS> ready!

Okay, I've done a bit more digging on my side, and I've come up against a small wall.

The trace shows:

> isp_freeze_loopdown(c81d0e00,2,c06f800f,0,c0cf1f6c,...) at isp_freeze_loopdown+0x42
> isp_async(c81d0e00,6,0,14e1,2001d5c3,...) at isp_async+0xa72
> isp_intr(c81d0e00,8012,1,8014,c0af6718,...) at isp_intr+0xbc7
> isp_mbox_wait_complete(c81d0e00,c0af67e8,50000000,0,8,...) at isp_mbox_wait_complete+0x120

But what's missing from the trace is how isp_intr() gets to isp_async(). What I found is the call to isp_parse_async() (see line 4560 in isp.c), which in turn calls isp_async().

Setting DEBUG to 0x14F shows the extra debug lines there, confirming the pass-through (the console output includes the "Async Mbox 0x8014" line before the mbox checkpoint).

I'm guessing that would be the spot to add a check against isp_state, but I'm not sure whether to just return from the function or to goto 'out' for the cleanup.

Let me know if I'm on track.

Cheers,
Ross.
-- From pisymbol at gmail.com Fri Aug 29 22:15:38 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Fri Aug 29 22:15:44 2008 Subject: isp(4) - kernel panic on initialization of driver In-Reply-To: <08661720.20080829151750@connection.ca> References: <13710393234.20080826164158@connection.ca> <48B46EE1.8060408@samsco.org> <3c0b01820808270743n5fd40995u6e9506b772f2b03c@mail.gmail.com> <86689256.20080827112751@connection.ca> <3c0b01820808271333l34ead8ele99daab695baf667@mail.gmail.com> <34442830.20080829103621@connection.ca> <3c0b01820808290822tce5619bie11b8e97fe9a9062@mail.gmail.com> <08661720.20080829151750@connection.ca> Message-ID: <3c0b01820808291515j759236e6h262c533846587d57@mail.gmail.com> On Fri, Aug 29, 2008 at 3:17 PM, Ross wrote: > AS> Give me a sec to look a this some more....bottom line is isp > AS> should not be servicing async interrupts until its absolutely > AS> ready! > > Okay, I've done a bit more digging on my side here, and have come up > against a small wall - > > The trace shows: > >> isp_freeze_loopdown(c81d0e00,2,c06f800f,0,c0cf1f6c,...) at isp_freeze_loopdown+0x42 >> isp_async(c81d0e00,6,0,14e1,2001d5c3,...) at isp_async+0xa72 >> isp_intr(c81d0e00,8012,1,8014,c0af6718,...) at isp_intr+0xbc7 >> isp_mbox_wait_complete(c81d0e00,c0af67e8,50000000,0,8,...) at isp_mbox_wait_complete+0x120 > > But what's missing in the trace is the call to isp_async() from > isp_intr(). What I have found is the call to isp_parse_async() (see line > 4560 in isp.c) which in turn calls isp_async(). > > Setting DEBUG to 0x14F shows the extra debug lines in there confirming > the pass through [console output includes the "Async Mbox 0x8014" > line before the mbox checkpoint]. > > I'm guessing that would be the spot to add a check against isp_state, > but am not sure whether to just return from the function, or to do the > goto jump to 'out' for the cleanup. > > Let me know if I'm on track. 
I think you're doing some great work, but I don't think this is the *right* direction to take. The bottom line is that the ISP should have interrupts disabled until it completes a full reset and loads the firmware, period. You shouldn't have to ignore ASYNC events during a reset; that doesn't make sense to me....yet....!

Can we try something else:

@@ -1192,6 +1192,8 @@
 		isp->isp_touched = 1;
 	}
 
+	ISP_DISABLE_INTS(isp);
+
 	/*
 	 * Make sure we're in reset state.
 	 */
--- isp.c.0	2008-08-29 13:35:01.000000000 -0400
+++ isp.c	2008-08-29 14:15:40.000000000 -0400
@@ -226,8 +226,6 @@
 		isp->isp_touched = 1;
 	}
 
-	ISP_DISABLE_INTS(isp);
-
 	/*
 	 * Pick an initial maxcmds value which will be used
 	 * to allocate xflist pointer space. It may be changed
@@ -684,7 +682,8 @@
 	/*
 	 * Do MD specific post initialization
 	 */
-	ISP_RESET1(isp);
+	if (!IS_24XX(isp))
+		ISP_RESET1(isp);
 
 	/*
 	 * Wait for everything to finish firing up.
--- isp_freebsd.c.0	2008-08-29 14:05:05.000000000 -0400
+++ isp_freebsd.c	2008-08-29 14:05:32.000000000 -0400
@@ -231,6 +231,7 @@
 	if (isp->isp_role != ISP_ROLE_NONE) {
 		isp->isp_state = ISP_RUNSTATE;
+		ISP_ENABLE_INTS(isp);
 	}
 
 	if (isplist == NULL) {
 		isplist = isp;

I wanted to put back the line that we removed, so that we test one thing at a time.

I so wish I could reproduce your exact panic, but I can't!!! I've tried about a dozen different ways, but I just can't. I'm trying to ignore ALL ASYNCs until after we complete the isp_reset|init|attach cycle and let the intrhook enable interrupts and start the enumeration (at that point the simqs have been enabled, the bus registered, etc.). The mailbox commands should be OK since we use polling to complete them anyway (I have to verify that).

I just tried this on my box to verify that my RAID FC array gets enumerated and the driver doesn't panic (it's the best I can do right now).

Btw, this wouldn't be the final patch, but if it's effective we are on the right track!
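The ordering proposed above (mask interrupts at the very start of reset, unmask them only once attach has set the run state) can be illustrated with a small user-space model. Everything in this sketch is hypothetical: `struct isp_model` and the `model_*` functions are invented for illustration and only loosely echo ISP_DISABLE_INTS(), ISP_ENABLE_INTS() and isp_state; they are not isp(4) code, and "simq_ready" is a simplification of the CAM setup that attach performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states, loosely mirroring the driver's reset/init/run. */
enum isp_model_state {
	MODEL_NILSTATE, MODEL_RESETSTATE, MODEL_INITSTATE, MODEL_RUNSTATE
};

struct isp_model {
	enum isp_model_state state;
	bool ints_enabled;   /* models the chip's interrupt mask */
	bool simq_ready;     /* models CAM resources set up at attach */
	int async_handled;   /* async events that reached the handler */
	int async_unsafe;    /* events handled before simq_ready was set */
};

static void
model_reset(struct isp_model *isp)
{
	isp->ints_enabled = false;   /* mask first, as with ISP_DISABLE_INTS() */
	isp->state = MODEL_RESETSTATE;
	/* firmware load and polled mailbox commands would happen here */
}

static void
model_init(struct isp_model *isp)
{
	isp->state = MODEL_INITSTATE;
}

static void
model_attach(struct isp_model *isp)
{
	assert(isp->state == MODEL_INITSTATE);  /* attach follows init */
	isp->simq_ready = true;
	isp->state = MODEL_RUNSTATE;
	isp->ints_enabled = true;    /* unmask last, as with ISP_ENABLE_INTS() */
}

/* The firmware raises an async event (e.g. loop down). In this model
 * it reaches the handler only while interrupts are unmasked. */
static void
model_async_event(struct isp_model *isp)
{
	if (!isp->ints_enabled)
		return;              /* masked: handler is not invoked */
	isp->async_handled++;
	if (!isp->simq_ready)
		isp->async_unsafe++; /* handler ran before attach finished */
}
```

Running reset, init and attach in order, with an event raised during each phase, the model handles only the event raised after attach, and never before `simq_ready` is set. A model whose `ints_enabled` starts out true lets an event reach the handler before attach, which is roughly the situation in the trace above, where isp_freeze_loopdown() runs from isp_async() while a mailbox command is still being waited on.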
:D -aps From pisymbol at gmail.com Fri Aug 29 22:43:43 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Fri Aug 29 22:43:51 2008 Subject: [ISP] QLA2432 Target Mode Broken In-Reply-To: <48B837DB.1000903@miralink.com> References: <48B6E19A.7050603@miralink.com> <3c0b01820808281059k3c33e352g6be72f02817e8e6a@mail.gmail.com> <48B6EF1A.1040805@miralink.com> <3c0b01820808290951s6a3a8ebuf6ea501308ed91c3@mail.gmail.com> <48B837DB.1000903@miralink.com> Message-ID: <3c0b01820808291543q694c9692jc0f1e7c922344f29@mail.gmail.com> If I get a chance, maybe I can try that and see what happens to me!! -aps On Fri, Aug 29, 2008 at 1:54 PM, Sean Bruno wrote: > Alexander Sack wrote: >> >> On Thu, Aug 28, 2008 at 2:31 PM, Sean Bruno wrote: >> >>> >>> Alexander Sack wrote: >>> >>>> >>>> On Thu, Aug 28, 2008 at 1:34 PM, Sean Bruno wrote: >>>> >>>> >>>>> >>>>> I tried putting a 2432 into target mode this week and noted that the >>>>> system >>>>> threw a pretty nice panic and thought I would post the output here. >>>>> Reviewing the 4G documentation from Qlogic, it looks like they've >>>>> substantially changed the target mode interface, so I'm not surprised >>>>> that >>>>> there's some work to do. If anyone has any patches they'd like me to >>>>> test, >>>>> I'm open to integration: >>>>> >>>>> >>>> >>>> Did you rebuild the isp driver with -DISP_TARGET_MODE defined? I only >>>> mention this because the output below seems like you twiddled the >>>> "role" hint instead of actually recompile the driver? >>>> >>>> -aps >>>> >>>> >>> >>> Ah, yes, that's a little magic I was trying ... sorry about that. >>> >>> Yes, I definitely compiled with ISP_TARGET_MODE defined. :) >>> >> >> Yea sorry, I just was checking. I don't have a clue right now why you >> are dying but some else is seeing similar nastiness in target mode. >> Minimally you should file a bug. >> >> How did you setup your box, how do you reproduce etc. etc.? >> >> thanks! 
>> >> -aps >> > > Well, I put a 2432 into my box and recompiled with target mode enabled. > Nothing fancy. > > :) > > A 23XX card in it's place works just fine. > I believe, due to the architecture difference between the 4G/8G and 2G cards > that it just doesn't work right now. > The 4G/8G architecture does nothing in target mode except DMA the incoming > data. the 2G cards do some things > for the driver. > > I was hoping that someone has some patches in their trees just lying around > that I could look over and test for > 4G target mode support. :) > > -- > Sean Bruno > MiraLink Corporation > 6015 NE 80th Ave, Ste 100 > Portland, OR 97218 > Phone 503-621-5143 > Fax 503-621-5199 > MSN: sbruno@miralink.com > Google: seanwbruno@gmail.com > Yahoo: sean_bruno@yahoo.com > > From erich at fuujinnetworks.com Sat Aug 30 05:29:01 2008 From: erich at fuujinnetworks.com (Fuujin Networks LLC) Date: Sat Aug 30 05:29:07 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> Message-ID: <48B8E879.7020809@fuujinnetworks.com> Alex: Thanks very much for the patch. Unfortunately, I ended up with a similar result as seen below. Just for grins, I tried the patch on a 64-bit system (AMD64) to see if there was a difference based on which architecture is used for the target. No difference there either; still dumps core and reboots. The upside I would think is that both branches seem to be in sync. I do have a sparc64 box here if you'd like to see what happens in that world (haven't tested it yet). 
[snip] (targ0:isp0:0:2:0): targdone 0xffffff0001ddda00 (targ0:isp0:0:2:0): targread (targ0:isp0:0:2:0): targread ccb 0xffffff0001ddda00 (0x800b7fe20) (targ0:isp0:0:2:0): targreturnccb 0xffffff0001ddda00 cam_debug: targfreeccb descr 0xffffff0001dda1c0 and cam_debug: freeing ccb 0xffffff0001ddda00 (targ0:isp0:0:2:0): write - uio_resid 8 (targ0:isp0:0:2:0): Sending queued ccb 0x933 (0x800b85040) (targ0:isp0:0:2:0): targstart 0xffffff0001369000 (targ0:isp0:0:2:0): sendccb 0xffffff0001369000 Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x8 fault code = supervisor read data, page not present instruction pointer = 0x8:0xffffffff8025d2e8 stack pointer = 0x10:0xffffffffae3d06f0 frame pointer = 0x10:0xffffffff80a42000 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 783 (scsi_target) trap number = 12 panic: page fault cpuid = 0 Uptime: 7m21s Physical memory: 4021 MB Dumping 364 MB:: write - uio_resid 8 (targ0:isp0:0:2:0): getccb 0xffffff0001db7c00 (targ0:isp0:0:2:0): Sent ATIO/INOT (0x800b61a10) (targ0:isp0:0:2:0): write - uio_resid 8 (targ0:isp0:0:2:0): getccb 0xffffff0001db7b00 [snip] Seems to be nearly the same result in loading the firmware: [snip] registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set registered firmware set [snip] isp0: port 0x3000-0x30ff mem 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2 firmware_get: failed to load firmware image isp_2300_it isp0: [ITHREAD] isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 [snip] isp0: target notify code 0x1007 isp0: target notify code 0x1007 isp0: target notify code 0x1006 isp0: target notify code 0x1007 isp0: target notify code 0x1008 
(targbh0:isp0:0:-1:-1): Target Mode Enabled [snip] It doesn't appear that the firmware "isp_2300_it" either exists or possibly isn't named properly on the target machine (or the initiator for that matter). However, there do not seem to be any problems loading the firmware for the card when it's not in target mode. From everything I've read, it looks like the firmware needs to be loaded via the kernel device option "ispfw". If for nothing other than my understanding, is there some reason we're not loading the firmware resident on the card? Thanks very much for your help. This is a bit of a head scratcher for me... Please let me know if SSH access to either boxes would help you and I'll be happy to arrange that. Erich M. Jenkins Fuujin Networks, LLC PO Box 792 Brainerd, MN 56401 (p) 218-824-5038 (f) 218-824-7516 "You should never, never doubt what no one is sure about." -- Gene Wilder Alexander Sack wrote: > On Fri, Aug 29, 2008 at 12:14 PM, Alexander Sack wrote: >> On Thu, Aug 28, 2008 at 7:25 PM, Fuujin Networks LLC >> wrote: >>> [snip] >>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs >>> cpu0 (BSP): APIC ID: 0 >>> cpu1 (AP): APIC ID: 1 >>> ioapic0: Changing APIC ID to 2 >>> ioapic0 irqs 0-23 on motherboard >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> registered firmware set >>> isp0: port 0xc000-0xc0ff mem >>> 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0 >>> firmware_get: failed to load firmware image isp_2300_it >>> isp0: [ITHREAD] >>> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1006 >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1008 >>> (targbh0:isp0:0:-1:-1): Target 
Mode Enabled >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1006 >>> isp0: target notify code 0x1007 >>> isp0: target notify code 0x1006 >>> isp0: target notify code 0x1007 >>> [snip] >>> >>> I'm a bit puzzled by the firmware_get failed line above. I suspect this may >>> be the problem, but I have not been able to resolve it. I've tried disabling >>> the bios on the FC cards, as well as messing with almost every other >>> conceivable option, but the same error appears. Thoughts? >> Yes, its a bug in the ISP driver. If you are in target mode, it tries >> to load the isp_XXX_it version of the RISC code. I *think* the old >> SCSI cards had two separate firmwares for target and initiator modes >> (currently if you look at ispfw, there is the 1040, 1080, and 12160_it >> firmwares). >> >> Try this patch: >> >> --- isp_pci.c 2008-08-29 07:58:08.000000000 -0400 >> +++ isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 >> @@ -1039,7 +1039,7 @@ >> } >> >> isp->isp_osinfo.fw = NULL; >> - if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { >> + if (isp->isp_role & ISP_ROLE_TARGET) { >> snprintf(fwname, sizeof (fwname), "isp_%04x_it", did); >> isp->isp_osinfo.fw = firmware_get(fwname); >> } > > Whoops! Its reversed! > > --- isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 > +++ isp_pci.c 2008-08-29 07:58:08.000000000 -0400 > @@ -1039,7 +1039,7 @@ > } > > isp->isp_osinfo.fw = NULL; > - if (isp->isp_role & ISP_ROLE_TARGET) { > + if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { > snprintf(fwname, sizeof (fwname), "isp_%04x_it", did); > isp->isp_osinfo.fw = firmware_get(fwname); > } > > Sorry about that! 
> > -aps From sbruno at miralink.com Sat Aug 30 06:02:16 2008 From: sbruno at miralink.com (Sean Bruno) Date: Sat Aug 30 06:02:23 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48B8E879.7020809@fuujinnetworks.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> Message-ID: <48B8E265.4060001@miralink.com> Fuujin Networks LLC wrote: > Alex: > > Thanks very much for the patch. Unfortunately, I ended up with a > similar result as seen below. Just for grins, I tried the patch on a > 64-bit system (AMD64) to see if there was a difference based on which > architecture is used for the target. No difference there either; still > dumps core and reboots. The upside I would think is that both branches > seem to be in sync. I do have a sparc64 box here if you'd like to see > what happens in that world (haven't tested it yet). 
> > [snip] > (targ0:isp0:0:2:0): targdone 0xffffff0001ddda00 > (targ0:isp0:0:2:0): targread > (targ0:isp0:0:2:0): targread ccb 0xffffff0001ddda00 (0x800b7fe20) > (targ0:isp0:0:2:0): targreturnccb 0xffffff0001ddda00 > cam_debug: targfreeccb descr 0xffffff0001dda1c0 and > cam_debug: freeing ccb 0xffffff0001ddda00 > (targ0:isp0:0:2:0): write - uio_resid 8 > (targ0:isp0:0:2:0): Sending queued ccb 0x933 (0x800b85040) > (targ0:isp0:0:2:0): targstart 0xffffff0001369000 > (targ0:isp0:0:2:0): sendccb 0xffffff0001369000 > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x8 > fault code = supervisor read data, page not present > instruction pointer = 0x8:0xffffffff8025d2e8 > stack pointer = 0x10:0xffffffffae3d06f0 > frame pointer = 0x10:0xffffffff80a42000 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 783 (scsi_target) > trap number = 12 > panic: page fault > cpuid = 0 > Uptime: 7m21s > Physical memory: 4021 MB > Dumping 364 MB:: write - uio_resid 8 > (targ0:isp0:0:2:0): getccb 0xffffff0001db7c00 > (targ0:isp0:0:2:0): Sent ATIO/INOT (0x800b61a10) > (targ0:isp0:0:2:0): write - uio_resid 8 > (targ0:isp0:0:2:0): getccb 0xffffff0001db7b00 > [snip] > > > Seems to be nearly the same result in loading the firmware: > > [snip] > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > [snip] > isp0: port 0x3000-0x30ff mem > 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2 > firmware_get: failed to load firmware image isp_2300_it > isp0: [ITHREAD] > isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 > [snip] > isp0: target notify code 0x1007 > isp0: target notify 
code 0x1007 > isp0: target notify code 0x1006 > isp0: target notify code 0x1007 > isp0: target notify code 0x1008 > (targbh0:isp0:0:-1:-1): Target Mode Enabled > [snip] > > It doesn't appear that the firmware "isp_2300_it" either exists or > possibly isn't named properly on the target machine (or the initiator > for that matter). However, there do not seem to be any problems > loading the firmware for the card when it's not in target mode. > > From everything I've read, it looks like the firmware needs to be > loaded via the kernel device option "ispfw". If for nothing other than > my understanding, is there some reason we're not loading the firmware > resident on the card? > > Thanks very much for your help. This is a bit of a head scratcher for > me... > > Please let me know if SSH access to either boxes would help you and > I'll be happy to arrange that. > > > Erich M. Jenkins > Fuujin Networks, LLC > PO Box 792 > Brainerd, MN 56401 > (p) 218-824-5038 > (f) 218-824-7516 > > "You should never, never doubt what no one is sure about." 
> -- Gene Wilder > > Alexander Sack wrote: >> On Fri, Aug 29, 2008 at 12:14 PM, Alexander Sack >> wrote: >>> On Thu, Aug 28, 2008 at 7:25 PM, Fuujin Networks LLC >>> wrote: >>>> [snip] >>>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs >>>> cpu0 (BSP): APIC ID: 0 >>>> cpu1 (AP): APIC ID: 1 >>>> ioapic0: Changing APIC ID to 2 >>>> ioapic0 irqs 0-23 on motherboard >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> registered firmware set >>>> isp0: port 0xc000-0xc0ff mem >>>> 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0 >>>> firmware_get: failed to load firmware image isp_2300_it >>>> isp0: [ITHREAD] >>>> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1006 >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1008 >>>> (targbh0:isp0:0:-1:-1): Target Mode Enabled >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1006 >>>> isp0: target notify code 0x1007 >>>> isp0: target notify code 0x1006 >>>> isp0: target notify code 0x1007 >>>> [snip] >>>> >>>> I'm a bit puzzled by the firmware_get failed line above. I suspect >>>> this may >>>> be the problem, but I have not been able to resolve it. I've tried >>>> disabling >>>> the bios on the FC cards, as well as messing with almost every other >>>> conceivable option, but the same error appears. Thoughts? >>> Yes, its a bug in the ISP driver. If you are in target mode, it tries >>> to load the isp_XXX_it version of the RISC code. 
I *think* the old >>> SCSI cards had two separate firmwares for target and initiator modes >>> (currently if you look at ispfw, there is the 1040, 1080, and 12160_it >>> firmwares). >>> >>> Try this patch: >>> >>> --- isp_pci.c 2008-08-29 07:58:08.000000000 -0400 >>> +++ isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 >>> @@ -1039,7 +1039,7 @@ >>> } >>> >>> isp->isp_osinfo.fw = NULL; >>> - if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { >>> + if (isp->isp_role & ISP_ROLE_TARGET) { >>> snprintf(fwname, sizeof (fwname), >>> "isp_%04x_it", did); >>> isp->isp_osinfo.fw = firmware_get(fwname); >>> } >> >> Whoops! Its reversed! >> >> --- isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 >> +++ isp_pci.c 2008-08-29 07:58:08.000000000 -0400 >> @@ -1039,7 +1039,7 @@ >> } >> >> isp->isp_osinfo.fw = NULL; >> - if (isp->isp_role & ISP_ROLE_TARGET) { >> + if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { >> snprintf(fwname, sizeof (fwname), "isp_%04x_it", did); >> isp->isp_osinfo.fw = firmware_get(fwname); >> } >> >> Sorry about that! >> >> -aps > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" Hrm...just to check, do you have the following in your loader.conf: ispfw_load="YES" ? 
-- Sean Bruno MiraLink Corporation 6015 NE 80th Ave, Ste 100 Portland, OR 97218 Cell 503-358-6832 Phone 503-621-5143 Fax 503-621-5199 MSN: sbruno@miralink.com Google: seanwbruno@gmail.com From erich at fuujinnetworks.com Sat Aug 30 07:46:32 2008 From: erich at fuujinnetworks.com (Fuujin Networks LLC) Date: Sat Aug 30 07:46:39 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48B8E265.4060001@miralink.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> <48B8E265.4060001@miralink.com> Message-ID: <48B908B3.6050207@fuujinnetworks.com> Sean: I've tried it with the loader.conf entry and without, but I've not seen any difference in the way it loads the firmware. It is compiled into the Kernel though, so the loader.conf entry should be unnecessary. Thanks for the thought though! I'm willing to look at any possibility to make this work! :) Erich M. Jenkins Fuujin Networks, LLC PO Box 792 Brainerd, MN 56401 (p) 218-824-5038 (f) 218-824-7516 "You should never, never doubt what no one is sure about." -- Gene Wilder Sean Bruno wrote: > Fuujin Networks LLC wrote: >> Alex: >> >> Thanks very much for the patch. Unfortunately, I ended up with a >> similar result as seen below. Just for grins, I tried the patch on a >> 64-bit system (AMD64) to see if there was a difference based on which >> architecture is used for the target. No difference there either; still >> dumps core and reboots. The upside I would think is that both branches >> seem to be in sync. I do have a sparc64 box here if you'd like to see >> what happens in that world (haven't tested it yet). 
>> >> [snip] >> (targ0:isp0:0:2:0): targdone 0xffffff0001ddda00 >> (targ0:isp0:0:2:0): targread >> (targ0:isp0:0:2:0): targread ccb 0xffffff0001ddda00 (0x800b7fe20) >> (targ0:isp0:0:2:0): targreturnccb 0xffffff0001ddda00 >> cam_debug: targfreeccb descr 0xffffff0001dda1c0 and >> cam_debug: freeing ccb 0xffffff0001ddda00 >> (targ0:isp0:0:2:0): write - uio_resid 8 >> (targ0:isp0:0:2:0): Sending queued ccb 0x933 (0x800b85040) >> (targ0:isp0:0:2:0): targstart 0xffffff0001369000 >> (targ0:isp0:0:2:0): sendccb 0xffffff0001369000 >> >> >> Fatal trap 12: page fault while in kernel mode >> cpuid = 0; apic id = 00 >> fault virtual address = 0x8 >> fault code = supervisor read data, page not present >> instruction pointer = 0x8:0xffffffff8025d2e8 >> stack pointer = 0x10:0xffffffffae3d06f0 >> frame pointer = 0x10:0xffffffff80a42000 >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 783 (scsi_target) >> trap number = 12 >> panic: page fault >> cpuid = 0 >> Uptime: 7m21s >> Physical memory: 4021 MB >> Dumping 364 MB:: write - uio_resid 8 >> (targ0:isp0:0:2:0): getccb 0xffffff0001db7c00 >> (targ0:isp0:0:2:0): Sent ATIO/INOT (0x800b61a10) >> (targ0:isp0:0:2:0): write - uio_resid 8 >> (targ0:isp0:0:2:0): getccb 0xffffff0001db7b00 >> [snip] >> >> >> Seems to be nearly the same result in loading the firmware: >> >> [snip] >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> registered firmware set >> [snip] >> isp0: port 0x3000-0x30ff mem >> 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2 >> firmware_get: failed to load firmware image isp_2300_it >> isp0: [ITHREAD] >> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 >> 
[snip] >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1006 >> isp0: target notify code 0x1007 >> isp0: target notify code 0x1008 >> (targbh0:isp0:0:-1:-1): Target Mode Enabled >> [snip] >> >> It doesn't appear that the firmware "isp_2300_it" either exists or >> possibly isn't named properly on the target machine (or the initiator >> for that matter). However, there do not seem to be any problems >> loading the firmware for the card when it's not in target mode. >> >> From everything I've read, it looks like the firmware needs to be >> loaded via the kernel device option "ispfw". If for nothing other than >> my understanding, is there some reason we're not loading the firmware >> resident on the card? >> >> Thanks very much for your help. This is a bit of a head scratcher for >> me... >> >> Please let me know if SSH access to either boxes would help you and >> I'll be happy to arrange that. >> >> >> Erich M. Jenkins >> Fuujin Networks, LLC >> PO Box 792 >> Brainerd, MN 56401 >> (p) 218-824-5038 >> (f) 218-824-7516 >> >> "You should never, never doubt what no one is sure about." 
>> -- Gene Wilder >> >> Alexander Sack wrote: >>> On Fri, Aug 29, 2008 at 12:14 PM, Alexander Sack >>> wrote: >>>> On Thu, Aug 28, 2008 at 7:25 PM, Fuujin Networks LLC >>>> wrote: >>>>> [snip] >>>>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs >>>>> cpu0 (BSP): APIC ID: 0 >>>>> cpu1 (AP): APIC ID: 1 >>>>> ioapic0: Changing APIC ID to 2 >>>>> ioapic0 irqs 0-23 on motherboard >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> registered firmware set >>>>> isp0: port 0xc000-0xc0ff mem >>>>> 0xe7103000-0xe7103fff irq 16 at device 8.0 on pci0 >>>>> firmware_get: failed to load firmware image isp_2300_it >>>>> isp0: [ITHREAD] >>>>> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1006 >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1008 >>>>> (targbh0:isp0:0:-1:-1): Target Mode Enabled >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1006 >>>>> isp0: target notify code 0x1007 >>>>> isp0: target notify code 0x1006 >>>>> isp0: target notify code 0x1007 >>>>> [snip] >>>>> >>>>> I'm a bit puzzled by the firmware_get failed line above. I suspect >>>>> this may >>>>> be the problem, but I have not been able to resolve it. I've tried >>>>> disabling >>>>> the bios on the FC cards, as well as messing with almost every other >>>>> conceivable option, but the same error appears. Thoughts? >>>> Yes, its a bug in the ISP driver. If you are in target mode, it tries >>>> to load the isp_XXX_it version of the RISC code. 
I *think* the old >>>> SCSI cards had two separate firmwares for target and initiator modes >>>> (currently if you look at ispfw, there is the 1040, 1080, and 12160_it >>>> firmwares). >>>> >>>> Try this patch: >>>> >>>> --- isp_pci.c 2008-08-29 07:58:08.000000000 -0400 >>>> +++ isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 >>>> @@ -1039,7 +1039,7 @@ >>>> } >>>> >>>> isp->isp_osinfo.fw = NULL; >>>> - if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { >>>> + if (isp->isp_role & ISP_ROLE_TARGET) { >>>> snprintf(fwname, sizeof (fwname), >>>> "isp_%04x_it", did); >>>> isp->isp_osinfo.fw = firmware_get(fwname); >>>> } >>> >>> Whoops! Its reversed! >>> >>> --- isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400 >>> +++ isp_pci.c 2008-08-29 07:58:08.000000000 -0400 >>> @@ -1039,7 +1039,7 @@ >>> } >>> >>> isp->isp_osinfo.fw = NULL; >>> - if (isp->isp_role & ISP_ROLE_TARGET) { >>> + if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) { >>> snprintf(fwname, sizeof (fwname), "isp_%04x_it", did); >>> isp->isp_osinfo.fw = firmware_get(fwname); >>> } >>> >>> Sorry about that! >>> >>> -aps >> _______________________________________________ >> freebsd-scsi@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi >> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > Hrm...just to check, do you have the following in your loader.conf: > ispfw_load="YES" > > ? 
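The corrected test in the isp_pci.c patch quoted above depends on C operator precedence. '&' binds tighter than '&&', so the condition reads as "the role has the target bit set AND the card is parallel SCSI". Below is a minimal sketch of that gating. It is not the driver's code: struct fake_isp, its is_parallel field and want_it_firmware() are hypothetical, and the values given to ISP_ROLE_TARGET, ISP_ROLE_INITIATOR and IS_SCSI() are stand-ins for the real isp(4) definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: the real ISP_ROLE_* bits and IS_SCSI() come
 * from the isp(4) driver headers and may be defined differently. */
#define ISP_ROLE_TARGET    0x1
#define ISP_ROLE_INITIATOR 0x2

struct fake_isp {
    int isp_role;    /* bitmask of ISP_ROLE_* flags */
    int is_parallel; /* nonzero for parallel SCSI (e.g. 1080), zero for FC (e.g. 2300) */
};

#define IS_SCSI(isp) ((isp)->is_parallel != 0)

/*
 * Build the "_it" (target-mode) firmware name only for a parallel SCSI
 * card in target role. Fibre Channel cards have no such image, so the
 * lookup is skipped. Returns 1 if a name was written, 0 otherwise.
 */
static int
want_it_firmware(const struct fake_isp *isp, unsigned int did,
    char *fwname, size_t len)
{
    assert(fwname != NULL && len > 0);
    memset(fwname, 0, len);
    /* Same test as the patch; the extra parentheses only make the
     * existing '&'-before-'&&' precedence explicit. */
    if ((isp->isp_role & ISP_ROLE_TARGET) && IS_SCSI(isp)) {
        snprintf(fwname, len, "isp_%04x_it", did);
        return 1;
    }
    return 0;
}
```

With these stand-ins, a 2300 (Fibre Channel) in target mode gets no "_it" name, so firmware_get() is never asked for isp_2300_it, while a parallel SCSI 1080 in target mode asks for isp_1080_it, in line with the 1040/1080/12160 _it images mentioned for ispfw.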
> From pisymbol at gmail.com Sat Aug 30 14:08:53 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Sat Aug 30 14:09:02 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48B8E879.7020809@fuujinnetworks.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> Message-ID: <3c0b01820808300708s5ed5cb18o5199e0e4ec1dcbba@mail.gmail.com> On Sat, Aug 30, 2008 at 2:28 AM, Fuujin Networks LLC wrote: > Alex: > > Thanks very much for the patch. Unfortunately, I ended up with a similar > result as seen below. Just for grins, I tried the patch on a 64-bit system > (AMD64) to see if there was a difference based on which architecture is used > for the target. No difference there either; still dumps core and reboots. > The upside I would think is that both branches seem to be in sync. I do have > a sparc64 box here if you'd like to see what happens in that world (haven't > tested it yet).

No, no, no... the patch was not meant to FIX the target mode issue. There is only one firmware image and it does get loaded (even with the error message below). I was *JUST* trying to stop firmware_get() from trying to load a firmware image that does not exist. My first patch was reversed; are you sure you applied the right one? Take a look: the patched isp_pci.c should have IS_SCSI(isp) added to the test at line 1039. Here it is again:

--- isp_pci.c.0 2008-08-29 08:03:24.000000000 -0400
+++ isp_pci.c 2008-08-29 07:58:08.000000000 -0400
@@ -1039,7 +1039,7 @@
     }
 
     isp->isp_osinfo.fw = NULL;
-    if (isp->isp_role & ISP_ROLE_TARGET) {
+    if (isp->isp_role & ISP_ROLE_TARGET && IS_SCSI(isp)) {
         snprintf(fwname, sizeof (fwname), "isp_%04x_it", did);
         isp->isp_osinfo.fw = firmware_get(fwname);
     }

This should eliminate the firmware_get() message.
But AGAIN this will not fix your target mode issues. > [snip] > (targ0:isp0:0:2:0): targdone 0xffffff0001ddda00 > (targ0:isp0:0:2:0): targread > (targ0:isp0:0:2:0): targread ccb 0xffffff0001ddda00 (0x800b7fe20) > (targ0:isp0:0:2:0): targreturnccb 0xffffff0001ddda00 > cam_debug: targfreeccb descr 0xffffff0001dda1c0 and > cam_debug: freeing ccb 0xffffff0001ddda00 > (targ0:isp0:0:2:0): write - uio_resid 8 > (targ0:isp0:0:2:0): Sending queued ccb 0x933 (0x800b85040) > (targ0:isp0:0:2:0): targstart 0xffffff0001369000 > (targ0:isp0:0:2:0): sendccb 0xffffff0001369000 > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x8 > fault code = supervisor read data, page not present > instruction pointer = 0x8:0xffffffff8025d2e8 > stack pointer = 0x10:0xffffffffae3d06f0 > frame pointer = 0x10:0xffffffff80a42000 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 783 (scsi_target) > trap number = 12 > panic: page fault > cpuid = 0 > Uptime: 7m21s > Physical memory: 4021 MB > Dumping 364 MB:: write - uio_resid 8 > (targ0:isp0:0:2:0): getccb 0xffffff0001db7c00 > (targ0:isp0:0:2:0): Sent ATIO/INOT (0x800b61a10) > (targ0:isp0:0:2:0): write - uio_resid 8 > (targ0:isp0:0:2:0): getccb 0xffffff0001db7b00 > [snip] > > > Seems to be nearly the same result in loading the firmware: > > [snip] > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > [snip] > isp0: port 0x3000-0x30ff mem > 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2 > firmware_get: failed to load firmware image isp_2300_it > isp0: [ITHREAD] Hold on, do you see this message with the patch? 
Are you sure you rebuilt and rebooted correctly? You should not see this message anymore. I will verify the patch again, but see above. > It doesn't appear that the firmware "isp_2300_it" either exists or possibly > isn't named properly on the target machine (or the initiator for that > matter). However, there do not seem to be any problems loading the firmware > for the card when it's not in target mode. No, there is no problem. The error message above is harmless. > From everything I've read, it looks like the firmware needs to be loaded via > the kernel device option "ispfw". If for nothing other than my > understanding, is there some reason we're not loading the firmware resident > on the card? You ARE loading the firmware (the isp_2300 image from ispfw); it's just that the code also tries to get an IT version of it, and for Fibre Channel cards there is no such thing. The patch I gave should remove this nuisance BUT NOT FIX target mode. You need to rebuild the kernel with:

options KDB
options DDB

and when you panic, do a "bt" and copy the stack trace so we know WHERE exactly it's panicking on your 23xx setup. Can you post your kernel configuration file as well? Thanks! -aps

From erich at fuujinnetworks.com Sun Aug 31 11:00:58 2008 From: erich at fuujinnetworks.com (Fuujin Networks LLC) Date: Sun Aug 31 11:01:06 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <3c0b01820808300708s5ed5cb18o5199e0e4ec1dcbba@mail.gmail.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> <3c0b01820808300708s5ed5cb18o5199e0e4ec1dcbba@mail.gmail.com> Message-ID: <48BA87C6.5070008@fuujinnetworks.com> Alex: I apologize for not being more specific in my questions.
I understand that we're loading the firmware via the kernel, but my question was why not load it from the card? If I have an HP SmartArray 5300 card and the firmware is out of date, I'm expected to update it, not load a kernel module to do it for me. This makes sense for many reasons, not the least of which is compatibility. I'm in no position to suggest what is proper from the standpoint of this particular problem, but I'm trying to understand the reason for choosing a kernel module rather than leaving firmware updates to the sysadmin, as with nearly all other devices. I misunderstood the purpose of your patch as well. I thought the problem was a firmware loading issue, but as you mentioned, this does not appear to be the case. I did see your message with the patch, and it was correctly applied and the kernel was correctly compiled. I did, however, reinstall the OS because of all the fiddling I had done up to this point. Funny thing is that I can't get it to crash anymore. I tried it clean, and the system tanked, but after I applied your patch, I can't get it to panic anymore.
The loop appears to come up. When I rescan from the initiator, the target stays up without incident, but nothing shows up in camcontrol as an emulated disk:

amd_svr0-01# camcontrol devlist -v
scbus0 on isp0 bus 0:
< > at scbus0 target -1 lun -1 ()
scbus-1 on xpt0 bus 0:
< > at scbus-1 target -1 lun -1 (xpt0)

I do get this on the initiator, though:

[snip]
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
[snip]

After a clean install, this is what I see from dmesg on the target:

[snip]
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
registered firmware set
isp0: port 0x3000-0x30ff mem 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2
isp0: [ITHREAD]
isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19
isp0: target notify code 0x1007
isp0: target notify code 0x1007
isp0: target notify code 0x1007
isp0: target notify code 0x1008
isp0: target notify code 0x1006
[snip]

Here's the complete kernel, also after a fresh install and the removal of unnecessary options/devices (stuff not in the server):

Kernel

cpu HAMMER
ident GENERIC

# To statically compile in device wiring instead of /boot/device.hints
#hints "GENERIC.hints" # Default places to look for devices.

makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

options SCHED_4BSD # 4BSD scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options MD_ROOT # MD is a potential root device
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options NFS_ROOT # NFS usable as /, requires NFSCLIENT
options NTFS # NT File System
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!]
options COMPAT_IA32 # Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options STOP_NMI # Stop CPUS using NMI instead of IPI
options AUDIT # Security event auditing

# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel

# kernel debugging framework
options KDB
options DDB

# CPU frequency control
device cpufreq

# Bus support.
device acpi
device pci

# ATA and ATAPI devices
device ata
device atadisk # ATA disk drives
device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
options ATA_STATIC_ID # Static device numbering

# SCSI Controllers
device isp # Qlogic family
device ispfw # Firmware for QLogic HBAs- normally a module
options ISP_TARGET_MODE # for ISP cards to operate in target mode
device targ # SCSI Target device
device targbh # SCSI Target Black Hole
options CAMDEBUG
options VFS_AIO

# SCSI peripherals
device scbus # SCSI bus (required for SCSI)
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)

# atkbdc0 controls both the keyboard and the PS/2 mouse
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device sc
device agp # support several AGP chipsets

# Serial (COM) ports
device sio # 8250, 16[45]50 based serial ports
device uart # Generic UART driver
device em # Intel PRO/1000 adapter Gigabit Ethernet Card

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device miibus # MII bus support
device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet
device bfe # Broadcom BCM440x 10/100 Ethernet
device bge # Broadcom BCM570xx Gigabit Ethernet
device fxp # Intel EtherExpress PRO/100B (82557, 82558)
device re # RealTek 8139C+/8169/8169S/8110S
device rl # RealTek 8129/8139
device sis # Silicon Integrated Systems SiS 900/SiS 7016

# Pseudo devices.
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device sl # Kernel SLIP
device ppp # Kernel PPP
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device bpf # Berkeley packet filter

Not sure what to make of this.... Would you recommend a different FC card? Emulex?

Erich M. Jenkins
Fuujin Networks, LLC
PO Box 792
Brainerd, MN 56401
(p) 218-824-5038
(f) 218-824-7516

"You should never, never doubt what no one is sure about."
-- Gene Wilder

From pisymbol at gmail.com Sun Aug 31 17:12:33 2008 From: pisymbol at gmail.com (Alexander Sack) Date: Sun Aug 31 17:12:40 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <48BA87C6.5070008@fuujinnetworks.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> <3c0b01820808300708s5ed5cb18o5199e0e4ec1dcbba@mail.gmail.com> <48BA87C6.5070008@fuujinnetworks.com> Message-ID: <3c0b01820808311012n7e83a948t732e6544ddb0d703@mail.gmail.com> On Sun, Aug 31, 2008 at 8:00 AM, Fuujin Networks LLC wrote: > I apologize for not being more specific in my questions. I understand that > we're loading the firmware via the kernel, but my question was why not load > it from the card? If I have an HP SmartArray 5300 card and the firmware is > out of date, I'm expected to update it, not load a kernel module to do it > for me. This makes sense for many reasons, not he least of which is > compatibility. I'm in no position to suggest what is proper from the > standpoint of this particular problem, but I'm trying to understand the > reason for choosing a kernel module rather than an sys admin as with nearly > all other devices.

We do both! QLogic ships each card with some version of the firmware on it that boots up at runtime. One of the nice features of the ISP is that its RISC-based firmware can be updated at runtime, ensuring you are always running the latest. The ispfw driver is strictly used to register firmware images with the generic firmware driver (the real action happens in isp during isp_reset()). I think the driver should really check to see if the ispfw version is older than the resident firmware and do the right thing.
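The version check suggested here could look roughly like the C sketch below. This is not isp(4) code: struct fw_rev, fw_rev_parse(), fw_rev_cmp() and should_load_bundled() are hypothetical names. The sketch assumes revisions are dotted major.minor.micro triples like the "3.3.19" in the boot messages, and that "the right thing" means downloading the ispfw image only when it is strictly newer than the resident firmware.

```c
#include <assert.h>
#include <stdio.h>

/* A firmware revision such as "3.3.19", split into its three fields. */
struct fw_rev {
    int major, minor, micro;
};

/* Parse "X.Y.Z"; returns 1 on success, 0 on malformed input. */
static int
fw_rev_parse(const char *s, struct fw_rev *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%d.%d.%d", &out->major, &out->minor, &out->micro) == 3;
}

/* Returns <0, 0 or >0 as a is older than, equal to, or newer than b.
 * Fields are compared numerically, never as strings. */
static int
fw_rev_cmp(struct fw_rev a, struct fw_rev b)
{
    if (a.major != b.major)
        return a.major - b.major;
    if (a.minor != b.minor)
        return a.minor - b.minor;
    return a.micro - b.micro;
}

/* Hypothetical policy: load the bundled (ispfw) image only if it is
 * strictly newer than the firmware already resident on the card. */
static int
should_load_bundled(struct fw_rev bundled, struct fw_rev resident)
{
    return fw_rev_cmp(bundled, resident) > 0;
}
```

The fields are compared as numbers because a plain strcmp() would rank "3.3.5" above "3.3.19". Until the driver does something like this, hint.isp.0.fwload_disable is the manual way to keep the resident firmware.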
I think it used to do that but was taken out; I don't know why. I'm actually thinking it should be added back. In any event, if you want to disable loading of the firmware, you can set this in your hints file:

hint.isp.0.fwload_disable=1

That should prevent the driver from loading the ispfw version (please check during bootup what version your resident firmware is at to determine which is newer). If you do this, then you should see:

isp0: Board Type 2300, Chip Revision 0x1, resident F/W Revision

instead of

isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision

Having a separate utility (typically DOS- or Windows-based) is not that great in my eyes, but to each his own. Bottom line is you should run the latest ISP firmware (whether it's the one that was flashed from QLogic or the one in the ispfw driver). I'm thinking that perhaps an audit should be done and we should ship the latest firmware from the QLogic website. What version is shipped with your card? Looks like 3.3.25 is the latest for 23xx cards. Hmmm.... > I misunderstood the purpose of your patch as well. I thought the problem > was a firmware loading issue, but as you mentioned, this does not appear to > be the case. Right, it seems to be something else. > I did see your message with the patch and it was correctly applied and the > kernel was correctly compiled. I did, however, reinstall the OS because of > all the fiddling I did to this point. Funny thing is that I can't get it to > crash anymore. I tried it clean, and the system tanked, but after I applied > your patch, I can't get it to panic anymore.
The loop looks like it comes > up, but when I rescan with the initiator, the target stays up without > incident, but nothing shows up in camcontrol as an emulated disk: > > amd_svr0-01# camcontrol devlist -v > scbus0 on isp0 bus 0: > < > at scbus0 target -1 lun -1 () > scbus-1 on xpt0 bus 0: > < > at scbus-1 target -1 lun -1 (xpt0) > > I do get this on the initiator though: > > [snip] > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, > status not marked) > Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.7 (count 36, resid 36, > status not marked) > [snip] > > After a clean install, this is what I see from dmesg on the target: > > [snip] > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > registered firmware set > isp0: port 0x3000-0x30ff mem > 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2 > isp0: [ITHREAD] > isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19 > isp0: target notify code 0x1007 > isp0: target notify code 
0x1007 > isp0: target notify code 0x1007 > isp0: target notify code 0x1008 > isp0: target notify code 0x1006 > [snip] Is this with or without the isp patch I sent regarding the firmware? I noticed it's not trying to get isp_2300_it like before (I'm hoping that's due to the patch I sent; otherwise I'm confused this holiday weekend). > Here's the complete kernel, also after a fresh install and the removal of > unnecessary options/devices (stuff not in the server): > # SCSI Controllers > device isp # Qlogic family > device ispfw # Firmware for QLogic HBAs- normally a > module > options ISP_TARGET_MODE # for ISP cards to operate in target mode > device targ # SCSI Target device > device targbh # SCSI Target Black Hole > options CAMDEBUG > options VFS_AIO Thanks for this; I just wanted to verify that your build options look good. > Not sure what to make of this.... Would you recommend a different FC card? > Emulex? I have no direct experience with Emulex on FreeBSD, so I'm not the right person to ask. I was under the impression that 23xx target mode was working. Did you enable target mode in the BIOS by chance (or disable it)? I think my 24xx BIOS has that option, but I'm not in front of it right now. Just verify your BIOS version number and options before completely giving up!
:D -aps From erich at fuujinnetworks.com Sun Aug 31 22:22:58 2008 From: erich at fuujinnetworks.com (Fuujin Networks LLC) Date: Sun Aug 31 22:23:05 2008 Subject: Qlogic FC scsi_target ISP2310 In-Reply-To: <3c0b01820808311012n7e83a948t732e6544ddb0d703@mail.gmail.com> References: <48B4CF57.30603@fuujinnetworks.com> <3c0b01820808271520w78d0f338iaf6996774512b5bb@mail.gmail.com> <48B733CF.5000105@fuujinnetworks.com> <3c0b01820808290914s638c970ejeae1d4f8c8c8a9d9@mail.gmail.com> <3c0b01820808290915t4e964182y784c215e28977252@mail.gmail.com> <48B8E879.7020809@fuujinnetworks.com> <3c0b01820808300708s5ed5cb18o5199e0e4ec1dcbba@mail.gmail.com> <48BA87C6.5070008@fuujinnetworks.com> <3c0b01820808311012n7e83a948t732e6544ddb0d703@mail.gmail.com> Message-ID: <48BB279E.7080402@fuujinnetworks.com> Alex: Thanks very much for your input and assistance! I'm by no means an expert with FC on FreeBSD, but since it's my OS of choice I'm trying to fight through this one. The fresh install is using your patch, so I believe everything is working correctly there. :) I've noticed a few things about the isp driver that make me a bit nervous though. When I booted after a fresh install, I found that the system (initiator) hangs if the target is up and on the loop. Then the system comes up after I physically unplug the link. Plugging it back in doesn't produce any error messages on either end, but the target won't panic, and the initiator spits out the error I included in the previous email. I'm starting the scsi target with "./scsi_target -d 0:2:0 backing.file" and rescanning the initiator with "camcontrol rescan all". This generally caused a core dump/kernel panic, but doesn't seem to be doing this now (for no reason evident to me anyway)... Is there something particular about these cards and the SCSI bus:id:lun that I'm missing? Perhaps in the cards configuration settings? It appears from the man page for ISP that the driver ignores settings on the card. Is this actually the case?? 
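On the bus:id:lun question: the address given to scsi_target ("0:2:0" above) is the same bus:target:lun triple that CAM prints at the end of each debug prefix, so "(targ0:isp0:0:2:0)" means periph targ0 on SIM isp0, bus 0, target 2, LUN 0. The -1 values in "(targbh0:isp0:0:-1:-1)" appear to be CAM's wildcard target and LUN. The C sketch below is illustrative only: struct cam_addr, parse_btl() and format_path() are hypothetical helpers, not the argument handling of the real scsi_target example program. It accepts only non-negative numbers, so it deliberately rejects the wildcard form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A CAM-style address: bus, target id, LUN. */
struct cam_addr {
    int bus, target, lun;
};

/*
 * Parse a "bus:target:lun" string such as "0:2:0". Returns 1 on
 * success, 0 if the string is malformed, negative, or has extra fields.
 */
static int
parse_btl(const char *s, struct cam_addr *a)
{
    int consumed = 0;

    assert(s != NULL && a != NULL);
    /* Accept only digits and ':' so "-1", " 0" or "0x2" are rejected. */
    if (s[strspn(s, "0123456789:")] != '\0')
        return 0;
    if (sscanf(s, "%d:%d:%d%n", &a->bus, &a->target, &a->lun, &consumed) != 3)
        return 0;
    /* Reject trailing input such as "0:2:0:5". */
    return s[consumed] == '\0';
}

/* Format the address the way the debug log prefixes it,
 * e.g. "(targ0:isp0:0:2:0)" for periph targ0 on SIM isp0. */
static void
format_path(char *buf, size_t len, const char *periph, const char *sim,
    const struct cam_addr *a)
{
    snprintf(buf, len, "(%s:%s:%d:%d:%d)", periph, sim,
        a->bus, a->target, a->lun);
}
```

Matching the parsed triple against the prefix in the target's debug output is a quick way to confirm the target instance is listening on the bus, target and LUN the initiator is expected to probe.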
BTW: I appreciate your assistance during a holiday weekend of all things! Proves once again why open source is the way to go!! Erich M. Jenkins Fuujin Networks, LLC PO Box 792 Brainerd, MN 56401 (p) 218-824-5038 (f) 218-824-7516 "You should never, never doubt what no one is sure about." -- Gene Wilder Alexander Sack wrote: > On Sun, Aug 31, 2008 at 8:00 AM, Fuujin Networks LLC > wrote: >> I apologize for not being more specific in my questions. I understand that >> we're loading the firmware via the kernel, but my question was why not load >> it from the card? If I have an HP SmartArray 5300 card and the firmware is >> out of date, I'm expected to update it, not load a kernel module to do it >> for me. This makes sense for many reasons, not he least of which is >> compatibility. I'm in no position to suggest what is proper from the >> standpoint of this particular problem, but I'm trying to understand the >> reason for choosing a kernel module rather than an sys admin as with nearly >> all other devices. > > We do both! QLogic ships each card with some version of the firmware > on it that boots up at runtime. One of the nice features of the ISP > is that its RISC based firmware can be updated at runtime ensuring you > are always running the latest. The ispfw driver is strictly used to > register firmwares with the generic firmware driver (the real action > happens in isp during isp_reset()). > > I think the driver should really check to see if the ispfw version is > less than the resident driver and do the right thing. I think it used > to do that but was taken out, I don't know why - I'm actually thinking > of maybe it should be added back. > > In any event, if you want to disable loading of the firmware you can > set in your hints file: > > hint.isp.0.fwload_disable=1 > > That should prevent the driver from loading the ispfw version (please > check during bootup what version your resident firmware is at to > determine which is newer). 
> If you do this then you should see:
>
> isp0: Board Type 2300, Chip Revision 0x1, resident F/W Revision
>
> instead of
>
> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision
>
> Having a separate utility (typically DOS or Windows based) is not that
> great in my eyes but to each his own. Bottom line is you should run
> the latest ISP firmware (whether it's the one that was flashed from
> QLogic or the one in the ispfw driver). I'm thinking that perhaps an
> audit should be done and we should ship the latest firmware off the
> QLogic website. What version is shipped with your card? Looks like
> 3.3.25 is the latest for 23xx cards. Hmm....
>
>> I misunderstood the purpose of your patch as well. I thought the problem
>> was a firmware loading issue, but as you mentioned, this does not appear to
>> be the case.
>
> Right, it seems to be something else.
>
>> I did see your message with the patch and it was correctly applied and the
>> kernel was correctly compiled. I did, however, reinstall the OS because of
>> all the fiddling I did to this point. Funny thing is that I can't get it to
>> crash anymore. I tried it clean, and the system tanked, but after I applied
>> your patch, I can't get it to panic anymore.
>> The loop looks like it comes
>> up, but when I rescan with the initiator, the target stays up without
>> incident, but nothing shows up in camcontrol as an emulated disk:
>>
>> amd_svr0-01# camcontrol devlist -v
>> scbus0 on isp0 bus 0:
>> < > at scbus0 target -1 lun -1 ()
>> scbus-1 on xpt0 bus 0:
>> < > at scbus-1 target -1 lun -1 (xpt0)
>>
>> I do get this on the initiator though:
>>
>> [snip]
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Aug 31 05:44:34 test kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>> [snip]
>>
>> After a clean install, this is what I see from dmesg on the target:
>>
>> [snip]
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> registered firmware set
>> isp0: port 0x3000-0x30ff mem 0xfe020000-0xfe020fff irq 25 at device 1.0 on pci2
>> isp0: [ITHREAD]
>> isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19
>> isp0: target notify code 0x1007
>> isp0: target notify code 0x1007
>> isp0: target notify code 0x1007
>> isp0: target notify code 0x1008
>> isp0: target notify code 0x1006
>> [snip]
>
> Is this with or without the isp patch I sent regarding the firmware?
> I noticed it's not trying to get isp_2300_it like before (I'm hoping
> that's due to the patch I sent otherwise I'm confused this holiday
> weekend).
>
>> Here's the complete kernel, also after a fresh install and the removal of
>> unnecessary options/devices (stuff not in the server):
>
>> # SCSI Controllers
>> device  isp              # Qlogic family
>> device  ispfw            # Firmware for QLogic HBAs- normally a module
>> options ISP_TARGET_MODE  # for ISP cards to operate in target mode
>> device  targ             # SCSI Target device
>> device  targbh           # SCSI Target Black Hole
>> options CAMDEBUG
>> options VFS_AIO
>
> Thanks for this, I just wanted to verify your build options look good.
>
>> Not sure what to make of this.... Would you recommend a different FC card?
>> Emulex?
>
> I have no direct experience with Emulex with FreeBSD so I'm not the
> right person to ask. I was under the impression that the 23xx target
> mode was working. Did you enable target mode in the BIOS by chance
> (or disable it, I think on my 24xx BIOS I have that option but I'm not
> in front of it yet). Just verify your BIOS version number and options
> before completely giving up! :D
>
> -aps
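Alex's advice to read the boot message and see whether the resident or the ispfw-loaded firmware is newer can be illustrated with a small C sketch. It is hypothetical, not code from isp(4) or ispfw: `struct fwver`, `parse_fw_line` and `fwver_cmp` are made-up names, and it assumes the banner has the form quoted in the thread (`... loaded F/W Revision 3.3.19`), with the `resident` variant carrying its version in the same position. Versions are compared field by field, so 3.3.19 sorts below 3.3.25.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical major.minor.sub firmware version. */
struct fwver {
    int major, minor, sub;
};

/* Parse an isp boot banner such as
 *   "isp0: Board Type 2300, Chip Revision 0x1, loaded F/W Revision 3.3.19".
 * Returns 1 for loaded (ispfw) firmware, 0 for resident firmware,
 * -1 if the line is not a banner or the version cannot be parsed. */
static int parse_fw_line(const char *line, struct fwver *v)
{
    static const char loaded_tag[] = "loaded F/W Revision ";
    static const char resident_tag[] = "resident F/W Revision ";
    const char *p;
    int loaded;

    if ((p = strstr(line, loaded_tag)) != NULL) {
        loaded = 1;
        p += sizeof(loaded_tag) - 1;
    } else if ((p = strstr(line, resident_tag)) != NULL) {
        loaded = 0;
        p += sizeof(resident_tag) - 1;
    } else {
        return -1;                      /* not a firmware banner */
    }
    if (sscanf(p, "%d.%d.%d", &v->major, &v->minor, &v->sub) != 3)
        return -1;                      /* version text missing or garbled */
    return loaded;
}

/* Numeric, field-by-field comparison:
 * negative if a < b, 0 if equal, positive if a > b. */
static int fwver_cmp(const struct fwver *a, const struct fwver *b)
{
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->sub != b->sub)
        return a->sub < b->sub ? -1 : 1;
    return 0;
}
```

Comparing numerically rather than with `strcmp` avoids the trap where the string "3.3.9" sorts after "3.3.25".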