isp and scsi_target

Erich Jenkins, Fuujin Group Ltd erich at fuujingroup.com
Wed Apr 21 01:57:21 UTC 2010


We're trying to get an emulated disk to show up on 7.3-RELEASE and not
having much luck. This is a point-to-point connection between a pair of
QLogic cards (pciconf below). There is no FC switch between the
machines, and both cards were reset to defaults (factory BIOS settings)
prior to testing. The moment I rescan the bus on the initiator, the
target machine panics and dumps core. The initiator hangs until its FC
card resets, then returns to the prompt (wedged??).

Here's the card (same model in both machines, though on a different SCSI bus in each):

isp0@pci0:5:1:0:        class=0x0c0400 card=0x00091077 chip=0x23001077 rev=0x01 hdr=0x00
     vendor     = 'QLogic Corporation'
     device     = 'QLA2300 SANblade 2300 64-bit FC-AL Adapter'
     class      = serial bus
     subclass   = Fibre Channel


I get tons of debugging output on the target machine when launching 
scsi_target with the following command:

test001# scsi_target -d 3:0:0 /usr/home/testuser/target0

Here's a snippet of the debugging output on the target machine after the
above command (it goes on for pages):

scsi_target: sending ccb (0x332)
scsi_target: sending ccb (0x334)
scsi_target: sending ccb (0x332)
scsi_target: sending ccb (0x334)
scsi_target: main loop beginning

Then this appears when the initiator rescans the bus, just before the target panics:

scsi_target: read ready
scsi_target: event -1 done
scsi_target: Working on ATIO 0x2825c200
scsi_target: tcmd_handle atio 0x2825c200 ctio 0x2825e0c0 atioflags 0x8000

And this in the log on the initiator when it comes back up:

isp0: bad pdb (110) @ handle 0x1
isp0: 0: hdl 0x1 PROB al1 tgt   0  TGT 0x0000e8 => UNK 0x000000; WWNN 0x200000e08b08f56d WWPN 0x210000e08b08f56d


Here's the relevant kernel info on the target:

# ISP SCSI Controllers
device          isp             # Qlogic family
device          ispfw           # Firmware for QLogic HBAs
options         ISP_TARGET_MODE # Qlogic family target mode
device          targ
device          targbh
options         CAMDEBUG
options         VFS_AIO

/boot/device.hints on the target:

hint.isp.0.fullduplex="1"
hint.isp.0.topology="nport-only"
hint.isp.0.role="target"

Here's the relevant kernel info on the initiator:

# ISP SCSI Controllers
device          isp             # Qlogic family
device          ispfw           # Firmware for QLogic HBAs
device          targ
device          targbh
options         CAMDEBUG
options         VFS_AIO

/boot/device.hints on the initiator:

hint.isp.0.fullduplex="1"
hint.isp.0.topology="nport-only"
hint.isp.0.role="initiator"
hint.isp.0.iid="4"


I'm seeing this in the syslog on the initiator:

Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)


Here's the backtrace from the core dump after the panic. It looks pretty
useless to me (I'd _love_ to be wrong!!):

test001# kgdb kernel.debug /var/crash/vmcore.0

Unread portion of the kernel message buffer:
(targ0:isp0:0:0:0): targdone 0xc7b7b400
(targ0:isp0:0:0:0): targread
(targ0:isp0:0:0:0): targread ccb 0xc7b7b400 (0x2825c200)
(targ0:isp0:0:0:0): targreturnccb 0xc7b7b400
cam_debug: targfreeccb descr 0xc7b80060 and
cam_debug: freeing ccb 0xc7b7b400
(targ0:isp0:0:0:0): write - uio_resid 4
(targ0:isp0:0:0:0): Sending queued ccb 0x933 (0x2825e0c0)
(targ0:isp0:0:0:0): targstart 0xc73bd400
(targ0:isp0:0:0:0): sendccb 0xc73bd400


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc04f0a66
stack pointer           = 0x28:0xc6fe5900
frame pointer           = 0x28:0xc6fe5950
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 639 (scsi_target)
trap number             = 12
panic: page fault
cpuid = 4
Uptime: 51s
Physical memory: 3767 MB
Dumping 102 MB: 87 71 55 39 23 7

Reading symbols from /boot/kernel/ispfw.ko...Reading symbols from 
/boot/kernel/ispfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ispfw.ko
Reading symbols from /boot/kernel/acpi.ko...Reading symbols from 
/boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc05c4e87 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc05c5159 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc08258bc in trap_fatal (frame=0xc6fe58c0, eva=4) at 
/usr/src/sys/i386/i386/trap.c:950
#4  0xc0825b20 in trap_pfault (frame=0xc6fe58c0, usermode=0, eva=4) at 
/usr/src/sys/i386/i386/trap.c:863
#5  0xc08264d9 in trap (frame=0xc6fe58c0) at 
/usr/src/sys/i386/i386/trap.c:541
#6  0xc080a1db in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc04f0a66 in isp_pci_dmasetup (isp=0xc71de000, csio=0xc73bd400, 
rq=0xc6fe59c4, nxtip=0xc6fe5a0c, optr=1) at 
/usr/src/sys/dev/isp/isp_pci.c:2781
#8  0xc04e96a1 in isp_action (sim=0xc7198e00, ccb=0xc73bd400) at 
/usr/src/sys/dev/isp/isp_freebsd.c:1373
#9  0xc0449104 in xpt_run_dev_sendq (bus=0xc71d65c0) at 
/usr/src/sys/cam/cam_xpt.c:3894
#10 0xc04495ce in xpt_action (start_ccb=0xc73bd400) at 
/usr/src/sys/cam/cam_xpt.c:3056
#11 0xc0466ee6 in targsendccb (softc=0xc744ee00, ccb=0xc73bd400, 
descr=0xc7b80020) at /usr/src/sys/cam/scsi/scsi_target.c:787
#12 0xc0467027 in targstart (periph=0xc71cc700, start_ccb=0xc73bd400) at 
/usr/src/sys/cam/scsi/scsi_target.c:654
#13 0xc044dd1d in xpt_run_dev_allocq (bus=0xc71d65c0) at 
/usr/src/sys/cam/cam_xpt.c:3765
#14 0xc044e0ad in xpt_schedule (perph=0xc71cc700, new_priority=1) at 
/usr/src/sys/cam/cam_xpt.c:3665
#15 0xc04684f4 in targwrite (dev=0xc7681000, uio=0xc6fe5c60, ioflag=0) 
at /usr/src/sys/cam/scsi/scsi_target.c:599
#16 0xc0586359 in giant_write (dev=0xc7681000, uio=0xc6fe5c60, ioflag=0) 
at /usr/src/sys/kern/kern_conf.c:434
#17 0xc054cbde in devfs_write_f (fp=0xc7631b94, uio=0xc6fe5c60, 
cred=0xc7681600, flags=0, td=0xc7889240) at 
/usr/src/sys/fs/devfs/devfs_vnops.c:1446
#18 0xc05ff917 in dofilewrite (td=0xc7889240, fd=4, fp=0xc7631b94, 
auio=0xc6fe5c60, offset=-1, flags=0) at file.h:257
#19 0xc05ffbf8 in kern_writev (td=0xc7889240, fd=4, auio=0xc6fe5c60) at 
/usr/src/sys/kern/sys_generic.c:402
#20 0xc05ffc6f in write (td=0xc7889240, uap=0xc6fe5cfc) at 
/usr/src/sys/kern/sys_generic.c:318
#21 0xc0825e75 in syscall (frame=0xc6fe5d38) at 
/usr/src/sys/i386/i386/trap.c:1101
#22 0xc080a240 in Xint0x80_syscall () at 
/usr/src/sys/i386/i386/exception.s:262
#23 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

The platform is a pair of HP DL580-G3 servers, each with quad 2.8GHz
Xeon CPUs and 4 GB of RAM (x86-32/i386, not x86-64/amd64). I've tried
this with and without the device.hints options; every attempt ends in a
core dump on the target and a hang on the initiator until the card in
the target gets reset on reboot.

Any thoughts would be great. I'd like to get a SQL server up on these FC 
cards. I understand I could use iSCSI, but the powers that be have 
requested FC.

-- 
Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder


More information about the freebsd-scsi mailing list