iur crash ? -- long

dave baukus dbaukus at chiaro.com
Mon Jun 27 20:38:26 GMT 2005


I have a crash on BSD4.10 w/ a heavily modified network stack, but
the disk/scsi subsystem is ostensively unmodified.

I'm reasonably certain that the crash is caused by
iir_intr(void *arg) passing 0xcOdedead to bcopy() as a length.
We have INVARIANTS enabled.

The quick question is: has this been seen ?

The long question goes like this:
----------------------
Here's the stack trace:
#0  dumpsys () at ../../kern/kern_shutdown.c:519
#1  0xc01e7253 in boot (howto=0x100) at ../../kern/kern_shutdown.c:331
#2  0xc01e76cb in panic (fmt=0xc04cb120 "vm_fault: fault on nofault entry, addr: %lx")
     at ../../kern/kern_shutdown.c:635
#3  0xc03c810c in vm_fault (map=0xc059fa78, vaddr=0xf5315000, fault_type=0x1, fault_flags=0x0)
     at ../../vm/vm_fault.c:240
#4  0xc041ca5e in trap_pfault (frame=0xf8225d90, usermode=0x0, eva=0xf5315000) at ../../i386/i386/trap.c:921
#5  0xc041c5e0 in trap (frame={tf_fs = 0xc01e0010, tf_es = 0xffff0010, tf_ds = 0xcb3c0010, tf_edi = 0xcbaeb52e,
       tf_esi = 0xf5315000, tf_ebp = 0xf8225e0c, tf_isp = 0xf8225dbc, tf_ebx = 0xc0dedead, tf_edx = 0xf5222b20,
       tf_ecx = 0x3033ee73, tf_eax = 0xd67d652e, tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc041b502, tf_cs = 0x8,
       tf_eflags = 0x10202, tf_esp = 0xcadf9000, tf_ss = 0xcb9f9000}) at ../../i386/i386/trap.c:500
#6  0xc041b502 in generic_bcopy ()
#7  0xc04252fd in intr_mux (arg=x) at ../../i386/isa/intr_machdep.c:609
#8  0xc040e74e in vec9 ()
#9  0xc01df10b in exit1 (p=0xf80c88a0, rv=0xf) at ../../kern/kern_exit.c:225
#10 0xc01e9262 in sigexit (p=0xf80c88a0, sig=0xf) at ../../kern/kern_sig.c:1519
#11 0xc01e8fa4 in postsig (sig=0xf) at ../../kern/kern_sig.c:1422
#12 0xc041d254 in syscall2 (frame={tf_fs = 0xbfbf002f, tf_es = 0x8fe002f, tf_ds = 0xbfbf002f, tf_edi = 0xbfbff800,
       tf_esi = 0x8fe5540, tf_ebp = 0xbfbff800, tf_isp = 0xf8225fd4, tf_ebx = 0x64, tf_edx = 0xbfbff780,
       tf_ecx = 0xbfbff700, tf_eax = 0x4, tf_trapno = 0x7, tf_err = 0x2, tf_eip = 0x88fe6ef4, tf_cs = 0x1f,
       tf_eflags = 0x203, tf_esp = 0xbfbff604, tf_ss = 0x2f}) at ../../i386/i386/trap.c:177

--------------------------------
Since intr_mux() does not call generic_bcopy()/bcopy(), the stack frames must be a mangled.
The frames 4 to 7 look like:

0xf8225d10:     0xf8225d40      0xc01eabd3      0xffffffff
                                                                 frame 4
                                                                 0xf8225d44
                 trap_pfault
0xf8225d20:     0xc041ca5e      0xc059fa78      0xf5315000      0x00000001
0xf8225d30:     0x00000000      0x0000000c      0xf80c88a0      0xf5315000
                                 frame 5         trap
0xf8225d40:     0xcb3cdf01      0xf8225d88      0xc041c5e0      0xf8225d90
0xf8225d50:     0x00000000      0xf5315000      0x006c0200      0xf5315000
0xf8225d60:     0xcbaeb52e      0x00000000      0xc0dedead      0x00000000
0xf8225d70:     0xcbae4e5a      0xf8225d80      0xc0421783      0xc04217e9
                                                                 calltrap
0xf8225d80:     0xf8225e0c      0xc040dcb4      0xf8225e0c      0xc040d0c0
0xf8225d90:     0xc01e0010      0xffff0010      0xcb3c0010      0xcbaeb52e
0xf8225da0:     0xf5315000      0xf8225e0c      0xf8225dbc      0xc0dedead
0xf8225db0:     0xf5222b20      0x3033ee73      0xd67d652e      0x0000000c
0xf8225dc0:     0x00000000      0xc041b502      0x00000008      0x00010202
0xf8225dd0:     0xcadf9000      0xcb9f9000      0xc01b7e11      0xf5222b20
0xf8225de0:     0xcb9f904e      0xc0dedead      0xc6048460      0x00400200
0xf8225df0:     0x00000000      0xf5222b20      0x006c0200      0x00000000
0xf8225e00:     0x00000000      0x00090001      0x006fc67b      frame 7
                                                                 0xf8225e24
                 intr_mux
0xf8225e10:     0xc04252fd      0xcadf9000      0x006c0200      0x00000000
0xf8225e20:     0xc2ee216c      0xf8225e90      0xc040e1d7      0xc6048460

------------------------------------
It's between frame 7 (intr_mux()) and  frame 5 (trap()), that I begin
guessing at the sequence of events.

Based on the 0xcadf9000 at 0xf8225e14 I speculate that iir_intr was the
last interrupt routine called.
Here is the intrec * list passed to intr_mux()

set $P=(intrec *)0xc6048460
(kgdb) intrecwalk $P
$186 = {mask = 0x6c0200, handler = 0xc01b7b90 <iir_intr>, argument = 0xcadf9000, next = 0xc60483e0,
   name = 0xcadf3b80 "iir0", intr = 0x9, maskptr = 0xc0552b54, flags = 0x0}
$187 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcadfc000, next = 0xc6048260,
   name = 0xcadf3a60 "em0", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$188 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcadfd000, next = 0xcae01e60,
   name = 0xcadf3950 "em1", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$189 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcadff000, next = 0xcae01ce0,
   name = 0xcadf3810 "em2", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$190 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcae03000, next = 0xcae01ba0,
   name = 0xcadf3700 "em3", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$191 = {mask = 0x630212, handler = 0xc03b6570 <ics_intr>, argument = 0x0, next = 0xcae01b00, name = 0xcadf3620 "ics0",
   intr = 0x9, maskptr = 0xc0552b48, flags = 0x0}
$192 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcae04000, next = 0xcae01a00,
   name = 0xcadf3520 "em4", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$193 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcae06000, next = 0xcae01880,
   name = 0xcadf3410 "em5", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$194 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 0xcae07000, next = 0xcae01760,
   name = 0xcadf3300 "em6", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$195 = {mask = 0x68c640, handler = 0xc03d74b0 <uhci_intr>, argument = 0xcae09000, next = 0x0,
   name = 0xcadf32c0 "uhci0", intr = 0x9, maskptr = 0xc0552b4c, flags = 0x0}

-----------------------------------------------
Since I think its iir_intr that was called, I poke around in the stack frames
between frame 7 and 5.
At 0xf8225dd8 I see the value 0xc01b7e11

(kgdb) x 0xc01b7e11
0xc01b7e11 <iir_intr+641>:      0x830cc483

(kgdb) disass <iir_intr+641>:
...
...
...
0xc01b7dff <iir_intr+623>:      test   %ebx,%ebx
0xc01b7e01 <iir_intr+625>:      je     0xc01b7e14 <iir_intr+644>
0xc01b7e03 <iir_intr+627>:      push   %ebx
0xc01b7e04 <iir_intr+628>:      lea    0x4e(%esi),%eax
0xc01b7e07 <iir_intr+631>:      push   %eax
0xc01b7e08 <iir_intr+632>:      mov    0xffffffe8(%ebp),%edx
0xc01b7e0b <iir_intr+635>:      push   %edx
0xc01b7e0c <iir_intr+636>:      call   0xc041b4d8 <bcopy>
0xc01b7e11 <iir_intr+641>:      add    $0xc,%esp
0xc01b7e14 <iir_intr+644>:      cmpl   $0x0,0x42(%esi)

-----------------------------------------------------
Therefore, 0xcadf9000 is the struct gdt_softc * argument to iir_intr()


(kgdb) set $SC=(struct gdt_softc *)0xcadf9000
(kgdb) p *$SC
$217 = {sc_hanum = 0x0, sc_class = 0x5, sc_bus = 0x4, sc_slot = 0x8, sc_device = 0x600, sc_subdevice = 0x1af,
   sc_fw_vers = 0x22a, sc_init_level = 0x6, sc_state = 0x0, sc_dev = 0xcadf4000, sc_dpmemt = 0x1,
   sc_dpmemh = 0xf31c7000, sc_dpmembase = 0xf8000000, sc_parent_dmat = 0xcadfad00, sc_buffer_dmat = 0xcadfacc0,
   sc_gccb_dmat = 0xcadfac80, sc_gccb_dmamap = 0x0, sc_gccb_busbase = 0x1d000, sc_gccbs = 0xf51c7000, sc_free_gccb = {
     slh_first = 0xf51e1860}, sc_pending_gccb = {slh_first = 0xf5211440}, sc_ccb_queue = {tqh_first = 0x0,
     tqh_last = 0xcadf9050}, sc_ucmd_queue = {tqh_first = 0x0, tqh_last = 0xcadf9058}, sc_ic_all_size = 0x2fc0,
   sc_cmd_len = 0x24, sc_cmd_off = 0x24, sc_cmd_cnt = 0x1,
   sc_cmd = "\000\000\000\000d\000\000\000\002\000\000\000¿#\235\000\200\000\000\000ÿÿÿÿ\001\000\000\000\000\000\006â\000\000\001", '\000' <repeats 288 times>, sc_info = 0x0, sc_info2 = 0x0, sc_status = 0x1000, sc_service = 0x0,
   sc_bus_cnt = 0x3, sc_virt_bus = 0x2, sc_bus_id = "\a\a\000\000\000", sc_more_proc = 0x0, sc_hdr = {{
       hd_present = 0x1, hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, hd_is_master = 0x0, hd_is_parity = 0x0,
       hd_is_hotfix = 0x0, hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 0xff, hd_secs = 0x3f, hd_devtype = 0x0,
       hd_size = 0x88efe6a, hd_ldr_no = 0x0, hd_rw_attribs = 0x0, hd_start_sec = 0x0}, {hd_present = 0x0,
       hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, hd_is_master = 0x0, hd_is_parity = 0x0, hd_is_hotfix = 0x0,
       hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 0x0, hd_secs = 0x0, hd_devtype = 0x0, hd_size = 0x0,
       hd_ldr_no = 0x0, hd_rw_attribs = 0x0, hd_start_sec = 0x0} <repeats 99 times>}, sc_raw_feat = 0x1,
   sc_cache_feat = 0x101, sc_dvr = {size = 0x0, eu = {stream = '\000' <repeats 15 times>, driver = {ionode = 0x0,
         service = 0x0, index = 0x0}, async = {ionode = 0x0, service = 0x0, status = 0x0, info = 0x0,
         scsi_coord = "\000\000"}, sync = {ionode = 0x0, service = 0x0, status = 0x0, info = 0x0, hostdrive = 0x0,
         scsi_coord = "\000\000", sense_key = 0x0}, test = {l1 = 0x0, l2 = 0x0, l3 = 0x0, l4 = 0x0}}, severity = 0x0,
     event_string = '\000' <repeats 255 times>}, sims = {0xcadfac00, 0xcadfab40, 0xcadfaa80, 0x0, 0x0, 0x0}, paths = {
     0xcadf3c20, 0xcadf3bf0, 0xcadf3bc0, 0x0, 0x0, 0x0}, sc_copy_cmd = 0xc01b90d4 <gdt_mpr_copy_cmd>,
   sc_get_status = 0xc01b9190 <gdt_mpr_get_status>, sc_intr = 0xc01b91b4 <gdt_mpr_intr>,
   sc_release_event = 0xc01b92d0 <gdt_mpr_release_event>, sc_set_sema0 = 0xc01b92f0 <gdt_mpr_set_sema0>,
   sc_test_busy = 0xc01b9310 <gdt_mpr_test_busy>, links = {tqe_next = 0x0, tqe_prev = 0xc04ffe80}}

-----------------------------------------------------
Now I try to figure which iir_intr() code path was executed.
Only the case GDT_GCF_IOCTL: code path leads to a bcopy().

I walked all the struct gdt_ccb * in gdt->sc_gccbs[],
Only 1 has a  non-zero  gccb->gc_flags value; its
value is 4 (GDT_GCF_IOCTL)

(kgdb) set $SCBS=(struct gdt_ccb *)&$SC->sc_gccbs[121]
(kgdb) p *$SCBS
$218 = {gc_scratch = "\001\000\0013", '\000' <repeats 3067 times>, gc_ccb = 0xcb16c400, gc_ucmd = 0xcb9f9000,
   gc_dmamap = 0x0, gc_map_flag = 0x1, gc_timeout = 0x0, gc_state = 0x0, gc_service = 0x9, gc_cmd_index = 0x7b,
   gc_flags = 0x4, sle = {sle_next = 0x0}}

---------------------------------------
Down to the bcopy():
the bcopy() decission is made off of values in gc_ucmd,
and nothing good can come from using most of these values:

(kgdb) set $UCMD=(gdt_ucmd_t *)$SCBS->gc_ucmd
(kgdb) p *$UCMD
$219 = {io_node = 0xc0de, service = 0xdead, timeout = 0xc05076a0, status = 0x1, info = 0x0, BoardNode = 0xc0ded8b2,
   CommandIndex = 0xc0dedead, OpCode = 0xdead, u = {cache = {DeviceNo = 0xc0de, BlockNo = 0xc0dedead,
       BlockCnt = 0xc0dedead, DestAddr = 0xc0dedead}, ioctl = {param_size = 0xc0de, subfunc = 0xc0dedead,
       channel = 0xc0dedead, p_param = 0xc0dedead}, raw = {reserved = 0xc0de, direction = 0xc0dedead,
       mdisc_time = 0xc0dedead, mcon_time = 0xc0dedead, sdata = 0xc0dedead, sdlen = 0xc0dedead, clen = 0xc0dedead,
       cmd = "­ÞÞÀ­ÞÞÀ­ÞÞÀ", target = 0xad, lun = 0xde, bus = 0x1, priority = 0x0, sense_len = 0x0, sense_data = 0x0,
       link_p = 0x10}}, data = "\001\000\0013", '\000' <repeats 3067 times>, complete_flag = 0xcb16c400, links = {
     tqe_next = 0xcb9f9000, tqe_prev = 0x0}}

----------------------------
Not knowing anything about iir/scsi, it appears to me
that gdt->sc_gccbs[121]->gc_ucmd has been freed and yet is still referenced and
in use.

How is this  ddt_ucmd_t * gc_ucmd data managed ?
Is it actively malloc()ed and free()d ?

Any clues or pointers will be appreciated.


-- 
Dave Baukus
dbaukus at chiaro.com			
	Chiaro Networks Ltd.
	Richardson, Texas
	USA



More information about the freebsd-scsi mailing list