[Bug 273289] panic on removal of SAS drive

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 22 Aug 2023 13:49:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273289

            Bug ID: 273289
           Summary: panic on removal of SAS drive
           Product: Base System
           Version: 13.2-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: jfc@mit.edu

I removed a SAS SSD and the system crashed with the message:

panic: free: called with spinlock or critical section held

The erroneous free is in pqisrc_device_mem_free.  My kernel is based on commit
4c22848d1a7e3fc996adc0cb71e35d7be8b26ffb, with INVARIANTS enabled.

The drive identifies as SEAGATE XS960SE70004 (960 GB SAS SSD).  It holds a ZFS
pool which I exported before removing the drive.

The system is an HPE ProLiant DL325 Gen10 with the controller below:

ses0: <HPE Smart Adapter 1.99> Fixed Enclosure Services SPC-3 SCSI device
ses0: 1200.000MB/s transfers
ses0: SES Device
ses0: da0,pass0 in 'ArrayElement0000', SAS Slot: 1 phys at slot 1
...
ses0: da7,pass7 in 'ArrayElement0007', SAS Slot: 1 phys at slot 8
ses0:  phy 0: SAS device type 1 phy 7 Target ( SSP )
ses0:  phy 0: parent 51402ec013d6a5b4 addr 5000c5003e85f2bd

I removed and reinserted da7.  The panic appears to have been triggered by
removal.

Crash dump information follows.

Unread portion of the kernel message buffer:
[INFO]:[ pqisrc_display_device_info ] [ 324 ]removed scsi BTL 0:71:0:  SEAGATE XS960SE70004     Physical     SSDSmartPathCap- En- Exp+ qd=65535
[INFO]:[ pqisrc_remove_device ] [ 1302 ]vendor: SEAGATE XS960SE70004     model: XS960SE70004     bus:0 target:71 lun:0 is_physical_device:0x1 expose_device:0x1 volume_offline 0x0 volume_status 0x0
[INFO]:[ pqisrc_wait_for_device_commands_to_complete ] [ 515 ]Device Outstanding IO count = 0
panic: free: called with spinlock or critical section held
cpuid = 11
time = 1692710548
KDB: stack backtrace:
#0 0xffffffff80c19e05 at kdb_backtrace+0x65
#1 0xffffffff80bcf112 at vpanic+0x152
#2 0xffffffff80bcef13 at panic+0x43
#3 0xffffffff80ba4b5f at free+0xcf
#4 0xffffffff811247ee at pqisrc_free_device+0x16e
#5 0xffffffff811210ce at os_remove_device+0x7e
#6 0xffffffff81125a3f at pqisrc_scan_devices+0xe7f
#7 0xffffffff8112736d at pqisrc_ack_all_events+0x16d
#8 0xffffffff80c2e87b at taskqueue_run_locked+0xab
#9 0xffffffff80c2e78d at taskqueue_run+0x4d
#10 0xffffffff80b8c9e6 at ithread_loop+0x256
#11 0xffffffff80b89910 at fork_exit+0x80
#12 0xffffffff8105f5ee at fork_trampoline+0xe
Uptime: 20d2h33m19s
Dumping 11245 out of 98100 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/home/jfc/freebsd/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread ()
    at /usr/home/jfc/freebsd/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_shutdown.c:396
#2  0xffffffff80bced22 in kern_reboot (howto=260)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_shutdown.c:484
#3  0xffffffff80bcf17f in vpanic (
    fmt=0xffffffff811f6ecd "free: called with spinlock or critical section held",
    ap=ap@entry=0xfffffe01b7549a00)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_shutdown.c:923
#4  0xffffffff80bcef13 in panic (fmt=<unavailable>)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_shutdown.c:847
#5  0xffffffff80ba4b5f in free_dbg (mtp=0xffffffff81b49030 <M_SMARTPQI>, 
    addrp=<optimized out>)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_malloc.c:866
#6  free (addr=addr@entry=0xfffff80104da9700, 
    mtp=0xffffffff81b49030 <M_SMARTPQI>)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_malloc.c:904
#7  0xffffffff8112cef4 in os_mem_free (softs=softs@entry=0xfffffe01b852a000, 
    addr=<unavailable>, addr@entry=0xfffff80104da9700 "", size=<unavailable>, 
    size@entry=184)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_mem.c:192
#8  0xffffffff811247ee in pqisrc_device_mem_free (softs=0xfffffe01b852a000, 
    device=0xfffff80104da9700)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_discovery.c:1432
#9  pqisrc_free_device (softs=softs@entry=0xfffffe01b852a000, 
    device=device@entry=0xfffff80104da9700)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_discovery.c:1464
#10 0xffffffff811210ce in os_remove_device (softs=0xfffffe01b852a000, 
    device=0xfffff80104da9700)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_cam.c:152
#11 0xffffffff81124673 in pqisrc_remove_device (softs=0x99d1f44cbec872bb, 
    softs@entry=0xfffffe01b852a000, device=<unavailable>, 
    device@entry=0xfffff80104da9700)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_discovery.c:1317
#12 0xffffffff81125a3f in pqisrc_update_device_list (
    softs=0xfffffe01b852a000, new_device_list=0xfffff802f4277b80, 
    num_new_devices=9)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_discovery.c:1597
#13 pqisrc_scan_devices (softs=softs@entry=0xfffffe01b852a000)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_discovery.c:1992
#14 0xffffffff8112736d in pqisrc_rescan_devices (softs=0xfffffe01b852a000)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_event.c:42
#15 pqisrc_ack_all_events (arg1=0xfffffe01b852a000)
    at /usr/home/jfc/freebsd/src/sys/dev/smartpqi/smartpqi_event.c:123
#16 0xffffffff80c2e87b in taskqueue_run_locked (
    queue=queue@entry=0xfffff8010191be00)
    at /usr/home/jfc/freebsd/src/sys/kern/subr_taskqueue.c:514
#17 0xffffffff80c2e78d in taskqueue_run (queue=0xfffff8010191be00)
    at /usr/home/jfc/freebsd/src/sys/kern/subr_taskqueue.c:529
#18 0xffffffff80b8c9e6 in intr_event_execute_handlers (ie=0xfffff8010191bd00, 
    p=<optimized out>) at /usr/home/jfc/freebsd/src/sys/kern/kern_intr.c:1169
#19 ithread_execute_handlers (ie=0xfffff8010191bd00, p=<optimized out>)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_intr.c:1182
#20 ithread_loop (arg=arg@entry=0xfffff80101964c60)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_intr.c:1270
#21 0xffffffff80b89910 in fork_exit (
    callout=0xffffffff80b8c790 <ithread_loop>, arg=0xfffff80101964c60, 
    frame=0xfffffe01b7549f40)
    at /usr/home/jfc/freebsd/src/sys/kern/kern_fork.c:1094
#22 <signal handler called>
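
For illustration only, here is a userland C sketch of the usual way to avoid
this class of panic: unlink the object while the lock is held, and call free
only after the lock has been dropped.  It assumes the device-list update path
in the backtrace above runs with a spin lock held, which the panic message
suggests but the dump does not confirm.  Every name in it (struct dev,
fake_spin_lock, checked_free, dev_add, reap_gone_devices) is hypothetical; it
is not smartpqi code and not a proposed patch.

```c
/*
 * Userland sketch: never call free() while a spin lock is held.
 * All names are hypothetical; nothing here is taken from the driver.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dev {
	int target;
	int gone;		/* set when the device has disappeared */
	struct dev *next;
};

static int spin_depth;		/* stands in for the held-spin-lock count */
static int frees_done;		/* number of frees that passed the check */

static void fake_spin_lock(void)   { spin_depth++; }
static void fake_spin_unlock(void) { assert(spin_depth > 0); spin_depth--; }

/* Mimics the debug check that fired: freeing under a spin lock is a bug. */
static void checked_free(void *p)
{
	assert(spin_depth == 0 && "free called with spinlock held");
	free(p);
	frees_done++;
}

/* Prepend a device to the list.  Returns 0 on success, -1 on failure. */
static int dev_add(struct dev **headp, int target)
{
	struct dev *d = malloc(sizeof(*d));

	if (d == NULL)
		return -1;
	d->target = target;
	d->gone = 0;
	d->next = *headp;
	*headp = d;
	return 0;
}

/*
 * Unlink departed devices under the lock, then free them after the lock
 * is released.  Returns the number of devices freed.
 */
static int reap_gone_devices(struct dev **headp)
{
	struct dev *doomed = NULL;	/* private list, needs no locking */
	struct dev **pp = headp;
	int n = 0;

	fake_spin_lock();
	while (*pp != NULL) {
		struct dev *d = *pp;

		if (d->gone) {
			*pp = d->next;		/* unlink from the live list */
			d->next = doomed;	/* defer the free */
			doomed = d;
		} else {
			pp = &d->next;
		}
	}
	fake_spin_unlock();

	while (doomed != NULL) {
		struct dev *d = doomed;

		doomed = d->next;
		checked_free(d);	/* lock depth is zero here */
		n++;
	}
	return n;
}
```

Calling reap_gone_devices() on a list with one entry marked gone unlinks that
entry under the lock and frees it afterwards, so checked_free never runs with
a nonzero lock depth.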
