Re: kernel panic while copying files

From: Gary Jennejohn <gljennjohn_at_gmail.com>
Date: Thu, 10 Jun 2021 09:50:41 UTC

On Tue, 8 Jun 2021 17:54:05 +0200
Gary Jennejohn <gljennjohn@gmail.com> wrote:

[big snip]
> Here's the kgdb backtrace with the -O0 kernel:
> 
> (kgdb) bt
> #0  0xffffffff8081d706 in doadump (textdump=0)
>     at /usr/src/sys/kern/kern_shutdown.c:398
> #1  0xffffffff804ef15a in db_dump (dummy=-2138500043, dummy2=false, dummy3=-1,
>     dummy4=0xfffffe00c62a11b0 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0xffffffff804eef5f in db_command (
>     last_cmdp=0xffffffff8114d380 <db_last_command>, cmd_table=0x0, dopager=1)
>     at /usr/src/sys/ddb/db_command.c:482
> #3  0xffffffff804eeb38 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:535
> #4  0xffffffff804f38ef in db_trap (type=3, code=0)
>     at /usr/src/sys/ddb/db_main.c:270
> #5  0xffffffff80891d02 in kdb_trap (type=3, code=0, tf=0xfffffe00c62a1680)
>     at /usr/src/sys/kern/subr_kdb.c:727
> #6  0xffffffff80dd53c3 in trap (frame=0xfffffe00c62a1680)
>     at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0xffffffff80dd6718 in trap_check (frame=0xfffffe00c62a1680)
>     at /usr/src/sys/amd64/amd64/trap.c:664
> #8  <signal handler called>
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0xffffffff808910d0 in kdb_enter (why=0xffffffff80eaaf0b "panic",
>     msg=0xffffffff80eaaf0b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0xffffffff8081dbfe in vpanic (
>     fmt=0xffffffff80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)",
>     ap=0xfffffe00c62a1850) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0xffffffff8081d6b0 in panic (
>     fmt=0xffffffff80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)")
>     at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0xffffffff80caaec5 in uma_dbg_free (zone=0xfffffe00dc9d9800,
>     slab=0xfffff80007ee0fd8, item=0xfffff80007ee0000)
>     at /usr/src/sys/vm/uma_core.c:5664
> #14 0xffffffff80c9faf5 in item_dtor (zone=0xfffffe00dc9d9800,
>     item=0xfffff80007ee0000, size=544, udata=0x0, skip=SKIP_NONE)
>     at /usr/src/sys/vm/uma_core.c:3418
> #15 0xffffffff80c9eec7 in uma_zfree_arg (zone=0xfffffe00dc9d9800,
>     item=0xfffff80007ee0000, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0xffffffff802e5a89 in uma_zfree (zone=0xfffffe00dc9d9800,
>     item=0xfffff80007ee0000) at /usr/src/sys/vm/uma.h:404
> #17 0xffffffff802dcfa6 in xpt_free_ccb (free_ccb=0xfffff80007ee0000)
>     at /usr/src/sys/cam/cam_xpt.c:4674
> #18 0xffffffff802db639 in camperiphdone (periph=0xfffff8005d68bd00,
>     done_ccb=0xfffff80007797cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0xffffffff802e59b6 in xpt_done_process (ccb_h=0xfffff80007797cc0)
>     at /usr/src/sys/cam/cam_xpt.c:5491
> #20 0xffffffff802e811e in xpt_done_td (arg=0xffffffff81143c00 <cam_doneqs>)
>     at /usr/src/sys/cam/cam_xpt.c:5546
> #21 0xffffffff807ac0ea in fork_exit (callout=0xffffffff802e7f20 <xpt_done_td>,
>     arg=0xffffffff81143c00 <cam_doneqs>, frame=0xfffffe00c62a1c00)
>     at /usr/src/sys/kern/kern_fork.c:1083
> #22 <signal handler called>
> 

So I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
which is the penultimate commit in trasz's series to clear CCBs on the
stack. That series followed his commit
3394d4239b85b5577845d9e6de4e97b18d3dba58, the change to allocate CCBs
in UMA.

Note that I only built the kernel and not world.

I also tried resetting to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
but without the subsequent commits that clear CCBs on the stack the
kernel panicked during startup in AHCI.

Anyway, the tree at 8dc96b74edb844bb621afeba38fe4af104b13120 is the
minimum set of changes relevant to the uma_ccbs story, and it also
results in a panic identical to the one listed above when I set
kern.cam.da.enable_uma_ccbs=1 and turn on the external USB disk.

So, Warner is probably right and at least the da_uma_ccbs commits
should be reverted until more research can be done on why the panic
happens.

The ada_uma_ccbs commits do not cause any problems in my experience and
could probably be left in the kernel.

-- 
Gary Jennejohn