Re: Confused about the kernel stack backtrace

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 27 Feb 2023 19:35:23 UTC

On 2/25/23 10:26 PM, Zhenlei Huang wrote:
> 
>> On Feb 24, 2023, at 11:43 PM, Rick Macklem <rick.macklem@gmail.com> wrote:
>>
>> Btw, thanks to markj@'s quick review, D38750 is now in main.
>> I'll keep an eye on the ci test results, but I suspect this is now
>> fixed.
>>
>> Sorry about the breakage, rick
> 
> No worries. I wasn't blaming anyone; I just think this might be an issue in DDB / KDB (given the falsely reported stack trace).

Likely what happened is that the compiler moved the call to panic() to the end of the function
in a "cold" section, and since panic() is marked "noreturn", no instruction was emitted after
the call.  The panic() stack frame still saves a return address; it simply points to whatever
follows the call/branch instruction.  That address is no longer inside mtrash_ctor(), and if
there is no padding gap, it is instead the first instruction of the next function, in this
case mtrash_dtor().  That is why the frame is reported as mtrash_dtor with no offset.  Some
unwinders do try to correct for this (e.g. I think I've patched at least one somewhere in
FreeBSD) by subtracting 1 from the return address when resolving the function symbol.  However,
you'd then have to undo the subtraction to fix up the offset by hand.  Most of the time it
really isn't worth dealing with, as the other parts of the stack trace are sufficient to
determine what's going on.
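
To illustrate the symbol lookup described above, here is a minimal sketch as a toy Python
model.  It is not the DDB/KDB code: the addresses, the symbol layout, and the helper names
(lookup, naive_frame, corrected_frame) are all made up for illustration, and the functions
are assumed to sit back to back with no padding.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# The addresses are invented; the functions are assumed to be laid out
# back to back with no padding between them.
SYMBOLS = [
    (0x1000, "mtrash_ctor"),
    (0x1080, "mtrash_dtor"),
    (0x1100, "mtrash_init"),
]
STARTS = [start for start, _ in SYMBOLS]


def lookup(addr):
    """Return (name, start) of the last symbol starting at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        raise ValueError("address below first symbol")
    start, name = SYMBOLS[i]
    return name, start


def naive_frame(ret_addr):
    """Resolve the saved return address exactly as stored."""
    name, start = lookup(ret_addr)
    return "%s+%#x" % (name, ret_addr - start)


def corrected_frame(ret_addr):
    """Resolve ret_addr - 1 so the lookup lands inside the caller, then
    compute the offset from the unmodified return address."""
    name, start = lookup(ret_addr - 1)
    return "%s+%#x" % (name, ret_addr - start)


# Assume the call to panic() is the last instruction of mtrash_ctor, so
# the saved return address equals the start of mtrash_dtor.
ret_addr = 0x1080
print(naive_frame(ret_addr))      # mtrash_dtor+0x0
print(corrected_frame(ret_addr))  # mtrash_ctor+0x80
```

The corrected variant only uses the decremented address to pick the symbol; the offset is
still taken from the original return address, which is the "undo the subtraction" step.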
  
>>
>> On Fri, Feb 24, 2023 at 5:26 AM Zhenlei Huang <zlei@freebsd.org> wrote:
>>>
>>> Hi,
>>>
>>> The job FreeBSD-main-amd64-test on ci is failing, and the kernel stack backtrace [1]
>>> looks weird.
>>>
>>>> Memory modified after free 0xfffffe00ccc29000(8184) val=0 @ 0xfffffe00ccc29698
>>>> panic: Most recently used by temp
>>>
>>>> cpuid = 0
>>>> time = 1677239728
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0084e3eaa0
>>>> vpanic() at vpanic+0x152/frame 0xfffffe0084e3eaf0
>>>> panic() at panic+0x43/frame 0xfffffe0084e3eb50
>>>> mtrash_dtor() at mtrash_dtor/frame 0xfffffe0084e3eb70
>>>> item_ctor() at item_ctor+0x11f/frame 0xfffffe0084e3ebc0
>>>> malloc() at malloc+0x7f/frame 0xfffffe0084e3ec00
>>>> g_read_data() at g_read_data+0x82/frame 0xfffffe0084e3ec40
>>>> g_use_g_read_data() at g_use_g_read_data+0x46/frame 0xfffffe0084e3ec60
>>>> readsuper() at readsuper+0x29/frame 0xfffffe0084e3ecf0
>>>> ffs_sbget() at ffs_sbget+0x84/frame 0xfffffe0084e3ed70
>>>> g_label_ufs_taste_common() at g_label_ufs_taste_common+0x8b/frame 0xfffffe0084e3edc0
>>>> g_label_taste() at g_label_taste+0x1d0/frame 0xfffffe0084e3eea0
>>>> g_new_provider_event() at g_new_provider_event+0x9a/frame 0xfffffe0084e3eec0
>>>> g_run_events() at g_run_events+0x104/frame 0xfffffe0084e3eef0
>>>> fork_exit() at fork_exit+0x80/frame 0xfffffe0084e3ef30
>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0084e3ef30
>>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>> KDB: enter: panic
>>>
>>> The source code in sys/vm/uma_dbg.c clearly shows that the panic comes from `mtrash_ctor()`.
>>>
>>> Why does KDB show that the panic is from `mtrash_dtor()`?
>>>
>>> [1] https://lists.freebsd.org/archives/dev-ci/2023-February/003055.html
>>>
>>> Best regards,
>>> Zhenlei
>>>
>>>
> 
> 
> 

-- 
John Baldwin