Re: drm panic after new world

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Tue, 03 Jun 2025 10:08:00 UTC
On Thu, 29 May 2025, Steve Kargl wrote:

> On Thu, May 29, 2025 at 01:06:22PM -0700, Steve Kargl wrote:
>> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
>> 57		__asm("movq %%gs:%c1,%0" : "=r" (td)
>> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
>>         td = <optimized out>
>
> (snip)
>
>> #5  0xffffffff805c8718 in pfs_add_node (
>>     parent=parent@entry=0xfffff80003955400, pn=pn@entry=0xfffff803557e0900)
>>     at /usr/src/sys/fs/pseudofs/pseudofs.c:123
>>         iter = <optimized out>
>
> This is hitting a KASSERT under the INVARIANTS option.

Yes, and once again the assertion gives us pretty useless information.
I am adding the node name and type to it so that the panic string alone
gives us a better idea right away.

Thankfully the name is not optimized out in frame #7: 'radeon_ring_gfx'
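
As a rough illustration only (this is not the actual pseudofs change), a
panic string carrying name and type could be built along these lines.
Everything prefixed toy_ and the message wording are made up for the
sketch; a real kernel KASSERT would take the format string and arguments
directly instead of going through snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node types; the real pseudofs type enum is not used here. */
enum toy_type { TOY_TYPE_DIR, TOY_TYPE_FILE };

static const char *
toy_type_name(enum toy_type type)
{
	return (type == TOY_TYPE_DIR ? "dir" : "file");
}

/*
 * Format an assertion message that names the offending node and its type,
 * so the message alone identifies what was being added.
 * Returns the snprintf() result (length that was or would be written).
 */
static int
toy_format_dup_msg(char *buf, size_t len, const char *func,
    const char *name, enum toy_type type)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%s(): duplicate sibling '%s' (type %s)",
	    func, name, toy_type_name(type)));
}
```

With a message like that, the name would be visible without digging
through frames in kgdb.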

>> #6  0xffffffff805c8bd2 in pfs_create_file (parent=0xfffff80003955400,
>>     name=name@entry=0xffffffff82b293f4 "radeon_ring_gfx",
>>     fill=0xffffffff82bf70f0 <debugfs_fill>,
>>     attr=0xffffffff82bf72f0 <debugfs_attr>, vis=vis@entry=0x0,
>>     destroy=0xffffffff82bf7310 <debugfs_destroy>, flags=33)
>>     at /usr/src/sys/fs/pseudofs/pseudofs.c:266
>>         pn = 0xfffff803557e0900
>> #7  0xffffffff82bf70b8 in debugfs_create_file (
>>     name=0xffffffff82b293f4 "radeon_ring_gfx", mode=292,
>>     parent=0xfffff8000398e400, data=0xfffffe012354dd30,
>>     fops=0xffffffff82b55918 <radeon_debugfs_ring_info_fops>)
>>     at /usr/src/sys/compat/lindebugfs/lindebugfs.c:209

There were changes to that code in the timeframe you mention, adding a
new function or using __func__.

But it could also be that CONFIG_DEBUG_FS was turned on somewhere where it
was not before, or is it because you are running a debug kernel instead of
a no-debug one?


>>         dm = 0xfffff80003990580
>>         dnode = 0xfffff80003990580
>>         pnode = <unavailable>
>>         flags = <optimized out>
>>         _size = <optimized out>
>>         _malloc_item = <optimized out>
>> #8  0xffffffff82ad0084 in radeon_ring_init () from /boot/modules/radeonkms.ko
>> No symbol table info available.
>
> How does one get kernel debugging symbols into radeonkms.ko?

I think if you do the buildkernel/installkernel with
LOCAL_MODULES_DIR=/path/to/drm/sources  the symbols are likely to end up
in the right place.  I don't know how this works with ports; that is not
my area of expertise.


Looking at 6.6 sources:

My suspicion is this: the path here is reset/resume, which calls
radeon_ring_init() for RADEON_RING_TYPE_GFX_INDEX.  The original init
path likely did the same, but no one cleaned things up in between.

#8  0xffffffff82ad0084 in radeon_ring_init () from /boot/modules/radeonkms.ko
#9  0xffffffff82a5caf7 in evergreen_startup () from /boot/modules/radeonkms.ko
#10 0xffffffff82a5b333 in evergreen_resume () from /boot/modules/radeonkms.ko
#11 0xffffffff82ab3e90 in radeon_gpu_reset () from /boot/modules/radeonkms.ko

The evergreen_startup() function doing the call ..

        ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
        r = radeon_ring_init(rdev, ring, ring->ring_size, RADEON_WB_CP_RPTR_OFFSET,
                             RADEON_CP_PACKET2);

.. is called from evergreen_resume() and evergreen_init().
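
To illustrate the suspicion, here is a toy model in plain userland C.
It is not the pseudofs or lindebugfs code.  It assumes the assertion in
pfs_add_node() fires when a second sibling with the same name is added
(the thread does not quote the assertion itself), and all toy_* names are
made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for a pseudofs directory: a flat list of names. */
struct toy_dir {
	const char *names[TOY_MAX_NODES];
	size_t count;
};

/* Add a name; returns false if it already exists or the list is full. */
static bool
toy_add_node(struct toy_dir *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	for (size_t i = 0; i < dir->count; i++) {
		if (strcmp(dir->names[i], name) == 0)
			return (false);	/* where the real code would assert */
	}
	if (dir->count == TOY_MAX_NODES)
		return (false);
	dir->names[dir->count++] = name;
	return (true);
}

/* Remove a name; returns false if it was not present. */
static bool
toy_remove_node(struct toy_dir *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	for (size_t i = 0; i < dir->count; i++) {
		if (strcmp(dir->names[i], name) == 0) {
			/* Fill the hole with the last entry. */
			dir->names[i] = dir->names[--dir->count];
			return (true);
		}
	}
	return (false);
}

/* Stand-in for radeon_ring_init() creating its debugfs file. */
static bool
toy_ring_init(struct toy_dir *dir)
{
	return (toy_add_node(dir, "radeon_ring_gfx"));
}
```

In this model, calling toy_ring_init() once for init and again for resume
fails the second time, unless toy_remove_node() ran in between.  Whether
the driver follows that sequence is exactly what needs confirming.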

It would be interesting to know when and how often you pass through these
functions during boot before the panic.

You could try adding a dump_stack() there; the message buffer from the
core file should then tell us:

% git diff
diff --git drivers/gpu/drm/radeon/evergreen.c drivers/gpu/drm/radeon/evergreen.c
index eedb7dec0f..a6ae0cd9c4 100644
--- drivers/gpu/drm/radeon/evergreen.c
+++ drivers/gpu/drm/radeon/evergreen.c
@@ -5080,6 +5080,8 @@ static int evergreen_startup(struct radeon_device *rdev)
         }
         evergreen_irq_set(rdev);

+       dump_stack();
+
         ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
         r = radeon_ring_init(rdev, ring, ring->ring_size, RADEON_WB_CP_RPTR_OFFSET,
                              RADEON_CP_PACKET2);


-- 
Bjoern A. Zeeb                                                     r15:7