Re: drm panic after new world

From: Steve Kargl <kargls_at_comcast.net>
Date: Sat, 07 Jun 2025 02:29:27 UTC

On 6/5/25 17:18, Bjoern A. Zeeb wrote:
> On Thu, 5 Jun 2025, Steve Kargl wrote:
> 
>> On Thu, Jun 05, 2025 at 08:22:45AM +0000, Bjoern A. Zeeb wrote:
>>>
>>> Sorting away wireless changes from sys/compat/..  here's what's left:
>>>
>>> % git log --oneline adc33d3288..8d136fb027 sys/compat/ | grep -v 802.11 | grep -v skbuff | grep -v wsum | grep -v ASMEDIA
>>> 325aa4dbd10d linuxkpi: Introduce a properly typed jiffies  << jiffies changed to proper type
>>> 8b51cd07f69e LinuxKPI: define time64_t  << new typedef
>>> 28efbf9d2f67 LinuxKPI: add dummy header file linux/unaligned.h  << empty header file
>>> e29d72ac3ddd LinuxKPI: pci: add pci_info()  << new macro for logging
>>> f94d7319540b LinuxKPI: sysfs: implement sysfs_match_string()  << new macro/func
>>> 6841b9987e83 LinuxKPI: add container_of_const()  << new macro
>>> 69880fede78f LinuxKPI: extend struct and enum for leds  << LED additions to struct/enum (unused)
>>> 059136a95aca LinuxKPI: add cleanup.h to mutex.h  << #include added
>>> 15581af7c2d3 exec: Remove parameter 'segflg' from exec_copyin_args()  << linuxolator
>>> 97f3a1565d88 linuxkpi: use iterator in zap_vma_ptes  << VM
>>
>> Well, my first attempt was at 6c3a4b5fab, which happens to
> 
> That's not a valid hash.
> 
>> include all of the above commits.  I misread the list as
>> oldest to newest.   Boot system, kldload radeonkms.ko,
>> and startx leads to a panic.  The dump_stack() in evergreen.c
>> occurs twice.
>>
>> Just completed rebuilding everything at e1f3f15192c.  This is
>> the hash tag for the commit prior to 97f3a1565d88 from above.
>> This boots up, I kldload radeonkms.ko, and startx brings up
>> the expected desktop.
> 
> Do I understand you correctly that the change before 97f3a1565d88 (VM
> changes) worked but 97f3a1565d88 fails?
> 
> Or only the former is true and the latter we do not know?
> We only know with all of the above it's kaputt but before it's fine?
> 
> 
>> Looking at dmesg, I see
>>
>> ...
>> drmn0: radeon: MSI limited to 32-bit
>> drmn0: radeon: using MSI.
>> [drm] radeon: irq initialized.
>> #0 0xffffffff808bbcfb at linux_dump_stack+0x1b
>> #1 0xffffffff82a67adc at evergreen_startup+0x15ec
>> #2 0xffffffff82a67fb6 at evergreen_init+0x276
>> #3 0xffffffff82abdc35 at radeon_device_init+0x835
>> #4 0xffffffff82aceb4e at radeon_driver_load_kms+0x19e
>>
>>
>> and no other mentions of evergreen.c.  IOW, initialization
>> appears to occur once.
> 
> That sounds good.  So no resume path.  I almost fear your problem is
> outside LinuxKPI but we'll see.
> 
> I'd try 28efbf9d2f67 next to see what happens if I got you correctly.
> 

I've narrowed the range to

Good: 2025-04-24 9b2a503a1179 - main - e6000sw: add support for 88E6190X
  Bad: 2025-04-29 6c3a4b5f9b7b - main - alloca.3: move to share/man/man3

Good means boots and startx does not panic.  dump_stack() in evergreen.c
reports one initialization event.

Bad means boots and startx panics. dump_stack() in evergreen.c shows
that radeon is trying to initialize twice.

The commit 28efbf9d2f67 is within this range, but I've run out of time
until Monday.

-- 
steve