Re: Possible video driver issue after main-n275966-d2a55e6a9348 -> main-n275975-5963423232e8
- Reply: Gleb Smirnoff : "Re: Possible video driver issue after main-n275966-d2a55e6a9348 -> main-n275975-5963423232e8"
- In reply to: Gleb Smirnoff : "Re: Possible video driver issue after main-n275966-d2a55e6a9348 -> main-n275975-5963423232e8"
Date: Thu, 20 Mar 2025 19:52:19 UTC
On Tue, 18 Mar 2025, Gleb Smirnoff wrote:

> On Tue, Mar 18, 2025 at 08:14:31AM -0700, David Wolfskill wrote:
> D> It completed successfully:
> D>
> D> g1-48(15.0-C)[1] uname -aUK
> D> FreeBSD g1-48.catwhisker.org 15.0-CURRENT FreeBSD 15.0-CURRENT #262 main-n275998-82589f926b52: Tue Mar 18 14:17:34 UTC 2025 root@g1-48.catwhisker.org:/common/S3/obj/usr/src/amd64.amd64/sys/CANARY amd64 1500034 1500034
> D>
> D> Specifically:
> D> * I used the slice on the laptop where I had done the "git bisect"
> D> * I first issued "git bisect reset"
> D> * Then "git pull" to bring /usr/src up to main-n275998-82589f926b52
> D> * Then "git revert 19df0c5abcb9d4e951e610b6de98d4d8a00bd5f9"
> D> * Then the usual buildworld, kernel, installworld stuff
> D> * Reboot
>
> This needs to be fixed ASAP, it blocks FreeBSD CURRENT usage on laptops.
>
> If this is not fixed by weekend, I will push revert of
> 19df0c5abcb9d4e951e610b6de98d4d8a00bd5f9, to get tree in a good shape
> before beginning of the stabweek.

Just to follow up on this.

David has been fantastic doing kernel debugging via email in ddb> with
a blank screen in front of him and got me a core dump. (*)

He's hitting a ... somewhere in i915kms.ko (here are the two instances
I have):

REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe089bc65000 (262148 bytes allocated).
REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe08a7e70000 (262148 bytes allocated).

From what I have gathered so far it is "generation specific": depending
on the chipset/model/age of the graphics chip, different function
pointers are used. That likely also explains why other people who
tested these malloc changes have not seen this. I cannot yet say
if/which are affected, but I am preparing some debugging changes locally
for him and am already seeing four different calls through that bit
during init (module loading).

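For readers not familiar with the REDZONE messages above, here is a minimal sketch of the general idea of guard bytes placed around an allocation. It is an illustration only and not the redzone(9) code from the kernel; the names guarded_alloc, corrupted_before and corrupted_after, the fill value, and the 16-byte guard size (picked to echo the "16 bytes" in the report) are all assumptions made for this example.

```python
GUARD_SIZE = 16    # assumed guard length for this sketch
GUARD_BYTE = 0x42  # arbitrary fill pattern for this sketch


def guarded_alloc(size):
    """Return a bytearray laid out as [guard | payload | guard]."""
    guard = bytes([GUARD_BYTE]) * GUARD_SIZE
    return bytearray(guard + bytes(size) + guard)


def corrupted_before(buf):
    """Count bytes in the leading guard that lost the fill pattern."""
    return sum(1 for b in buf[:GUARD_SIZE] if b != GUARD_BYTE)


def corrupted_after(buf):
    """Count bytes in the trailing guard that lost the fill pattern."""
    return sum(1 for b in buf[-GUARD_SIZE:] if b != GUARD_BYTE)


buf = guarded_alloc(64)
# Simulate a buggy writer that starts 16 bytes before the payload.
buf[0:GUARD_SIZE] = bytes(GUARD_SIZE)
print(corrupted_before(buf), corrupted_after(buf))
```

A check like this only reports that the guard was overwritten, not who did it, which is why the debugging changes mentioned above are still needed to find the offending call.
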
I also build drm-kmod differently from him (I still use the GitHub
checkout in /usr/local/sys/ while he's building the port along with the
kernel). Also there seems to be some problem loading firmware.

I assume we'll keep debugging it to a point where we can either have a
fix for drm-kmod-6.1 or at least write an intelligent bug report for his
case.

I can't say if a non-debug kernel would "just work" by accident (it
likely has for months), but these things are likely elsewhere too and
are probably the reason for the occasional "stuck in X with a dead
laptop" (while actually sitting in ddb or having gone through a panic)
that people have been seeing.

While this one is possibly a side-effect of the commit (contigmalloc
instead of malloc), the bug is elsewhere, and the two changes which went
in and one further change which is coming may actually help us to make
drm-kmod (amongst other LinuxKPI consumers) more reliable. I would hope
that some DMA problems in wireless land also go away, especially on
arm64. All painful but helpful.

So I see little reason to back this change out at this point; we should
get drm-kmod fixed instead.

Lots of health,
Bjoern

(*) we should write some of this down for people as it may help in a
lot of situations.

-- 
Bjoern A. Zeeb r15:7