Re: Possible video driver issue after main-n275966-d2a55e6a9348 -> main-n275975-5963423232e8
Date: Fri, 21 Mar 2025 13:25:17 UTC
On Fri, 21 Mar 2025, Gleb Smirnoff wrote:

> On Thu, Mar 20, 2025 at 07:52:19PM +0000, Bjoern A. Zeeb wrote:
> B> He's hitting a ... somewhere in i915kms.ko (here's the two instances I
> B> have):
> B> REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe089bc65000 (262148 bytes allocated).
> B> REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe08a7e70000 (262148 bytes allocated).
>
> I looked a bit into the problem and it actually seems very trivial to me.
> Please re-check my observations.
>
> A contigmalloc(9) allocation doesn't get redzone protection, see kern_malloc.c.
> But free(9) always does contigmalloc check. This makes deprecation of
> contigfree(9) incompatible with redzone(9). And looks like
> 19df0c5abcb9d4e951e610b6de98d4d8a00bd5f9 is our first bump into this sad fact.
>
> Added reviewers of d1bdc2821fcd416ab9b238580386eb605a6128d0 to Cc.

Wow, how did we run for 8 months in main and stable/14 with this and
another 100+ contigmallocs in base, incl. all the wifi skbs for rtw88
and others, hyperv, iommu, vmm, busdma bounce code, qat, virtio,
netmap, ...?  Are these all (except the skbuffs) allocate-once calls
that never really free again?

I thought REDZONE uses a 0x42 pattern to guard, and I am sure I do run
debug kernels (main/GENERIC) for development.  I should have hit this
from day one.  I ran 78 million packets through the skbuff code using
contigmalloc the other day.

In addition to fixing it, can someone explain why this didn't go kaboom?

Ok, I found the answer:

% grep -r DEBUG_REDZONE sys/*/conf/ sys/conf/std.debug sys/conf/NOTES
sys/conf/NOTES:# DEBUG_REDZONE enables buffer underflows and buffer overflows detection for
sys/conf/NOTES:options DEBUG_REDZONE

I went and checked, as I was sure it used to be in kernel configs, but
I see that was only mips (at least for 13.0, which was the oldest
branch I had around).  Sigh, my fault; probably no one ever tested this
since then, as no one boots LINT kernels.
That also explains why Trond said iwlwifi wasn't happy either.

Thanks a lot for spotting this.  I honestly didn't think about looking
there anymore after exercising contigmalloc for 8 months.

But it also means this is only a bug if someone turns on REDZONE, and
otherwise no problem at all.

Lots of joy,
Bjoern

PS: hopefully this also had another positive outcome for drm-kmod, as I
discovered other things while debugging, but that belongs elsewhere.

-- 
Bjoern A. Zeeb                                                     r15:7