Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

From: Mark Johnston <markj_at_freebsd.org>
Date: Wed, 16 Jun 2021 17:05:03 -0400
On Tue, Jun 15, 2021 at 08:36:07PM -0700, Neel Chauhan wrote:
> Hi current_at_,
> 
> First off, sorry if I spammed developers_at_ and other mailing lists with 
> my previous message, and sorry to bz_at_/hselasky_at_/manu_at_ for sending 
> so many duplicate emails.
> 
> Right now, I am attempting to update the drm-kmod driver to the Linux 
> 5.7 code, and am having an issue with the pmap lock. I am new-ish to the 
> kernel, meaning not a whole lot of "experience", but do have patches in 
> src.
> 
> But like it or not, we need kernel newbies; they're the next generation of 
> experts. If we don't have them, we'd be the next Minix with **zero** 
> development since Tanenbaum retired.
> 
> Going back, the code in question is here: 
> https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346
> 
> The important lines are 346-356, but the non-"#ifdef __linux__" sections 
> of vm_fault_cpu() are also of interest.
> 
> The code gives this error: panic: Assertion 
> vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
> 
> I have attached the core dump log.
> 
> For those who aren't graphics driver experts, it happens when I load Xorg 
> and Xorg attempts to map the I/O to userspace. But I feel this is more 
> of me not using page locks correctly (which are needed for the pmap), or 
> maybe a linuxkpi issue, rather than a graphics-specific issue.
> 
> I spent days on this (all my non-$DAYJOB hours at one point + all my 
> weekends) and haven't figured out the locks completely. Does anyone have 
> suggestions as to what I'm doing wrong in my code and locks?
> 
> If it is important, OpenBSD's version of this code is here: 
> https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459 
> (lines 459-523, but some calls are unsurprisingly different).
> 

The function in question appears to implement a device page fault
handler.  In FreeBSD, such handlers are responsible only for ensuring
that the requested page(s) are present in the VM object backing the
mapping that was faulted on.  The generic fault handler in
sys/vm/vm_fault.c is responsible for actually updating the faulting
process' page tables by calling pmap_enter().  In other words, our fault
handler interface is quite different from OpenBSD's and their example
should not be followed exactly.  Adding a vm_object_busy() call in the
loop would probably silence the assertion, but the handler would still
be wrong.
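
To make the division of labour concrete, here is a small userland model
in C.  It is only a sketch: every type and function name in it
(model_object, model_pmap, model_dev_fault, model_vm_fault) is
hypothetical and is not part of the FreeBSD VM API or the LinuxKPI, and
the fixed 4096-byte page size and aperture_base parameter are
assumptions made for illustration.  The point is just that the device
handler makes the page resident in the object and never touches the
page table, while the generic fault path installs the mapping
afterwards (the step that pmap_enter() performs in the real kernel).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NPAGES	8	/* hypothetical object size, in pages */
#define MODEL_PAGE_SIZE	4096	/* assumed page size for this model */

/* Stand-in for a VM object: which pages are resident and where. */
struct model_object {
	bool		resident[MODEL_NPAGES];
	uint64_t	paddr[MODEL_NPAGES];
};

/* Stand-in for a process page table. */
struct model_pmap {
	bool		mapped[MODEL_NPAGES];
	uint64_t	paddr[MODEL_NPAGES];
};

/*
 * Device fault handler (model): its only job is to make the requested
 * page present in the object.  It does not touch the page table.
 */
static int
model_dev_fault(struct model_object *obj, size_t pidx, uint64_t aperture_base)
{
	if (pidx >= MODEL_NPAGES)
		return (-1);
	obj->paddr[pidx] = aperture_base + (uint64_t)pidx * MODEL_PAGE_SIZE;
	obj->resident[pidx] = true;
	return (0);
}

/*
 * Generic fault path (model): asks the handler for the page if it is
 * not resident yet, then installs the mapping itself.
 */
static int
model_vm_fault(struct model_pmap *pmap, struct model_object *obj,
    size_t pidx, uint64_t aperture_base)
{
	if (pidx >= MODEL_NPAGES)
		return (-1);
	if (!obj->resident[pidx] &&
	    model_dev_fault(obj, pidx, aperture_base) != 0)
		return (-1);
	assert(obj->resident[pidx]);

	/* This is the step that corresponds to pmap_enter(). */
	pmap->paddr[pidx] = obj->paddr[pidx];
	pmap->mapped[pidx] = true;
	return (0);
}
```

In this model, calling model_dev_fault() alone leaves the page table
untouched; only model_vm_fault() creates a mapping.  A handler written
in the OpenBSD style would instead do both steps itself, which is the
mismatch described above.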

If you look further down at vm_fault_gtt() (and in earlier versions of
the DRM drivers, i915_gem_fault()), the remap_io_mapping()
implementation in the LinuxKPI does basically what I'm describing.
Something similar is required for vm_fault_cpu(), though I don't quite
understand when vm_fault_cpu() is supposed to be used.
Received on Wed Jun 16 2021 - 21:05:03 UTC
