Date: Wed, 16 Jun 2021 21:05:03 UTC
On Tue, Jun 15, 2021 at 08:36:07PM -0700, Neel Chauhan wrote:
> Hi current@,
>
> First off, sorry if I spammed developers@ and other mailing lists with
> my previous message, and to bz@/hselasky@/manu@ sent so many duplicate
> emails.
>
> Right now, I am attempting to update the drm-kmod driver to the Linux
> 5.7 code, and am having an issue with the pmap lock. I am new-ish to the
> kernel, meaning not a whole lot of "experience", but do have patches in
> src.
>
> But like it not we need kernel newbies, they're the next generation of
> experts. If we don't, we'd be the next Minix with **zero** development
> since Tanenbaum retired.
>
> Going back, the code in question is here:
> https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346
>
> The lines important are 346-356, but lines of interest are also the
> non-"#ifdef __linux__" sections of vm_fault_cpu().
>
> The code gives this error: panic: Assertion
> vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
>
> I have attached the core dump log.
>
> To those who aren't graphics driver experts, it happens when I load Xorg
> when Xorg attempts to map the I/O to userspace. But I feel this is more
> of me not using page locks correctly (which is needed for the pmap), or
> maybe a linuxkpi issue, rather than a graphics-specific issue.
>
> I spent days on this (all my non-$DAYJOB hours at one point + all my
> weekends) and haven't figured out the locks completely. Does anyone have
> suggestions to what I'm doing wrong in my code and locks?
>
> If it is important, OpenBSD's version of this code is here:
> https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459
> (lines 459-523, but some calls are unsurprisingly different).

The function in question appears to implement a device page fault handler.
In FreeBSD, such handlers are responsible only for ensuring that the requested page(s) are present in the VM object backing the mapping that was faulted on. The generic fault handler in sys/vm/vm_fault.c is responsible for actually updating the faulting process' page tables by calling pmap_enter(). In other words, our fault handler interface is quite different from OpenBSD's, and their example should not be followed exactly.

Adding a vm_object_busy() call in the loop will silence the assertion, I guess, but the handler is still wrong. If you look further down at vm_fault_gtt() (and, in earlier versions of the DRM drivers, i915_gem_fault()), the remap_io_mapping() implementation in the LinuxKPI does basically what I'm describing. Something similar is required for vm_fault_cpu(), though I don't quite understand when vm_fault_cpu() is supposed to be used.
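
To make the division of labour concrete, here is a userland toy model of it. This is not kernel code and none of it is a real FreeBSD interface: every toy_* name and TOY_NPAGES are made up for illustration, and locking and busying are left out entirely. It only shows that the device handler populates the object, while the generic fault path is the one that installs the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8	/* hypothetical object size, in pages */

/* Stand-in for a VM object: tracks which page indices are resident. */
struct toy_object {
	bool resident[TOY_NPAGES];
};

/* Stand-in for a process' page tables: which indices are mapped. */
struct toy_pmap {
	bool mapped[TOY_NPAGES];
};

/*
 * Device fault handler (hypothetical): only makes the requested page
 * resident in the object.  It deliberately never touches the pmap.
 */
static int
toy_dev_fault(struct toy_object *obj, size_t pindex)
{
	if (pindex >= TOY_NPAGES)
		return (-1);
	obj->resident[pindex] = true;
	return (0);
}

/* Stand-in for pmap_enter(): installs the mapping for one page. */
static void
toy_pmap_enter(struct toy_pmap *pmap, size_t pindex)
{
	pmap->mapped[pindex] = true;
}

/*
 * Generic fault path (hypothetical): calls the device handler, checks
 * that the page is now in the object, then updates the page tables
 * itself.
 */
static int
toy_vm_fault(struct toy_pmap *pmap, struct toy_object *obj, size_t pindex)
{
	if (toy_dev_fault(obj, pindex) != 0)
		return (-1);
	assert(obj->resident[pindex]);
	toy_pmap_enter(pmap, pindex);
	return (0);
}
```

In this model a caller only ever invokes toy_vm_fault(); a handler that called toy_pmap_enter() on its own would be doing the generic code's job, which is roughly the mistake described above.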