[Bug 276057] crash in xa_destroy() while initializing the i915kms module

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 01 Jan 2024 16:14:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276057

            Bug ID: 276057
           Summary: crash in xa_destroy() while initializing the i915kms
                    module
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: donn@xmission.com

While trying to get Xorg graphics to work on my Alder Lake P laptop with
FreeBSD 14 RELEASE, I hit a problem where loading the i915kms module appeared
to cause the system to hang.  I searched for reports about this problem and
found a discussion at https://github.com/freebsd/drm-kmod/issues/252.  I did
eventually get graphics to work, but it required a small kernel change; I made
a comment (as @hirairaka5) in the github thread to let people know what I had
to do.

I'm reporting the kernel issue here.  There's a fairly obvious bug in the
xarray (re-)implementation in linuxkpi that causes a kernel segfault in i915kms
(from drm-515-kmod).  There are a number of ways to fix it; I picked a simple
one.  Here is my local git commit for the fix, with the details:

commit 31fc171e7ff77dfa01df39a0598b167fbd6a24bf (HEAD -> xarray-2)
Author: Donn Seeley <donn@xmission.com>
Date:   Mon Jan 1 08:56:08 2024 -0700

    xa_destroy: leave the xarray re-usable, like Linux does

    When trying to get graphics to work on an Alder Lake P laptop, I got a
    crash.  I copied down the information from ddb:

      --- trap 0xc, rip=0xffffffff80b302ec, rsp=0xfffffe013acf44e0 ...
      __mtx_lock_sleep() at __mtx_lock_sleep+0xbc
      xa_load() at xa_load+0x73
      guc_lrc_desc_pin() at guc_lrc_desc_pin+0x49
      intel_guc_submission_enable() at intel_guc_submission_enable+0x7d
      __uc_init_hw() at __uc_init_hw+0x46f
      intel_gt_init_hw() at intel_gt_init_hw+0x481
      intel_gt_resume() at intel_gt_resume+0x5cc
      intel_gt_init() at intel_gt_init+0x213
      i915_gem_init() at i915_gem_init+0x92
      i915_driver_probe() at i915_driver_probe+0x40
      i915_pci_probe() at i915_pci_probe+0x40
      linux_pci_attach_device() at linux_pci_attach_device+0x420
      device_attach() at device_attach+0x3be
      bus_generic_driver_added() at bus_generic_driver_added+0xa1
      [etc.]

    We got a segfault in __mtx_lock_sleep() when trying to dereference a
    null thread pointer.  The sequence of events looks like this:

      intel_guc_submission_enable(guc)
        guc_init_lrc_mapping(guc)
          xa_destroy(&guc->context_lookup)
            mtx_destroy(&xa->mtx)
              _mtx_destroy(&(m)->mtx_lock)
                m->mtx_lock = MTX_DESTROYED
          guc_kernel_context_pin(guc, ce)
            guc_lrc_desc_pin(guc, loop=true)
              lrc_desc_registered(guc, id)
                __get_context(guc, id)
                  xa_load(xa=&guc->context_lookup, index=id)
                    xa_lock(xa)
                      mtx_lock(&(xa)->mtx)
                        mtx_lock_flags((m), 0)
                          _mtx_lock_flags((m), (opts), (file), (line))
                            __mtx_lock_flags(&(m)->mtx_lock, o, f, l)
                              _mtx_obtain_lock_fetch(m, &v, tid)
                                atomic_fcmpset_acq_ptr(&(mp)->mtx_lock, vp, (tid))
                              _mtx_lock_sleep(m, v, opts, file, line)
                                __mtx_lock_sleep(&(m)->mtx_lock, v)
                                  owner = lv_mtx_owner(v)
                                    ((struct thread *)((v) & ~MTX_FLAGMASK))
                                  TD_IS_RUNNING(owner)
                                    TD_GET_STATE(td) == TDS_RUNNING
                                      (td)->td_state

    We crash because v == MTX_DESTROYED, which means that td, the thread
    pointer part of the owner cookie, is NULL and we can't dereference it.
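The owner decoding described above can be reproduced in userland.  This is
only a sketch, not the kernel code: the flag values are assumed to match
sys/mutex.h on FreeBSD 14, `struct thread` is a stand-in, and `owner_of()`
and `demo_destroyed_owner()` are hypothetical names used for illustration.
The point it shows is that MTX_DESTROYED is one of the bits covered by
MTX_FLAGMASK, so masking a destroyed lock word leaves a NULL owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bits of the lock word; values assumed to match sys/mutex.h. */
#define MTX_RECURSED	((uintptr_t)0x1)
#define MTX_CONTESTED	((uintptr_t)0x2)
#define MTX_DESTROYED	((uintptr_t)0x4)
#define MTX_FLAGMASK	(MTX_RECURSED | MTX_CONTESTED | MTX_DESTROYED)

/* Stand-in for the kernel's struct thread, aligned so the low bits are free. */
struct thread {
	_Alignas(16) int td_state;
};

/* Same masking as the lv_mtx_owner() expansion shown in the call chain. */
static struct thread *
owner_of(uintptr_t v)
{
	return ((struct thread *)(v & ~MTX_FLAGMASK));
}

/* Hypothetical self-check: a live owner survives, a destroyed lock has none. */
static void
demo_destroyed_owner(void)
{
	static struct thread td;

	/* A held, contested lock still decodes to its owning thread. */
	assert(owner_of((uintptr_t)&td | MTX_CONTESTED) == &td);

	/* A destroyed lock word decodes to NULL, which the kernel then reads. */
	assert(owner_of(MTX_DESTROYED) == NULL);
}
```

In the kernel there is no such check on this path, so TD_IS_RUNNING() reads
td_state through the NULL pointer and the page fault follows.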

    The Linux version of xa_destroy() is careful to fix up the xarray so
    that it can be used again.  The FreeBSD fix can be simpler: at the end
    of xa_destroy() we reinitialize the xarray structure, taking care to
    preserve the xarray flags.

diff --git a/sys/compat/linuxkpi/common/src/linux_xarray.c b/sys/compat/linuxkpi/common/src/linux_xarray.c
index 44900666242f..856a67ba7e3c 100644
--- a/sys/compat/linuxkpi/common/src/linux_xarray.c
+++ b/sys/compat/linuxkpi/common/src/linux_xarray.c
@@ -352,6 +352,9 @@ xa_destroy(struct xarray *xa)
        radix_tree_for_each_slot(ppslot, &xa->root, &iter, 0)
                radix_tree_iter_delete(&xa->root, &iter, ppslot);
        mtx_destroy(&xa->mtx);
+
+       /* Linux leaves the xarray re-usable after xa_destroy().  */
+       xa_init_flags(xa, xa->flags);
 }

 /*
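
The effect of the patch can be modelled in a few lines of userland C.  This
is a sketch under stated assumptions, not the linuxkpi code: every `model_*`
name is hypothetical, the mutex is reduced to a "valid" bit, and the radix
tree is reduced to an entry count.  It only illustrates the
destroy-then-reinitialize pattern and that the flags are carried over.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the linuxkpi definitions. */
struct model_mtx {
	bool valid;		/* false models a destroyed mutex */
};

struct model_xarray {
	struct model_mtx mtx;
	unsigned int flags;
	int nentries;
};

static void
model_xa_init_flags(struct model_xarray *xa, unsigned int flags)
{
	xa->mtx.valid = true;	/* models mtx_init() */
	xa->flags = flags;
	xa->nentries = 0;
}

/* Behaviour before the patch: the lock is left destroyed. */
static void
model_xa_destroy_old(struct model_xarray *xa)
{
	xa->nentries = 0;	/* models deleting every slot */
	xa->mtx.valid = false;	/* models mtx_destroy() */
}

/* Behaviour after the patch: destroy, then reinitialize with the same flags. */
static void
model_xa_destroy_fixed(struct model_xarray *xa)
{
	model_xa_destroy_old(xa);
	model_xa_init_flags(xa, xa->flags);
}

/* Returns false where the real xa_load() would fault while taking the lock. */
static bool
model_xa_load(const struct model_xarray *xa)
{
	return (xa->mtx.valid);
}
```

With the old destroy, a later model_xa_load() reports an unusable lock, which
corresponds to the xa_load() crash in the backtrace.  With the fixed destroy,
the same call succeeds and xa->flags still holds the value passed to the
original init.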
