[Bug 195097] New: x11/nvidia-driver: Kernel panic after "NVRM: rm_init_adapter() failed!"

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Nov 17 10:01:17 UTC 2014


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195097

            Bug ID: 195097
           Summary: x11/nvidia-driver: Kernel panic after "NVRM:
                    rm_init_adapter() failed!"
           Product: Ports Tree
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: Needs Triage
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: danfe at FreeBSD.org
          Reporter: stefanf at FreeBSD.org
             Flags: maintainer-feedback?(danfe at FreeBSD.org)

Since the update to nvidia-driver-340.46 on HEAD amd64, I have a roughly 50%
chance of a kernel panic at `startx'. Just before the panic I get these errors:

NVRM: RmInitAdapter failed! (0x26:0x2a:1224)
nvidia0: NVRM: rm_init_adapter() failed!

followed immediately by

Fatal trap 12: page fault while in kernel mode, with the fault inside
rm_free_unused_clients.

I took a look at the open source parts of the driver and found what I believe
is a null pointer dereference.

The driver does roughly this:

devfs_open
    nvidia_dev_open
        devfs_set_cdevpriv
        nvidia_open_dev
            NV_UMA_ZONE_ALLOC_STACK(sc->api_sp);
            rm_init_adapter -> fail
            NV_UMA_ZONE_FREE_STACK(sc->api_sp);

Here sc->api_sp is set to NULL after the rm_init_adapter failure.

    devfs_clear_cdevpriv
        devfs_fpdrop
            devfs_destroy_cdevpriv
                nvidia_dev_dtor
                    nvidia_close_dev
                        rm_free_unused_clients(sc->api_sp)

Here rm_free_unused_clients is called with a null pointer. This function is not
open source, but from the panic my guess is it's not happy being called with a
null pointer.
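
The sequence above can be modelled in user space. This is only a sketch under
stated assumptions: the struct and every model_* name are hypothetical
stand-ins, not the driver's real code, and the stub for rm_init_adapter always
fails so the error path is taken. It shows that the close path sees a NULL
api_sp and that the refcnt wraps.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's softc; only the two fields
 * discussed in this report are modelled. */
struct model_softc {
    void    *api_sp;  /* stack handed to the closed-source rm_* calls */
    uint32_t refcnt;  /* number of successful opens */
};

/* Stand-in for rm_init_adapter(); always reports failure (0). */
static int model_rm_init_adapter(void *sp)
{
    (void)sp;
    return 0;
}

/* Models nvidia_open_dev(): allocate the stack, try to initialise the
 * adapter, and on failure free the stack and leave api_sp NULL. */
static int model_open_dev(struct model_softc *sc)
{
    sc->api_sp = malloc(64);
    if (sc->api_sp == NULL)
        return -1;
    if (!model_rm_init_adapter(sc->api_sp)) {
        free(sc->api_sp);
        sc->api_sp = NULL;
        return -1;
    }
    sc->refcnt++;
    return 0;
}

/* Models nvidia_close_dev() as reached from the cdevpriv destructor.
 * Returns 1 if rm_free_unused_clients() would have been handed NULL.
 * Assumption: the real function also decrements the refcnt here. */
static int model_close_dev(struct model_softc *sc)
{
    int null_passed = (sc->api_sp == NULL);
    sc->refcnt--;  /* unsigned wrap-around if refcnt was already 0 */
    return null_passed;
}

/* Models the reported path: the open fails, yet the destructor still
 * runs the close routine. */
static int model_failed_open_then_dtor(struct model_softc *sc)
{
    if (model_open_dev(sc) != 0)
        return model_close_dev(sc);
    return 0;
}
```

Calling model_failed_open_then_dtor() on a zero-initialised struct returns 1
and leaves refcnt at UINT32_MAX, matching the two symptoms described above.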

I'm not sure about the best possible fix, but calling nvidia_close_dev after an
unsuccessful nvidia_open_dev seems wrong; it also wraps the refcnt around to
(uint32_t)-1. Maybe nvidia_dev_dtor simply needs to check the refcnt and avoid
calling nvidia_close_dev if it's already 0.
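
A minimal sketch of that guard, again as a user-space model rather than a
patch: struct sketch_softc, sketch_close_dev and sketch_dev_dtor are
hypothetical names, and the close routine is reduced to the refcnt decrement
plus a call counter so the effect of the check can be observed.

```c
#include <stdint.h>

/* Hypothetical reduced softc: just the open count and a counter that
 * records how often the close routine actually ran. */
struct sketch_softc {
    uint32_t refcnt;
    unsigned close_calls;
};

/* Stand-in for nvidia_close_dev(); without a guard, calling this with
 * refcnt == 0 wraps the counter to UINT32_MAX. */
static void sketch_close_dev(struct sketch_softc *sc)
{
    sc->close_calls++;
    sc->refcnt--;
}

/* Stand-in for nvidia_dev_dtor() with the proposed check: skip the
 * close routine entirely when nothing is open. */
static void sketch_dev_dtor(struct sketch_softc *sc)
{
    if (sc->refcnt == 0)
        return;
    sketch_close_dev(sc);
}
```

One caveat with this approach: a bare refcnt check cannot tell a failed open
apart from a successful one while another client still holds the device open,
so a real fix might need per-open state instead.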

--- Comment #1 from Bugzilla Automation <bugzilla at FreeBSD.org> ---
Auto-assigned to maintainer danfe at FreeBSD.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

