CURRENT crashes with nvidia GPU BLOB : vm_radix_insert: key 23c078 is already present
Gary Jennejohn
gljennjohn at googlemail.com
Sat Aug 10 08:37:11 UTC 2013
On Fri, 9 Aug 2013 10:12:37 -0700
David Wolfskill <david at catwhisker.org> wrote:
> On Fri, Aug 09, 2013 at 07:32:51AM +0200, O. Hartmann wrote:
> > ...
> > > > On 8 August 2013 11:10, O. Hartmann <ohartman at zedat.fu-berlin.de>
> > > > wrote:
> > > > > The most recent CURRENT doesn't work with the x11/nvidia-driver
> > > > > (which is at 319.25 in the ports and 325.15 from nVidia).
> > > > >
> > > > > After build- and installworld AND successfully rebuilding port
> > > > > x11/nvidia-driver, the system crashes immediately after a reboot
> > > > > as soon as the kernel module nvidia.ko seems to get loaded (in my
> > > > > case, I load nvidia.ko via /etc/rc.conf.local since the nVidia
> > > > > BLOB doesn't load cleanly every time when loaded
> > > > > from /boot/loader.conf).
> > > > >
> > > > > The crash occurs both on systems whose world was built with the
> > > > > default compilation options and on systems built with settings
> > > > > like -O3 -march=native; the options make no difference.
> > > > >
> > > > > FreeBSD and the port x11/nvidia-driver have been compiled with
> > > > > CLANG.
> > > > >
> > > > > The most recent FreeBSD revision that still crashes is r254097.
> > > > >
> > > > > When a vmcore is saved, I always see something like
> > > > >
> > > > > savecore: reboot after panic: vm_radix_insert: key 23c078 is
> > > > > already present
> > > > >
> > > > >
> > > > > Does anyone have any idea what's going on?
> > > > >
> > > > > Thanks for helping in advance,
> > > > >
> > > > > Oliver
> > >
> > > I'm seeing a complete deadlock on my T520 with today's current and
> > > latest portsnap'd versions of ports for the nvidia-driver updates.
> > >
> > > A little bisection and help from others seems to point the finger at
> > > Jeff's r254025
> > >
> > > I'm getting a complete deadlock on X starting, but loading the module
> > > seems to have no ill effects.
> > >
> > > Sean
> >
> > Right, I loaded the module also via /boot/loader.conf and it loads
> > cleanly. I start xdm and then the deadlock occurs.
> >
> > I tried recompiling the whole xorg suite via "portmaster -f xorg xdm".
> > It took a while, but it had no effect: the system still dies.
> > .....
>
> Sorry to be rather late to the party; the Internet connection I'm using
> at the moment is a bit flaky. (I'm out of town.)
>
> I managed to get head/i386 @r254135 built and booting ... by removing
> the "options DEBUG_MEMGUARD" from my kernel.
>
> However, that merely prevented a (very!) early panic, and got me to the
> point where trying to start xdm with the x11/nvidia-driver as the
> display driver causes an immediate reboot (no crash dump, despite
> 'dumpdev="AUTO"' in /etc/rc.conf). No drop to debugger, either.
>
> Booting & starting xdm with the nv driver works -- that's my present
> environment as I am typing this.
>
> However, the panic with DEBUG_MEMGUARD may offer a clue. Unfortunately,
> it's early enough that screen lock/scrolling doesn't work, and I only
> had the patience to write down part of the panic information. (This is
> on my laptop; no serial console, AFAICT -- and no device to capture the
> output if I did, since I'm not at home.)
>
> The top line of the screen (at the panic) reads:
>
> s/kern/subr_vmem.c:1050
>
> The backtrace has the expected stuff near the top (about kbd, panic, and
> memguard stuff); just below that is:
>
> vmem_alloc(c1226100,6681000,2,c1820cc0,3b5,...) at 0xc0ac5673=vmem_alloc+0x53/frame 0xc1820ca0
>
> Caveat: that was hand-transcribed from the screen to paper, then
> hand-transcribed from paper to this email message. And my highest grade
> in "Penmanship" was a D+.
>
> Be that as it may, here's the relevant section of subr_vmem.c with line
> numbers (cut/pasted, so tabs get munged):
>
> 1039 /*
> 1040 * vmem_alloc: allocate resource from the arena.
> 1041 */
> 1042 int
> 1043 vmem_alloc(vmem_t *vm, vmem_size_t size, int flags, vmem_addr_t *addrp)
> 1044 {
> 1045 const int strat __unused = flags & VMEM_FITMASK;
> 1046 qcache_t *qc;
> 1047
> 1048 flags &= VMEM_FLAGS;
> 1049 MPASS(size > 0);
> 1050 MPASS(strat == M_BESTFIT || strat == M_FIRSTFIT);
> 1051 if ((flags & M_NOWAIT) == 0)
> 1052 WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "vmem_alloc");
> 1053
> 1054 if (size <= vm->vm_qcache_max) {
> 1055 qc = &vm->vm_qcache[(size - 1) >> vm->vm_quantum_shift];
> 1056 *addrp = (vmem_addr_t)uma_zalloc(qc->qc_cache, flags);
> 1057 if (*addrp == 0)
> 1058 return (ENOMEM);
> 1059 return (0);
> 1060 }
> 1061
> 1062 return vmem_xalloc(vm, size, 0, 0, 0, VMEM_ADDR_MIN, VMEM_ADDR_MAX,
> 1063 flags, addrp);
> 1064 }
>
>
> This is at r254025.
>
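In the hand-transcribed backtrace, vmem_alloc(c1226100,6681000,2,...), the third argument is the flags word, here 2. Assuming the flag values below match sys/sys/malloc.h and sys/kern/subr_vmem.c at r254025 (M_WAITOK = 0x0002, M_FIRSTFIT = 0x1000, M_BESTFIT = 0x2000, VMEM_FITMASK = M_BESTFIT | M_FIRSTFIT; please check them against your tree), flags of 2 request no fit strategy at all, which would be exactly what the MPASS() at line 1050 rejects. The userspace sketch below re-creates just that check so it can be tried outside the kernel. The helper names are hypothetical, and assert() stands in for MPASS(), which only fires on a kernel built with INVARIANTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror sys/sys/malloc.h at r254025; check your tree. */
#define M_NOWAIT	0x0001
#define M_WAITOK	0x0002
#define M_FIRSTFIT	0x1000
#define M_BESTFIT	0x2000

/* Assumed to mirror the definition in sys/kern/subr_vmem.c. */
#define VMEM_FITMASK	(M_BESTFIT | M_FIRSTFIT)

/* Hypothetical helper: the strategy test from line 1050. */
static bool
vmem_strategy_valid(int flags)
{
	const int strat = flags & VMEM_FITMASK;

	/* Exactly one fit strategy must be requested. */
	return (strat == M_BESTFIT || strat == M_FIRSTFIT);
}

/*
 * Hypothetical stand-in for the two MPASS() checks at lines 1049-1050.
 * assert() aborts here the way MPASS() panics on an INVARIANTS kernel.
 */
static void
vmem_alloc_preconditions(size_t size, int flags)
{
	assert(size > 0);
	assert(vmem_strategy_valid(flags));
}
```

With flags = 2, vmem_strategy_valid() returns false, so vmem_alloc_preconditions(0x6681000, 2) would abort; passing M_BESTFIT | M_WAITOK or M_FIRSTFIT | M_NOWAIT instead satisfies the check. Requesting both strategies at once also fails, because the masked value then matches neither constant.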
The REINPLACE_CMD at line 160 of nvidia-driver/Makefile is incorrect.
How do I know that? Because I made a patch which results in a working
nvidia-driver-319.32 with r254050. That's what I'm running right now.
Here's the patch (loaded with :r in vi, so all spaces etc. are correct):
--- src/nvidia_subr.c.orig 2013-08-09 11:32:26.000000000 +0200
+++ src/nvidia_subr.c 2013-08-09 11:33:23.000000000 +0200
@@ -945,7 +945,7 @@
return ENOMEM;
}
- address = kmem_alloc_contig(kernel_map, size, flags, 0,
+ address = kmem_alloc_contig(kmem_arena, size, flags, 0,
sc->dma_mask, PAGE_SIZE, 0, attr);
if (!address) {
status = ENOMEM;
@@ -994,7 +994,7 @@
os_flush_cpu_cache();
if (at->pte_array[0].virtual_address != NULL) {
- kmem_free(kernel_map,
+ kmem_free(kmem_arena,
at->pte_array[0].virtual_address, at->size);
malloc_type_freed(M_NVIDIA, at->size);
}
@@ -1021,7 +1021,7 @@
if (at->attr != VM_MEMATTR_WRITE_BACK)
os_flush_cpu_cache();
- kmem_free(kernel_map, at->pte_array[0].virtual_address,
+ kmem_free(kmem_arena, at->pte_array[0].virtual_address,
at->size);
malloc_type_freed(M_NVIDIA, at->size);
@@ -1085,7 +1085,7 @@
}
for (i = 0; i < count; i++) {
- address = kmem_alloc_contig(kernel_map, PAGE_SIZE, flags, 0,
+ address = kmem_alloc_contig(kmem_arena, PAGE_SIZE, flags, 0,
sc->dma_mask, PAGE_SIZE, 0, attr);
if (!address) {
status = ENOMEM;
@@ -1139,7 +1139,7 @@
for (i = 0; i < count; i++) {
if (at->pte_array[i].virtual_address == 0)
break;
- kmem_free(kernel_map,
+ kmem_free(kmem_arena,
at->pte_array[i].virtual_address, PAGE_SIZE);
malloc_type_freed(M_NVIDIA, PAGE_SIZE);
}
@@ -1169,7 +1169,7 @@
os_flush_cpu_cache();
for (i = 0; i < count; i++) {
- kmem_free(kernel_map,
+ kmem_free(kmem_arena,
at->pte_array[i].virtual_address, PAGE_SIZE);
malloc_type_freed(M_NVIDIA, PAGE_SIZE);
}
The primary differences from the port's REINPLACE_CMD are
1) use kmem_arena instead of kernel_map everywhere; the REINPLACE_CMD
uses kernel_arena.
2) DO NOT use kva_free; keep using kmem_free as before.
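Why the free routine matters: as I read sys/vm/vm_kern.c after r254025 (an assumption worth checking against your tree), kmem_free() first unbacks the pages of the region and then returns the address range to the arena, while kva_free() only returns the address range to kernel_arena and leaves any backing pages where they are. The toy userspace model below is hypothetical and heavily simplified (one page per slot, a single arena, no real VM object or radix tree). It shows that after a kva_free()-style release the page is still resident, so a later first-fit allocation that reuses the same slot finds its index already occupied, the same kind of condition that a "key ... is already present" check reports. It illustrates the pairing rule only; it is not a diagnosis of the panic in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, userspace only.  Each slot stands for one
 * page-sized range of kernel virtual addresses.  kva_used[] says whether
 * the range is handed out; page_resident[] stands in for "the VM object
 * still holds a page at this index".
 */
#define TOY_NSLOTS 8

struct toy_arena {
	bool kva_used[TOY_NSLOTS];
	bool page_resident[TOY_NSLOTS];
};

#define TOY_ENOSPC (-1)	/* every slot is in use */
#define TOY_EEXIST (-2)	/* lowest free slot still has a resident page */

/*
 * Like kmem_alloc_contig(): take the lowest-numbered free range (first
 * fit) and insert a backing page for it.  Returns the slot number, or a
 * negative error if that slot's page index is already occupied.
 */
static int
toy_kmem_alloc(struct toy_arena *a)
{
	for (int i = 0; i < TOY_NSLOTS; i++) {
		if (a->kva_used[i])
			continue;
		if (a->page_resident[i])
			return (TOY_EEXIST);
		a->kva_used[i] = true;
		a->page_resident[i] = true;
		return (i);
	}
	return (TOY_ENOSPC);
}

/* Like kmem_free(): remove the backing page, then release the range. */
static void
toy_kmem_free(struct toy_arena *a, int slot)
{
	assert(slot >= 0 && slot < TOY_NSLOTS && a->kva_used[slot]);
	a->page_resident[slot] = false;
	a->kva_used[slot] = false;
}

/* Like kva_free(): release only the range; the page is left behind. */
static void
toy_kva_free(struct toy_arena *a, int slot)
{
	assert(slot >= 0 && slot < TOY_NSLOTS && a->kva_used[slot]);
	a->kva_used[slot] = false;
}
```

After toy_kmem_free() the slot can be handed out again cleanly; after toy_kva_free() the next allocation of that slot returns TOY_EEXIST.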
To use the patch:
1) Delete or comment out the 4 lines starting at line 160 of the Makefile.
2) Run ``make patch''.
3) cd work/NVIDIA-FreeBSD-x86_64-319.32/src
4) patch < [wherever the patch is]
5) cd ../../..
6) make deinstall install clean
7) kldunload the old nvidia.ko
8) kldload the new nvidia.ko
9) Start X.
--
Gary Jennejohn