nvidia_drv.so/Xorg crashes
- Reply: Fernando_Apesteguía : "Re: nvidia_drv.so/Xorg crashes"
Date: Fri, 25 Jun 2021 02:30:52 UTC
I have four (12.2-RELEASE) systems between the office and home that are
full or part time FreeBSD desktops. All have PNY NVIDIA Quadro 410s.
These have been mostly working well for about 6 years.
For the past few months I've been seeing screen corruption when using chrome or
kicad; firefox and thunderbird are always ok. But just starting eeschema
always damages the root window a little. And it's common when running
chrome/kicad to see lines in the console xterm window jump up and down
two lines. But for the last week or two Xorg has been crashing:
[ 74574.029] (EE) Backtrace:
[ 74574.032] (EE) 0: /usr/local/bin/Xorg (?+0x0) [0x41c98a]
[ 74574.033] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 74574.033] (EE) 1: /lib/libthr.so.3 (?+0x0) [0x800929b7e]
[ 74574.035] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 74574.035] (EE) 2: /lib/libthr.so.3 (?+0x0) [0x80092913f]
[ 74574.037] (EE) 3: ? (?+0x0) [0x7ffffffff003]
[ 74574.038] (EE) 4: /usr/local/lib/xorg/modules/drivers/nvidia_drv.so (?+0x0) [0x801cc8c20]
[ 74574.038] (EE)
[ 74574.038] (EE) Segmentation fault at address 0x50
[ 74574.038] (EE)
Fatal server error:
[ 74574.038] (EE) Caught signal 11 (Segmentation fault). Server aborting
The crashes are always preceded by at least one nvidia "Xid" kernel message:
Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327,
Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb,
ErrorCode 00000004
Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327,
Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb,
ErrorCode 00000004
Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327,
Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data ffffffb9,
ErrorCode 00000004
Jun 23 ... kernel: : pid 6327 (Xorg), jid 0, uid 0: exited on signal 6
Worth noting is that it was not unusual to see many Xid ErrorCode 4
kernel messages without crashes. (And it's the only ErrorCode I've ever
seen.)
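For anyone who wants to tally these messages from /var/log/messages, here is a minimal Python sketch. It assumes the messages look exactly like the ones quoted above (wrapped or not). The names parse_xid and count_by_data and the field names are made up for illustration and are not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches the "NVRM: Xid" class-error messages quoted above, once any
# mail-wrapped line breaks have been collapsed. Field names are hypothetical.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+), "
    r"Class Error: ChId (?P<chid>[0-9a-fA-F]+), Class (?P<cls>[0-9a-fA-F]+), "
    r"Offset (?P<offset>[0-9a-fA-F]+), Data (?P<data>[0-9a-fA-F]+), "
    r"ErrorCode (?P<errorcode>[0-9a-fA-F]+)"
)


def parse_xid(text):
    """Return one dict of string fields per Xid message found in text."""
    # Collapse all whitespace (including wrapped newlines) to single spaces
    # so each message can be matched as one logical line.
    flat = " ".join(text.split())
    return [m.groupdict() for m in XID_RE.finditer(flat)]


def count_by_data(text):
    """Count Xid messages per distinct Data value."""
    return dict(Counter(event["data"] for event in parse_xid(text)))
```

Feeding it the log text (for example open("/var/log/messages").read()) and grouping by the Data field makes it easier to see whether the value changes just before a crash, as it does in the third message above.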
My first thought was a bad nvidia-driver version. But after working my
way back, one version at a time, to 460.39 (circa February 2021 -- months
before the first crashes) I gave up on that theory.
My next guess was bad hardware, but I swapped Quadros between two systems
and the crashes persisted.
Yesterday Xorg crashed often enough for me to zero in on the trigger:
using tvtwm's f.forcemove action (which is like f.move but allows
moving a window off the screen) to move a window slightly off the
bottom of the screen. Here's the .twmrc binding I use:
Button2 = m s : window : f.forcemove
The crash doesn't happen 100% of the time but it's pretty easy to
trigger with half a dozen windows open. Just grab a window and randomly
dip part of it past the bottom of the screen. So my new theory is that a
frame buffer operation in one of the libraries on the path between Xorg
and the nvidia driver has regressed and is asking the nvidia driver to do
something that makes it fault.
I run a custom version of tvtwm but was able to easily crash Xorg using
x11-wm/twm on a spare quadro 410 workstation; the key is f.forcemove.
Does anybody know what this issue is? Which recently changed port
libraries are likely candidates that I could try downgrading? Should I
try opening a ticket with nvidia? Should I try even older 460.XX
drivers? What else can I try? (Thanks for reading this far!)
Craig