Hard system lockups with 10.1, probably drm/newcons/radeonkms-related

Roger Leigh rleigh at codelibre.net
Fri Dec 12 19:34:55 UTC 2014


Hi folks,

With 10.1-RELEASE, I've enabled newcons at boot with
  kern.vty="vt"
in loader.conf.  With the latest Xorg/drm installed with pkg, I'm
seeing intermittent hangs and hard lockups of the system.  I've
included the logs for one which recovered earlier today, but later
on it just locked up completely and I don't have logs for that
since I had to do a hard reset.  I had to install and enable
hal+dbus to get a working keyboard and mouse when running X,
despite both working fine on the console!

I'm not sure what the trigger is; it may also be related to input.
The first hard hang was after logging in with "mwm" via kdm4.
It didn't start mwm, so I ran "mwm&" in the xterm; it locked up when
I clicked and dragged the window title, i.e. when initiating the drag
event.
The second hang was while typing into a tmux session inside a
konsole window.  Nothing particularly special happening at the
moment it locked up.

I'm happy to do further debugging, but given that it locks up the
whole system, I'm not sure how to go about getting any useful
information at that point.

The graphics card is an AMD Radeon HD 6800 Series using
/dev/dri/card0.  Starting X11 automatically loads the needed
modules:

# kldstat 
Id Refs Address            Size     Name
 1   59 0xffffffff80200000 1755658  kernel
 2    1 0xffffffff81956000 267f48   zfs.ko
 3    2 0xffffffff81bbe000 6780     opensolaris.ko
 4    1 0xffffffff81c11000 2b58     uhid.ko
 5    1 0xffffffff81c14000 357f     ums.ko
 6    2 0xffffffff81c18000 28c0     vboxnetflt.ko
 7    2 0xffffffff81c1b000 b998     netgraph.ko
 8    2 0xffffffff81c27000 434c0    vboxdrv.ko
 9    1 0xffffffff81c6b000 40a7     ng_ether.ko
10    1 0xffffffff81c70000 3ec0     vboxnetadp.ko
11    1 0xffffffff81c74000 11a57a   radeonkms.ko
12    1 0xffffffff81d8f000 47f80    drm2.ko
13    4 0xffffffff81dd7000 1ff2     iicbus.ko
14    1 0xffffffff81dd9000 1a46     iic.ko
15    1 0xffffffff81ddb000 1e48     iicbb.ko
16    1 0xffffffff81ddd000 18f3     radeonkmsfw_BARTS_pfp.ko
17    1 0xffffffff81ddf000 1ce8     radeonkmsfw_BARTS_me.ko
18    1 0xffffffff81de1000 136f     radeonkmsfw_BTC_rlc.ko
19    1 0xffffffff81de3000 6585     radeonkmsfw_BARTS_mc.ko


Kernel log for the recoverable hang:

Dec 12 13:23:23 sorilea kernel: drmn0: error: GPU lockup CP stall for more than 10000msec
Dec 12 13:23:23 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087184 last fence id 0x0000000000087177)
Dec 12 13:23:23 sorilea kernel: drmn0: info: Saved 407 dwords of commands on ring 0.
Dec 12 13:23:23 sorilea kernel: drmn0: info: GPU softreset: 0x00000003
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS               = 0xA0003828
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info:   SRBM_STATUS               = 0x200000C0
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00010100
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00020182
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008680_CP_STAT          = 0x80038243
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_SOFT_RESET=0x00007F6B
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS               = 0x00003828
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
Dec 12 13:23:23 sorilea kernel: drmn0: info:   SRBM_STATUS               = 0x200000C0
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info:   R_008680_CP_STAT          = 0x00000000
Dec 12 13:23:23 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
Dec 12 13:23:23 sorilea kernel: info: [drm] probing gen 2 caps for device 1002:5a16 = 2/0
Dec 12 13:23:23 sorilea kernel: info: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Dec 12 13:23:23 sorilea kernel: info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Dec 12 13:23:23 sorilea kernel: drmn0: info: WB enabled
Dec 12 13:23:23 sorilea kernel: drmn0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xfffff8007e940c00
Dec 12 13:23:23 sorilea kernel: drmn0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xfffff8007e940c0c
Dec 12 13:23:23 sorilea kernel: info: [drm] ring test on 0 succeeded in 4 usecs
Dec 12 13:23:23 sorilea kernel: info: [drm] ring test on 3 succeeded in 2 usecs
Dec 12 13:23:33 sorilea kernel: drmn0: error: GPU lockup CP stall for more than 10000msec
Dec 12 13:23:33 sorilea kernel: drmn0: warning: GPU lockup (waiting for 0x0000000000087185 last fence id 0x0000000000087177)
Dec 12 13:23:33 sorilea kernel: error: [drm:pid939:r600_ib_test] *ERROR* radeon: fence wait failed (-11).
Dec 12 13:23:33 sorilea kernel: error: [drm:pid939:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-11).
Dec 12 13:23:33 sorilea kernel: drmn0: error: ib ring test failed (-11).
Dec 12 13:23:33 sorilea kernel: drmn0: info: GPU softreset: 0x00000003
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS               = 0xA0003828
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info:   SRBM_STATUS               = 0x200000C0
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00004100
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00020182
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008680_CP_STAT          = 0x80028243
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_SOFT_RESET=0x00007F6B
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS               = 0x00003828
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
Dec 12 13:23:33 sorilea kernel: drmn0: info:   SRBM_STATUS               = 0x200000C0
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info:   R_008680_CP_STAT          = 0x00000000
Dec 12 13:23:33 sorilea kernel: drmn0: info: GPU reset succeeded, trying to resume
Dec 12 13:23:33 sorilea kernel: info: [drm] probing gen 2 caps for device 1002:5a16 = 2/0
Dec 12 13:23:33 sorilea kernel: info: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Dec 12 13:23:33 sorilea kernel: info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Dec 12 13:23:33 sorilea kernel: drmn0: info: WB enabled
Dec 12 13:23:33 sorilea kernel: drmn0: info: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x0xfffff8007e940c00
Dec 12 13:23:33 sorilea kernel: drmn0: info: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0xfffff8007e940c0c
Dec 12 13:23:33 sorilea kernel: info: [drm] ring test on 0 succeeded in 4 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ring test on 3 succeeded in 2 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ib test on ring 0 succeeded in 0 usecs
Dec 12 13:23:33 sorilea kernel: info: [drm] ib test on ring 3 succeeded in 1 usecs

It worked perfectly for 5 hours after this recovery.
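
For anyone comparing notes, the lockup warnings in a log like the one
above can be pulled out with a short script.  This is only a rough
sketch: the function name find_lockups and the regular expression are
my own, and it assumes each kernel message sits on a single line in
the syslog format shown above (timestamp, hostname, "kernel:").  It
reports the fence the driver was waiting for, the last fence that
completed, and the gap between the two.

```python
import re

# Matches the radeon "GPU lockup (waiting for 0x... last fence id 0x...)"
# warning as it appears in the log above.  The pattern is an assumption
# based on these sample lines, not a format guaranteed by the driver.
LOCKUP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:.*"
    r"GPU lockup \(waiting for (?P<want>0x[0-9a-fA-F]+)\s+"
    r"last fence id (?P<last>0x[0-9a-fA-F]+)\)"
)


def find_lockups(lines):
    """Return (timestamp, waited_fence, last_fence, gap) per lockup warning."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            want = int(m.group("want"), 16)
            last = int(m.group("last"), 16)
            events.append((m.group("stamp"), want, last, want - last))
    return events


if __name__ == "__main__":
    # The two warnings from the recoverable hang, each on one line.
    sample = [
        "Dec 12 13:23:23 sorilea kernel: drmn0: warning: GPU lockup "
        "(waiting for 0x0000000000087184 last fence id 0x0000000000087177)",
        "Dec 12 13:23:33 sorilea kernel: drmn0: warning: GPU lockup "
        "(waiting for 0x0000000000087185 last fence id 0x0000000000087177)",
    ]
    for stamp, want, last, gap in find_lockups(sample):
        print(f"{stamp}: waiting for {want:#x}, last completed {last:#x}, gap {gap}")
```

To run it against a real log, pass an open file object to
find_lockups() instead of the sample list; which file that is depends
on the local syslog configuration.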



Thanks all,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Xorg.0.log.old.xz
Type: application/octet-stream
Size: 8636 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20141212/062fcba5/attachment.obj>

