7.2-PRERELEASE X-server hang in "drmwtq"

Robert Noland rnoland at FreeBSD.org
Sat Apr 25 20:37:43 UTC 2009


On Sun, 2009-04-26 at 00:18 +0400, Artem Kim wrote:
> On Saturday 25 April 2009 21:15:16 you wrote:
> > Ok, so my test is under gnome with metacity in composite mode.  Using
> > zsh (I think bash can do this also)
> >
> > balrog% for ((i=0 ; i < 5 ; i++ )) do firefox &;done
> >
> > So, I've launched 5 firefox and 10 xterms... Neither produces the hang.
> > Sitting in drmwtq means that you are waiting on the rendering engine to
> > catch up and send you an interrupt.  Probably the best debugging that we
> > are going to get is by:
> >
> > booting the system without starting X, kldload radeon and then set
> > sysctl hw.dri.0.debug=1 and start X/KDE... trigger the lockup and send
> > me the output of the debugging from /var/log/messages.
> >
> > robert.
> 
> I used the following script:
> 
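> #!/bin/sh
> 
> # Number of passes; each pass starts one instance of every application.
> TRY=5
> 
> while [ ${TRY} -gt 0 ]; do
> 
> # konqueror &
>   okteta &
>   kcalc &
>   kwrite &
> 
>   TRY=`expr ${TRY} - 1`
> 
> done
> 
> # Give the applications time to start, then clean up.
> sleep 30
> killall konqueror
> killall okteta
> killall kcalc
> killall kwrite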
> 
> 
> If I set "hw.dri.0.debug=1", the problem does not reproduce, even with very
> large values of ${TRY}.
> However, with hw.dri.0.debug=0 a single pass reproduces the problem.
> 
> 
> If I set "hw.dri.0.debug=1" _after_ the server hang, I see the message:
> 
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 
> 0x80046457, nr = 0x57, dev 0xffffff0001556d00, auth = 1
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4

Ok, so what this is saying is that pid 782 is waiting on the rendering
engine to catch up.  The "returning 4" part (errno 4 is EINTR) says that
we were interrupted while we were waiting.  libdrm retries the wait,
which should return immediately if the engine has caught up by then.  It
never appears to catch up, so one of three things has happened: the
counter is getting corrupted, the commands never actually reached the
card like we thought, or we have locked up the GPU.
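
For anyone reading those log lines, here is a rough sketch of how the
cmd word breaks down.  It assumes the usual BSD ioctl layout from
<sys/ioccom.h> (direction in the top three bits, a 13-bit parameter
length starting at bit 16, then group and number bytes); decode_ioctl
is just a name I made up for illustration.

```shell
#!/bin/sh
# Decode a BSD-style ioctl command word into its fields.
# Assumed layout (FreeBSD <sys/ioccom.h>):
#   bit 31 = IOC_IN, bit 30 = IOC_OUT, bit 29 = IOC_VOID
#   bits 16-28 = parameter length, bits 8-15 = group, bits 0-7 = number
decode_ioctl() {
    cmd=$(($1))
    dir=""
    if [ $(( cmd & 0x80000000 )) -ne 0 ]; then dir="${dir}IN"; fi
    if [ $(( cmd & 0x40000000 )) -ne 0 ]; then dir="${dir}OUT"; fi
    if [ $(( cmd & 0x20000000 )) -ne 0 ]; then dir="${dir}VOID"; fi
    len=$(( (cmd >> 16) & 0x1fff ))
    grp=$(( (cmd >> 8) & 0xff ))
    nr=$(( cmd & 0xff ))
    # Turn the group byte back into its ASCII character via an octal escape.
    grp_char=$(printf "\\$(printf '%03o' "$grp")")
    printf 'dir=%s len=%d group=%s nr=0x%02x\n' "$dir" "$len" "$grp_char" "$nr"
}

decode_ioctl 0x80046457
```

So the command is a write of a 4-byte argument in group 'd' with
nr = 0x57, which matches the nr the kernel printed.  Driver-private DRM
ioctls start at 0x40, so this is private ioctl 0x17, which as far as I
recall from radeon_drm.h is the radeon IRQ wait.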

What does it take to recover from this?  Do you have to reboot, or is
killing the process that initiated the wait sufficient?
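
To see which processes are actually parked in the wait before deciding
what to kill, something along these lines might help.  It is only a
sketch: drm_waiters is a hypothetical helper name, and it assumes the
input is the "pid wchan comm" column layout that ps produces with
-o pid,wchan,comm.

```shell
#!/bin/sh
# Print the PIDs of processes whose wait channel is "drmwtq".
# Reads lines of the form "PID WCHAN COMMAND" on stdin, so it can be
# fed from ps or from a saved listing.
drm_waiters() {
    awk '$2 == "drmwtq" { print $1 }'
}

# Intended use on the affected machine (assumes FreeBSD ps):
#   ps -axo pid,wchan,comm | drm_waiters
```

If killing only the PIDs it reports brings X back, that would suggest
the card itself is still alive.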

robert.

> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 
> 0x80046457, nr = 0x57, dev 0xffffff0001556d00, auth = 1
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 
> 0x80046457, nr = 0x57, dev 0xffffff0001556d00, auth = 1
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 
> 0x80046457, nr = 0x57, dev 0xffffff0001556d00, auth = 1
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 
> 0x80046457, nr = 0x57, dev 0xffffff0001556d00, auth = 1
> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4
> 
> I tried applying this patch:
> http://people.freebsd.org/~rnoland/drm_radeon-copyin-fix-try2.patch
> 
> In my case the problem remains.
> 
-- 
Robert Noland <rnoland at FreeBSD.org>
FreeBSD