freebsd7 (and 8), radeon, xorg-server -> deadlock or so

Thu Feb 11 12:07:23 UTC 2010

On Thu, 2010-02-11 at 08:49 +0100, Ulrich Spörlein wrote:
> On Wed, 10.02.2010 at 12:08:12 -0600, Robert Noland wrote:
> > On Wed, 2010-02-10 at 19:00 +0100, Ulrich Spörlein wrote:
> > > On Wed, 10.02.2010 at 09:11:10 -0600, Robert Noland wrote:
> > > > I have a strong suspicion that the issue is with bus_dma.  If this is a
> > > > pci based card, then it is trying to allocate 32MB of contiguous
> > > > physical ram when the drm device is opened.  This usually succeeds the
> > > > first time that the driver opens the device, but later, after memory has
> > > > become fragmented, this can become an issue.  As I have mentioned, I
> > > > have code that reworks this whole process and I'll try and make a patch
> > > > available soon, but I don't have a lot of time now, so it might be
> > > > the weekend before I can rebase the code and get a clean patch.
> > > 
> > > No deadlocks for me, but I've been hit by the 32MB issue. On 8-STABLE without
> > > the recent Xorg update (haven't done that yet) I usually startx right after
> > > boot, and this usually works fine.
> > > 
> > > One time I had massive ZFS/git jobs running headless first and wanted to
> > > startx afterwards. X11 took quite some time to come up and although
> > > window "switching" was snappy, *moving* windows around was slow as hell,
> > > window contents were re-drawing at ~1FPS.
> > > 
> > > This also seems to always happen if I stop X11 and startx it again.
> > > So I made a diff from a regular Xorg startup against the slow one:
> > > 
> > > --- Xorg.0.log  2010-02-09 20:59:16.000000000 +0100
> > > +++ Xorg.slow.log       2010-01-31 11:04:08.000000000 +0100
> > > ...
> > > @@ -599,49 +599,22 @@
> > >  (II) RADEON(0): [drm] added 1 reserved context for kernel
> > >  (II) RADEON(0): X context handle = 0x1
> > >  (II) RADEON(0): [drm] installed DRM signal handler
> > > -(II) RADEON(0): [pci] 32768 kB allocated with handle 0xed1a5000
> > > -(II) RADEON(0): [pci] ring handle = 0xed1a5000
> > > -(II) RADEON(0): [pci] Ring mapped at 0x802aa0000
> > > -(II) RADEON(0): [pci] Ring contents 0x00000000
> > > -(II) RADEON(0): [pci] ring read ptr handle = 0xed2a6000
> > > -(II) RADEON(0): [pci] Ring read ptr mapped at 0x8006d6000
> > > -(II) RADEON(0): [pci] Ring read ptr contents 0x00000000
> > > -(II) RADEON(0): [pci] vertex/indirect buffers handle = 0xed2a7000
> > > -(II) RADEON(0): [pci] Vertex/indirect buffers mapped at 0x812c00000
> > > -(II) RADEON(0): [pci] Vertex/indirect buffers contents 0x00000000
> > > -(II) RADEON(0): [pci] GART texture map handle = 0xed4a7000
> > > -(II) RADEON(0): [pci] GART Texture map mapped at 0x812ea7000
> > > -(II) RADEON(0): [drm] register handle = 0xfe8e0000
> > > -(II) RADEON(0): [dri] Visual configs initialized
> > > +(EE) RADEON(0): [pci] Out of memory (-12)
> > 
> > Yes, drm failed to allocate the 32MB to back the GART, so drm was
> > disabled.  Hopefully, the new allocation strategy will resolve this
> > since it will just require 32MB of physical ram below 4GB without
> > needing it to be contiguous.
> 
> Hmm, given that today, 32MB isn't really that much, wouldn't it make
> more sense to have radeon(4) reserve those 32MB on load and never let
> go? I have radeon_load=YES set in loader.conf so it has a fair chance to
> always get its 32MB of contiguous memory during startup. Given ZFS' memory
> hunger, there may not be enough free physical RAM below 4GB ...

That would make sense, and it might work more like that once I
implement TTM/KMS (actually the whole memory requirements will change as
pages will then get mapped in and out of the GART), but currently, the
allocation of scatter gather memory to populate the gart is driven by
userland.  The only memory that the driver pre-allocates is the sarea;
the register maps are also set up in advance, but that is just address
mapping.  For everything else, userland tells us when and how much
memory to allocate.  On radeon (and Intel for that matter) most if not
all cards can reference anything that the CPU can (up to at least 36
bits, iirc, or maybe 40), so I might drop the 4GB limit.  However, since
all of this is done in the generic drm code, I don't actually know what
card I'm allocating memory for when I do it, so I won't change that part
until I need to.

I'll try and get a cleaned up patch of the scatter gather rework out
this weekend.  I've abandoned the use of bus_dma entirely for allocating
SG pages and interact with the VM directly, thus avoiding the contiguous
requirement of bus_dma.

robert.

> But it's your call, you obviously know more about this than me anyway :)
> 
> Bye,
> Uli
-- 
Robert Noland <rnoland at FreeBSD.org>
FreeBSD