busdma dflt_lock on amd64 > 4 GB

Jacques Caron jc at oxado.com
Tue Oct 25 15:10:10 PDT 2005


Hi all,

There seems to be a continuing story about bus_dma (or rather its use
by drivers) and systems with more than 4 GB RAM. I submitted a PR for
this issue:

http://www.freebsd.org/cgi/query-pr.cgi?pr=87977

I know it happens on amd64 machines. After looking a bit further and
trying to figure out the whole busdma thing, I think the issue might
be more general (busdma_machdep.c is exactly the same for i386 and
amd64). However, it has been discussed around here a number of times,
and there are probably more amd64 systems with more than 4 GB RAM
than other types, so I've selected this list. Let me know if another
list would be more suitable.

What I understand (please correct me if I'm wrong) is that:

- busdma will use bounce buffers when needed, and this includes the 
use of devices that are limited to 32-bit addressing (most of them, I 
would guess?) when there is more than 4 GB RAM

- I'm not 100% sure, but it seems bounce buffers are a limited
resource (that's at least what sysctl -a | grep busdma tells me, and
that really looks like a bottleneck, btw)

- apparently busdma will defer the allocation of bounce buffers when
there aren't enough available. This can happen pretty quickly in some
situations (though I haven't yet figured out the difference between
the two zones): for example, two simultaneous dd's from two disks
with a large block size (bs=256000) will use up all available bounce
buffer pages in zone1...

- if that happens, busdma_swi will eventually call the lockfunc 
associated with the dma tag, and panic if none is defined
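
The sequence described above can be modelled in userland C. This is
only a rough sketch of the behaviour as understood in this message,
not kernel code: every sim_* name, the pool size of 4 pages and the
flag standing in for panic() are assumptions made for illustration.

```c
#include <stddef.h>

/* Userland model only: all sim_* names are hypothetical, not kernel APIs. */
enum sim_lock_op { SIM_LOCK, SIM_UNLOCK };
typedef void sim_lockfunc_t(void *arg, enum sim_lock_op op);

struct sim_tag {
    sim_lockfunc_t *lockfunc;    /* called around a retried (deferred) load */
    void *lockfuncarg;
};

struct sim_map {
    struct sim_tag *tag;
    int pages_needed;            /* bounce pages this request needs */
    int deferred;                /* 1 while waiting for free pages */
    int loaded;                  /* 1 while holding bounce pages */
};

static int sim_free_pages = 4;   /* small fixed pool (assumed size) */
static int sim_panicked = 0;     /* stands in for a kernel panic */
static int sim_lock_depth = 0;   /* stands in for a driver mutex */

/* Models the default lock function used when the driver passes none. */
static void sim_dflt_lock(void *arg, enum sim_lock_op op)
{
    (void)arg;
    (void)op;
    sim_panicked = 1;
}

/* Models a mutex-based lock function: arg points at a lock-depth counter. */
static void sim_lock_mutex(void *arg, enum sim_lock_op op)
{
    int *depth = arg;
    *depth += (op == SIM_LOCK) ? 1 : -1;
}

/* Returns 0 if the load completed now, 1 if it had to be deferred. */
static int sim_load(struct sim_map *m)
{
    if (m->pages_needed > sim_free_pages) {
        m->deferred = 1;
        return 1;
    }
    sim_free_pages -= m->pages_needed;
    m->deferred = 0;
    m->loaded = 1;
    return 0;
}

/* Gives the bounce pages of a loaded map back to the pool. */
static void sim_unload(struct sim_map *m)
{
    if (m->loaded) {
        sim_free_pages += m->pages_needed;
        m->loaded = 0;
    }
}

/* Models the retry path: take the driver's lock, retry the load, unlock. */
static void sim_swi(struct sim_map *m)
{
    struct sim_tag *t = m->tag;

    if (!m->deferred)
        return;
    t->lockfunc(t->lockfuncarg, SIM_LOCK);
    if (sim_panicked)
        return;                  /* a real panic would have stopped here */
    sim_load(m);
    t->lockfunc(t->lockfuncarg, SIM_UNLOCK);
}
```

In this model, a map whose tag uses sim_lock_mutex is retried cleanly
once pages are returned to the pool, while a map whose tag was left
with sim_dflt_lock trips the "panic" flag as soon as its load is
retried. A load that is never deferred never reaches the lock
function at all.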

Now, it seems that many drivers don't provide a lockfunc to 
bus_dma_tag_create. The commit log for the lockfunc addition says:

"The only time that NULL, NULL should ever be used is when the driver 
ensures that bus_dmamap_load() will not be deferred."

The problem is: what does this mean? How can a driver "ensure that 
bus_dmamap_load will not be deferred"? Calls to bus_dma_tag_create 
are not consistent in drivers:

- some drivers are apparently cautious: twe will either have 
BUS_DMA_ALLOCNOW and no lockfunc, or no flags and use 
busdma_lock_mutex and Giant. Is this the right approach?
- other drivers are *very* cautious: fxp will always use 
busdma_lock_mutex and Giant.
- other drivers don't care at all: bge and ata never provide a 
lockfunc, and in most cases don't use any flags either.

My (humble) opinion and a few questions:
- clarification is needed about exactly when a lockfunc is required.
I fear it is always needed unless the created tag is only used as a
"parent" for others, or (maybe?) if BUS_DMA_ALLOCNOW is set.

- an audit of bus_dma_tag_create calls in most drivers is needed, at 
least regarding lockfunc args (bge also has weird lowaddr/hiaddr, as 
has already been reported)

- maybe the dflt_lock should actually use the Giant mutex by default 
rather than panicking

- or maybe the lockfunc call in busdma_swi is not needed? I'm really
not versed in kernelese, so I really have no idea

- is using Giant the best option, or should each driver use a 
different mutex, or...?

Tomorrow I will try a kernel with a modified ata driver that passes
busdma_lock_mutex, &Giant where needed, and report back. I think this
will actually fix the issue, but I don't know whether it might cause
other issues or degrade performance, or whether there is a better
solution...

Any hints welcome,

Jacques.
