busdma dflt_lock on amd64 > 4 GB

Scott Long scottl at samsco.org
Wed Oct 26 08:48:21 PDT 2005

Søren Schmidt wrote:

> On 26/10/2005, at 16:09, Scott Long wrote:
>> Jacques Caron wrote:
>>> Hi all,
>>> Continuing on this story... [I took the liberty of CC'ing Scott  and 
>>> Soren], pr is amd64/87977 though it finally isn't amd64- specific 
>>> but  >4GB-specific.
>>> There is really a big problem somewhere between ata and bus_dma  for 
>>> boxes with more than 4 GB RAM and more than 2 ata disks:
>>> * bounce buffers will be needed
>>> * ata will have bus_dma allocate bounce buffers:
>>> hw.busdma.zone1.total_bpages: 32
>>> hw.busdma.zone1.free_bpages: 32
>>> hw.busdma.zone1.reserved_bpages: 0
>>> hw.busdma.zone1.active_bpages: 0
>>> hw.busdma.zone1.total_bounced: 27718
>>> hw.busdma.zone1.total_deferred: 0
>>> hw.busdma.zone1.lowaddr: 0xffffffff
>>> hw.busdma.zone1.alignment: 2
>>> hw.busdma.zone1.boundary: 65536
>>> * if I do a dd with a bs=256000, 16 bounce pages will be used  (most 
>>> of the time). As long as I stay on the same disk, no more  pages will 
>>> be used.
>>> * as soon as I access another disk (e.g. with another dd with the  
>>> same bs=256000), another set of 16 pages will be used (bus_dma  tags 
>>> and maps are allocated on a per-channel basis), and all 32  bounce 
>>> pages will be used (most of the time)
>>> * and if I try to access a third disk, more bounce pages are  needed 
>>> and:
>>> - one of ata_dmaalloc calls to bus_dma_tag_create has ALLOCNOW set
>>> - busdma_machdep will not allocate more bounce pages in that case  
>>> (the limit is imposed by maxsize in that situation, which has  
>>> already been reached)
>>> - ata_dmaalloc will fail
>>> - but some other bus_dma_tag_create call without ALLOCNOW set will  
>>> still cause bounce pages to be allocated, but deferred, and the  
>>> non-existent lockfunc to be called, and panic.
>>> Adding the standard lockfunc will (probably) solve the panic  issue, 
>>> but there will still be a problem with DMA in ata.
>> Actually, it won't.  It'll result in silent data corruption.  What is
>> happening is that bus_dmamap_load() is returning EINPROGRESS, but the
>> ATA driver ignores it and assumes that the load failed.  Later on the
>> busdma subsystem tries to run the callback but trips over the  
>> intentional assertion.  If the standard lock was used, then the  callback
>> would succeed and start spamming memory that either had been freed or
>> is in the process of being used by other ATA commands.
> Ehm, according to the man page the load should succed for at least  one 
> map when the ALLOCNOW flag is set. ATA only use one map so there  is no 
> way that spamming can happen.
> The bug i ATA is that the sg_tag and the work_tag is not created with  
> the ALLOCNOW flag so if all resources are used before they are called  
> things get messy. The below patch takes care of that problem.

All ALLOCNOW does is guarantee that there are enough bounce pages for 
one consumer of the bounce zone to succeed.  Bounce pages are pooled 
into zones that correspond to similar  attributes.  If you create a
bunch of tags that have similar attributes, then they'll map to the
same zone and you won't necessarily get more pages pre-allocated to
that zone.  Not well documented, I know.  The justification for this
behaviour is to prevent excessive amounts of RAM from being reserved
by careless drivers.  The pools can grow when maps are created, though.

>> So, the panic is doing exactly what it is supposed to do.  It's  guarding
>> against bugs in the driver.  The workaround for this is to use the  
>> NOWAIT flag in all instances of bus_dmamap_load() where deferals can
>> happen.  This, however, means that using bounce pages still remains  
>> fragile and that the driver is still likely to return ENOMEM to the  
>> upper layers.  C'est la vie, I guess.  At one time I had patches that
>> made ATA use the busdma API correctly (it is one of the few remaining
>> that does not), but they rotted over time.
> As long as ATA doesn't do tags there is no gain by changing this at  all 
> except spamming the code with all the callback crap thats not  needed.
> According to the man page bus_dmamap_load takes no flags, so thats  why 
> thats not done. Besides its not needed as shown above.

Ah, you're right, it's not documented.  I'll fix that.  FYI, if you 
specify BUS_DMA_NOWAIT as the flag, it'll return ENOMEM if bounce pages
are required but not immediately available.  It's been this way for 
years, but never documented.  The perils of porting a manpage from
NetBSD without doing a single bit of sanity checking or understanding on 
it =-(

As for the callbacks, you're already using them.  They are there to make
driver more robust in the face of resource shortages.  8-10 years ago it
was about dealing with the 16MB limit of ISA, now it's about dealing 
with the 4GB limit of 32-bit devices.  Re-arranging the code to use it
correctly is not hard, and I've published a number of pieces on how to
do it.  There are also ample examples in the source tree, with the only
incorrect ones being those that are no longer popular enough to warrant
the work.  The API has been around since FreeBSD 3.0, so it's nothing new.


> Søren Schmidt
> sos at FreeBSD.org

More information about the freebsd-amd64 mailing list