[Fwd: Re: use of bus_dmamap_sync]

Tue Oct 25 19:39:21 PDT 2005

Apparently the original poster sent his question to me in private, then 
sent it again to the mailing list right as I was responding in private.
Anyways, no need to continue to guess; if anyone has any questions, feel
free to ask.

Below is my response.  Note that I edited it slightly to fix an error 
that I found.

Scott

-------- Original Message --------
Subject: Re: use of bus_dmamap_sync
Date: Tue, 25 Oct 2005 07:59:03 -0600
From: Scott Long <scottl at samsco.org>
To: Dinesh Nair <dinesh at alphaque.com>
References: <435DD3B0.70605 at alphaque.com>

Dinesh Nair wrote:
> 
> hi scott,
> 
> i came across this message of yours,
> http://lists.freebsd.org/pipermail/freebsd-current/2004-December/044395.html 
> 
> 
> and you seem like the perfect person to assist me in something. i've been
> trying to figure out the best places to use bus_dmamap_sync when
> reading/writing to a dma mapped address space. however, i cant seem to get
> the gist of this, either from the mailing list discussions or the man page.
> could you assist me ?
> 
> i'm on FreeBSD 4.11 right now, and i notice the definitions of 
> BUS_DMASYNC_* has changed from an enum (0-3) in 4.x to a typedef in 5.x.
> 
> this is what i have done. i have used two buffers to handle reads from the
> device and writes to the device. the pseudocode is as follows
> 
> rx_func()
> {
>     POSITION A
       bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
       Ask hardware for data
       bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
> 
>     read from readbuf (i'm assuming that device has put data in
>                readbuf)
>     POSITION B
> }
> 
> tx_func()
> {
>     POSITION C
> 
>     write to txbuf (here's where we write to txbuf)
       bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
       notify hardware of the write
> 
>     POSITION D
       bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
> }
> 
> what BUS_DMASYNC_{PRE,POST}{READ,WRITE} option should i use  for 
> bus_dmamap_sync in position A, B, C and D ?
> 
> any assistance would be gladly appreciated, as i'm seeing some really weird
> symptoms on this device, where data written out is being immediately read
> in. i'm guessing this has to do with my wrong usage of bus_dmamap_sync().
> 

The point of the syncs is to do the proper memory barrier and cache
coherency magic between the CPU and the bus as well as do the memory
copies for bounce buffers.  If you are dealing with statically mapped
buffers, e.g. for an rx/tx descriptor ring, then you'll want code
exactly like what is shown above.  In reality, most platforms only do work
for the POSTREAD and PREWRITE cases, but for the sake of completeness
the others are documented and usually used in drivers.  NetBSD might
have platforms that require operations for PREREAD and POSTWRITE, but
I've never looked that closely.
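
To make the bounce buffer part concrete, here is a userland toy model, not
the kernel implementation.  Every toy_* name is made up for illustration
and is not part of busdma; the only thing it tries to show is why the
copies land on PREWRITE and POSTREAD while the other two ops can be no-ops.

```c
/* Userland toy model only; none of the toy_* names are real busdma API. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_sync_op {
	TOY_PREREAD,
	TOY_POSTREAD,
	TOY_PREWRITE,
	TOY_POSTWRITE
};

#define TOY_BOUNCE_SIZE 64

struct toy_bmap {
	unsigned char *client;			/* buffer the driver touches */
	unsigned char bounce[TOY_BOUNCE_SIZE];	/* memory the "device" can reach */
	size_t len;				/* bytes covered by the mapping */
};

/* Returns 0 on success, -1 if the buffer does not fit the bounce area. */
static int
toy_bmap_load(struct toy_bmap *m, unsigned char *buf, size_t len)
{
	if (len > TOY_BOUNCE_SIZE)
		return (-1);
	m->client = buf;
	m->len = len;
	memset(m->bounce, 0, sizeof(m->bounce));
	return (0);
}

static void
toy_bmap_sync(struct toy_bmap *m, enum toy_sync_op op)
{
	assert(m->client != NULL);

	switch (op) {
	case TOY_PREWRITE:
		/* CPU filled the client buffer; stage it for the device. */
		memcpy(m->bounce, m->client, m->len);
		break;
	case TOY_POSTREAD:
		/* Device filled the bounce area; hand it back to the CPU. */
		memcpy(m->client, m->bounce, m->len);
		break;
	case TOY_PREREAD:
	case TOY_POSTWRITE:
		/* Nothing to copy in this model. */
		break;
	}
}
```

If the PREWRITE call is skipped in this model, the device side never sees
what the CPU wrote; if POSTREAD is skipped, the CPU reads stale data.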

If you are dealing with dynamic buffers, e.g. for mbuf data, then you'll
want the PREREAD and PREWRITE ops to happen in the callback function for
bus_dmamap_load() and the POSTREAD and POSTWRITE ops to happen right
before calling bus_dmamap_unload().  So in this case it would be:

rx_buf()
{
	allocate buffer
	allocate map
	bus_dmamap_load(tag, map, buffer, size, rx_callback, arg, flags)
}

rx_callback(arg, segs, nsegs, error)
{
	convert segs to hardware format
	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD)
	notify hardware about buffer
}

rx_complete()
{
	bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD)
	bus_dmamap_unload(tag, map)
	deallocate map
	process buffer
}

tx_buf()
{
	fill buffer
	allocate map
	bus_dmamap_load(tag, map, buffer, size, tx_callback, arg, flags)
}

tx_callback(arg, segs, nsegs, error)
{
	convert segs to hardware format
	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE)
	notify hardware about buffer
}

tx_complete()
{
	bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE)
	bus_dmamap_unload(tag, map)
	deallocate map
	free buffer
}

This is the design that busdma was originally modelled on.  It works
well for storage devices where the load operation must succeed.  It
doesn't work as well for network devices where the latency of the
indirect calls is measurable.  So for that, I added
bus_dmamap_load_mbuf_sg().  It eliminates the callback function and
returns the scatter gather list directly.  So, the above example would
be:

tx_buf()
{
	bus_dma_segment_t segs[maxsegs];
	int nsegs;

	fill buffer
	allocate map
	bus_dmamap_load_mbuf_sg(tag, map, buffer, segs, &nsegs, flags)
	convert segs to hardware format
	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE)
	notify hardware about buffer
}
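
The difference between the two calling styles can be sketched in userland.
This is only a toy model: the toy_* names, the 4096-byte page size and the
8-segment limit are assumptions for illustration, not busdma API, and the
callback here always runs immediately.  It shows the callback style handing
the segment list to a function pointer, and the direct style filling an
array that the caller owns.

```c
/* Userland toy model; the toy_* names are made up and are not busdma API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE	4096u
#define TOY_MAXSEGS	8

struct toy_seg {
	uintptr_t addr;
	size_t len;
};

typedef void toy_callback_t(void *arg, struct toy_seg *segs, int nsegs,
    int error);

/* Direct style: split at page boundaries into the caller's array. */
static int
toy_load_sg(uintptr_t addr, size_t len, struct toy_seg *segs, int *nsegs)
{
	int n = 0;

	assert(segs != NULL && nsegs != NULL);
	while (len > 0) {
		size_t chunk = TOY_PAGE - (addr % TOY_PAGE);

		if (chunk > len)
			chunk = len;
		if (n == TOY_MAXSEGS)
			return (-1);	/* too many segments */
		segs[n].addr = addr;
		segs[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	*nsegs = n;
	return (0);
}

/* Callback style: build the list, then pass it through an indirect call. */
static int
toy_load_cb(uintptr_t addr, size_t len, toy_callback_t *cb, void *arg)
{
	struct toy_seg segs[TOY_MAXSEGS];
	int nsegs = 0;
	int error;

	error = toy_load_sg(addr, len, segs, &nsegs);
	cb(arg, segs, nsegs, error);
	return (error);
}

/* Example callback: total up the bytes described by the segment list. */
static void
toy_count_bytes(void *arg, struct toy_seg *segs, int nsegs, int error)
{
	size_t *total = arg;
	int i;

	if (error != 0)
		return;
	for (i = 0; i < nsegs; i++)
		*total += segs[i].len;
}
```

With the direct style the caller can convert segs to hardware format and
issue the PREWRITE sync inline, right after the load returns.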

Also, the 'allocate map' part should be done carefully.  Most network
drivers are lazy and call bus_dmamap_create() and bus_dmamap_destroy()
for each buffer.  It's often better to pre-allocate the maps at init
time, put them on a list, and then just push and pop them off the list
at runtime.  This is usually faster than calling the busdma functions,
but you'll have to weigh the tradeoffs.
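
A minimal userland sketch of the pre-allocated list follows.  The struct
and function names are hypothetical, toy_map merely stands in for a
bus_dmamap_t, and a real driver would also need its usual locking around
the list.

```c
/* Userland toy model; toy_map stands in for a bus_dmamap_t. */
#include <assert.h>
#include <stddef.h>

#define TOY_NMAPS 4

struct toy_map {
	int id;
	struct toy_map *next;
};

struct toy_map_pool {
	struct toy_map maps[TOY_NMAPS];
	struct toy_map *free;		/* head of the free list */
};

/* Init time: create every map once and push it on the free list. */
static void
toy_pool_init(struct toy_map_pool *p)
{
	int i;

	p->free = NULL;
	for (i = 0; i < TOY_NMAPS; i++) {
		/* A driver would call bus_dmamap_create() here. */
		p->maps[i].id = i;
		p->maps[i].next = p->free;
		p->free = &p->maps[i];
	}
}

/* Runtime: pop a map, or return NULL when the pool is exhausted. */
static struct toy_map *
toy_pool_get(struct toy_map_pool *p)
{
	struct toy_map *m = p->free;

	if (m != NULL) {
		p->free = m->next;
		m->next = NULL;
	}
	return (m);
}

/* Runtime: push a map back once its buffer has been unloaded. */
static void
toy_pool_put(struct toy_map_pool *p, struct toy_map *m)
{
	assert(m != NULL);
	m->next = p->free;
	p->free = m;
}
```

The pool size caps the number of buffers in flight, which is part of the
tradeoff to weigh.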

Scott