Partial cacheline flush problems on ARM and MIPS

Fri Aug 24 03:56:25 UTC 2012

On Aug 23, 2012, at 5:45 PM, Ian Lepore wrote:

> On Thu, 2012-08-23 at 17:26 -0600, Warner Losh wrote:
>>> On Aug 23, 2012, at 3:28 PM, Ian Lepore wrote:
>>> Now we have a new type of constraint, I think of it as "granularity".
>>> In effect, we have a DMA system that can only do DMA in cacheline sized
>>> chunks.  Even when the IO size -- and thus the number of "bits on the
>>> wire" -- is less than the cacheline size, at the end of the DMA
>>> operation (which includes the software-assisted coherency operations)
>>> the number of bytes in memory that may be modified is the size of a
>>> cacheline.  This is because "the DMA system" is not just the engine that
>>> moves bytes around, it's the combination of hardware and software that
>>> work together to maintain cache coherency.
>> But this isn't new.  It is an alignment requirement, which carries
>> with it an implicit size requirement.  If you enforce the alignment,
>> and force all 'sub buffers' to have this alignment, you don't need the
>> new thing. 
> 
> So do you think it's safe to assume that any given dma tag that has an
> alignment constraint also implicitly has a buffer size constraint that
> the size must be a multiple of the alignment?

Yes.  If something must be aligned to 2^N bytes, chances are the hardware doesn't decode the lower N address bits, which implies a size constraint.
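
To make the arithmetic concrete, here is a minimal sketch, assuming the alignment is a power of two.  The helper names (is_pow2, decoded_addr, implied_size) are hypothetical and are not part of busdma or any driver; they only illustrate how ignoring the low address bits leads to a size that is a multiple of the alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if align is a nonzero power of two. */
static bool
is_pow2(size_t align)
{
	return (align != 0 && (align & (align - 1)) == 0);
}

/*
 * Address as seen by a device that does not decode the low
 * log2(align) address bits: those bits are simply masked off.
 */
static uintptr_t
decoded_addr(uintptr_t addr, size_t align)
{
	assert(is_pow2(align));
	return (addr & ~((uintptr_t)align - 1));
}

/*
 * Smallest multiple of align that covers size bytes.  This is the
 * size constraint implied by the alignment constraint.
 */
static size_t
implied_size(size_t size, size_t align)
{
	assert(is_pow2(align));
	return ((size + align - 1) & ~(align - 1));
}
```

With a 32-byte alignment, a 16-byte request is treated as 32 bytes and a 33-byte request as 64 bytes.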

> What if we have a platform with a 32-byte cacheline / DMA granularity,
> and then we have a builtin device on that SoC which can only do DMA on a
> 64K alignment (which its tag would reflect), but the hardware can move
> as little as 1 byte at a time?  Children of that bridge device come
> along and allocate little 16-byte buffers that eat 16 pages each.  It
> doesn't seem all that far-fetched to me.

This would be very odd hardware.  DMA aligned to 64k that can only move one byte seems far-fetched.  How useful would such a design be?  How would you do scatter/gather on such a design?

But this isn't what I'm saying.  If the cache line size is 32 bytes, then for DMA we only ever give out chunks of 32 bytes or larger.  In that case, the split cache line situation you gave as an example can't happen.
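
A rough userland sketch of that idea follows, assuming a 32-byte cache line and using C11 aligned_alloc() as a stand-in for a real DMA allocator.  CACHE_LINE_SIZE, dma_round_size, dma_buf_alloc and shares_cache_line are hypothetical names made up for this illustration, not existing kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE	32	/* assumed line size, for illustration only */

/* Round a request up to a whole number of cache lines (at least one). */
static size_t
dma_round_size(size_t size)
{
	if (size == 0)
		size = 1;
	return ((size + CACHE_LINE_SIZE - 1) &
	    ~(size_t)(CACHE_LINE_SIZE - 1));
}

/*
 * Hand out a buffer that starts on a line boundary and owns every
 * line it touches.  Caller releases it with free().
 */
static void *
dma_buf_alloc(size_t size)
{
	return (aligned_alloc(CACHE_LINE_SIZE, dma_round_size(size)));
}

/* True if two non-empty byte ranges touch at least one common line. */
static bool
shares_cache_line(const void *a, size_t alen, const void *b, size_t blen)
{
	uintptr_t a_first = (uintptr_t)a / CACHE_LINE_SIZE;
	uintptr_t a_last = ((uintptr_t)a + alen - 1) / CACHE_LINE_SIZE;
	uintptr_t b_first = (uintptr_t)b / CACHE_LINE_SIZE;
	uintptr_t b_last = ((uintptr_t)b + blen - 1) / CACHE_LINE_SIZE;

	return (a_first <= b_last && b_first <= a_last);
}
```

Two 16-byte buffers packed back to back at 0x1000 and 0x1010 share a line, so a flush or invalidate for one can clobber the other.  Two 16-byte requests served by dma_buf_alloc() each get a full line of their own, so the partial-line case does not arise.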