bus_dmamap_sync() for bounced client buffers from user address space

Jason Harmening jason.harmening at gmail.com
Thu Apr 30 04:13:46 UTC 2015


On Wed, Apr 29, 2015 at 6:10 PM, Jason Harmening <jason.harmening at gmail.com>
wrote:

>
>> For what we call armv6 (which is mostly armv7)...
>>
>> The cache maintenance operations require virtual addresses, which means
>> it looks a lot like a VIPT cache.  Under the hood the implementation
>> behaves as if it were a PIPT cache so even in the presence of multiple
>> mappings of the same physical page into different virtual addresses, the
>> SMP coherency hardware works correctly.
>>
>> The ARM ARM says...
>>
>>         [Stuff about ARMv6 and page coloring when a cache way exceeds
>>         4K.]
>>
>>         ARMv7 does not support page coloring, and requires that all data
>>         and unified caches behave as Physically Indexed Physically
>>         Tagged (PIPT) caches.
>>
>> The only true armv6 chip we support isn't SMP and has a 16K/4-way cache
>> that neatly sidesteps the aliasing problem that requires page coloring
>> solutions.  So on modern arm chips we get to act like we've got PIPT
>> data caches, but with the quirk that cache ops are initiated by virtual
>> address.
>>
>
> Cool, thanks for the explanation!
> To satisfy my own curiosity, since it "looks like VIPT", does that mean we
> still have to flush the cache on context switch?
>
>
>>
>> Basically, when you perform a cache maintenance operation, a
>> translation table walk is done on the core that issued the cache op,
>> then from that point on the physical address is used within the cache
>> hardware and that's what gets broadcast to the other cores by the snoop
>> control unit or cache coherency fabric (depending on the chip).
>
>
> So, if we go back to the original problem of wanting to do
> bus_dmamap_sync() on userspace buffers from some asynchronous context:
>
> Say the process that owns the buffer is running on one core and prefetches
> some data into a cacheline for the buffer, and bus_dmamap_sync(POSTREAD) is
> done by a kernel thread running on another core.  Since the core running
> the kernel thread is responsible for the TLB lookup to get the physical
> address, and that core has no mapping for the UVA, the cache ops will be
> treated as misses, and the cacheline on the core that owns the UVA won't
> be invalidated, correct?
>
> That means the panic on !pmap_dmap_iscurrent() in busdma_machdep-v6.c
> should stay?
>
> Sort of the same problem would apply to drivers using
> vm_fault_quick_hold_pages + bus_dmamap_load_ma...no cache maintenance,
> since there are no VAs to operate on.  Indeed, both the arm and mips
> implementations of _bus_dmamap_load_phys do nothing with the sync list.
>
It occurs to me that one way to deal with both the blocking sf_buf for
physcopy and VIPT cache maintenance might be to have a reserved per-CPU KVA
page.  On arches that don't have a direct map, the idea would be to enter a
critical section, map the page, copy the bounce page or do cache
maintenance on the synclist entry, then exit the critical section.  That
brought up a dim memory I had of Linux doing something similar, and in
fact it seems to use kmap_atomic for both cache ops and bounce buffers.
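Roughly, a sketch of what I have in mind (the busdma_percpu_kva[] array and
the helper name are hypothetical, not existing code; the per-CPU pages
would be reserved at boot):

```c
/*
 * Hypothetical sketch: one reserved KVA page per CPU, used as a window
 * for bounce copies (or VA-based cache maintenance) on arches without
 * a direct map.  critical_enter() prevents preemption, so curcpu cannot
 * change while the temporary mapping is in use.
 */
static vm_offset_t busdma_percpu_kva[MAXCPU];	/* reserved at boot */

static void
busdma_bounce_copyout(vm_page_t m, vm_offset_t off, void *dst, size_t len)
{
	vm_offset_t kva;

	critical_enter();
	kva = busdma_percpu_kva[curcpu];
	pmap_qenter(kva, &m, 1);	/* temporary mapping, this CPU only */
	bcopy((char *)kva + off, dst, len);
	/* for the sync-list case, something like:
	 *   cpu_dcache_inv_range(kva + off, len);
	 */
	pmap_qremove(kva, 1);
	critical_exit();
}
```

That would keep the window usable from any context without sleeping, at
the cost of a pmap_qenter/pmap_qremove pair per operation.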


More information about the freebsd-arch mailing list