bus_dmamap_sync() for bounced client buffers from user address space

Sun Apr 26 18:02:45 UTC 2015

On 04/25/15 15:14, Konstantin Belousov wrote:
> On Sat, Apr 25, 2015 at 01:47:07PM -0500, Jason Harmening wrote:
>> On 04/25/15 13:18, Konstantin Belousov wrote:
>>> On Sat, Apr 25, 2015 at 12:55:13PM -0500, Jason Harmening wrote:
>>>> Ah, that looks much better.  A few things though:
>>>> 1) _bus_dmamap_load_ma (note the underscore) is still part of the MI/MD
>>>> interface, which we tell drivers not to use.  It looks like it's
>>>> implemented for every arch though.  Should there be a public and
>>>> documented bus_dmamap_load_ma ?
>>> Might be yes.  But at least one consumer of the KPI must appear before
>>> the facility is introduced.
>> Could some of the GART/GTT code consume that?
> Do you mean, by GEM/GTT code ?  Indeed, this is an interesting and
> probably workable suggestion.  I thought that I would need to provide a
> special interface from DMAR for the GEM, but your proposal seems to fit.
> Still, an issue is that the Linux code is structured significantly
> differently, and this code, although isolated, is significantly
> divergent from the upstream.

Yes, GEM/GTT.  I know it would be useful for i915, maybe other drm2
drivers too.

>
>>>> 3) Using bus_dmamap_load_ma would mean always using physcopy for bounce
>>>> buffers...seems like the sfbufs would slow things down ?
>>> For amd64, sfbufs are nop, due to the direct map.  But, I doubt that
>>> we can combine bounce buffers and performance in the single sentence.
>> In fact the amd64 implementation of uiomove_fromphys doesn't use sfbufs
>> at all thanks to the direct map.  sparc64 seems to avoid sfbufs as much
>> as possible too.  I don't know what arm64/aarch64 will be able to use. 
>> Those seem like the platforms where bounce buffering would be the most
>> likely, along with i386 + PAE.  They might still be used on 32-bit
>> platforms for alignment or devices with < 32-bit address width, but then
>> those are likely to be old and slow anyway.
>>
>> I'm still a bit worried about the slowness of waiting for an sfbuf if
>> one is needed, but in practice that might not be a big issue.
>>
I noticed the following in vm_map_delete, which is called by sys_munmap:

                 * Wait for wiring or unwiring of an entry to complete.
                 * Also wait for any system wirings to disappear on
                 * user maps.
                 */
                if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 ||
                    (vm_map_pmap(map) != kernel_pmap &&
                    vm_map_entry_system_wired_count(entry) != 0)) {
...
                        (void) vm_map_unlock_and_wait(map, 0);

It looks like munmap does wait on wired pages (well, system-wired pages, not mlock'ed pages).
The system-wire count on the map entry will be non-zero if vslock() or vm_map_wire(...VM_MAP_WIRE_SYSTEM...) was called on it.
Does that mean UIO_USERSPACE dmamaps are actually safe from getting the UVA taken out from under them?
Obviously it doesn't make bcopy safe to do in the wrong process context, but that seems easily fixable.
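
To make the condition above concrete, here is a small userland model of that check. It is only a sketch of my reading of the excerpt, not kernel code: struct model_entry, the MODEL_* flag values and the model_* helpers are hypothetical stand-ins, and it assumes the system-wired count is the total wired count minus at most one user (mlock) wiring.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only (not the kernel's). */
#define MODEL_ENTRY_IN_TRANSITION 0x1 /* a wire/unwire is in progress */
#define MODEL_ENTRY_USER_WIRED    0x2 /* entry was wired via mlock() */

/* Simplified stand-in for a map entry. */
struct model_entry {
	int eflags;
	int wired_count; /* user wiring (0 or 1) plus system wirings */
};

/* Assumption: a user wiring contributes at most one to wired_count. */
static int
model_user_wired_count(const struct model_entry *e)
{
	return ((e->eflags & MODEL_ENTRY_USER_WIRED) != 0 ? 1 : 0);
}

/* System wirings are whatever remains after removing the user wiring. */
static int
model_system_wired_count(const struct model_entry *e)
{
	assert(e->wired_count >= model_user_wired_count(e));
	return (e->wired_count - model_user_wired_count(e));
}

/*
 * Mirrors the shape of the test in the excerpt: the delete path waits
 * if the entry is in transition, or if the map is a user map and the
 * entry still has system wirings.
 */
static bool
model_delete_must_wait(const struct model_entry *e, bool is_kernel_map)
{
	return ((e->eflags & MODEL_ENTRY_IN_TRANSITION) != 0 ||
	    (!is_kernel_map && model_system_wired_count(e) != 0));
}
```

In this model an entry that is only mlock'ed (MODEL_ENTRY_USER_WIRED set, wired_count of 1) does not make the delete wait, while an entry with one system wiring on a user map does, which is the distinction drawn above.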
