bus_dmamap_sync() for bounced client buffers from user address space

Jason Harmening jason.harmening at gmail.com
Sat Apr 25 14:01:04 UTC 2015


On 04/25/15 04:41, Konstantin Belousov wrote:
> On Fri, Apr 24, 2015 at 05:50:15PM -0500, Jason Harmening wrote:
>> A couple of comments:
>>
>> --POSTWRITE and POSTREAD are only asynchronous if you call them from an
>> asynchronous context.
>> For a driver that's already performing DMA operations on usermode memory,
>> it seems likely that there's going to be *some* place where you can call
>> bus_dmamap_sync() and be guaranteed to be executing in the context of the
>> process that owns the memory.  Then a regular bcopy will be safe and
>> inexpensive, assuming the pages have been properly vslock-ed/vm_map_wire-d.
>> That's usually whatever read/write/ioctl operation spawned the DMA transfer
>> in the first place.  So, in those cases can you not just move the
>> POSTREAD/POSTWRITE sync from the "DMA-finished" interrupt to the
>> d_read/d_write/d_ioctl that waits on the "DMA-finished" interrupt?
>>
>> --physcopyin/physcopyout aren't trivial.  They go through uiomove_fromphys,
>> which often uses sfbufs to create temporary KVA mappings for the physical
>> pages.  sf_buf_alloc() can sleep (unless SFB_NOWAIT is specified, which
>> means it can fail and which uiomove_fromphys does not specify for good
>> reason); that makes it unsafe for use in either a threaded interrupt or a
>> filter.  Perhaps the physcopyout path could be changed to use pmap_qenter
>> directly in this case, but that can still be expensive in terms of TLB
>> shootdowns.
>>
>> Checking against VM_MIN_KERNEL_ADDRESS seems sketchy; it eliminates the
>> chance to use a much-less-expensive bcopy in cases where the sync is
>> happening in correct process context.
>>
>> Context-switching during bus_dmamap_sync() shouldn't be an issue.  In a
>> filter interrupt, curproc will be completely arbitrary but none of this
>> stuff should be called in a filter anyway.  Otherwise, if you're in a
>> kernel thread (including an ithread), curproc should be whatever proc was
>> supplied when the thread was created.  That's usually proc0, which only has
>> kernel address space.  IOW, even if a context-switch happens sometime
>> during bus_dmamap_sync, any pmap check or copy should have a consistent and
>> non-arbitrary process context.
>>
>> I think something like your second solution would be workable to make
>> UIO_USERSPACE syncs work in non-interrupt kernel threads, but given all the
>> restrictions and extra cost of physcopy, I'm not sure how useful that would
>> be.
>>
>> I do think busdma.9 could at least use a note that bus_dmamap_sync() is
>> only safe to call in the context of the owning process for user buffers.
> UIO_USERSPACE for busdma is absolutely unsafe and cannot be used without
> risking a kernel panic.  Even if you wire the userspace buffer, there is
> nothing to prevent another thread in the user process, or another
> process sharing the same address space, from calling munmap(2) on the range.
>
> The only safe method to work with userspace regions is to
> vm_fault_quick_hold_pages() them to get a hold on the pages, and then
> either pass the page array down, or remap the pages into the KVA with
> pmap_qenter().
I was under the impression that any attempt to free or munmap the
virtual range would block waiting for the underlying pages to be unwired.
That's certainly the behavior I've seen with free; is it not the case
with munmap?  Either way, I haven't walked the code to see where the
blocking is implemented.
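For reference, my understanding of the hold-and-remap approach quoted above, sketched in kernel C (hypothetical `uaddr`, `len`, and `MAXPAGES`; error handling abbreviated):

```c
/*
 * Sketch only: hold the user pages backing [uaddr, uaddr + len) and
 * remap them into the KVA so DMA and bounce copies never touch a
 * user address.  Assumes the buffer fits in a small static array.
 */
vm_page_t pages[MAXPAGES];
vm_offset_t kva;
int count;

count = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
    (vm_offset_t)uaddr, len, VM_PROT_READ | VM_PROT_WRITE,
    pages, MAXPAGES);
if (count < 0)
        return (EFAULT);

kva = kva_alloc(ptoa(count));
pmap_qenter(kva, pages, count);

/*
 * The buffer now lives at kva + ((vm_offset_t)uaddr & PAGE_MASK);
 * load the DMA map from that address and run the transfer.
 */

/* Teardown after the transfer completes. */
pmap_qremove(kva, count);
kva_free(kva, ptoa(count));
vm_page_unhold_pages(pages, count);
```

With the pages held, a concurrent munmap(2) of the range can no longer yank the memory out from under the transfer, which is exactly the hazard described above.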

If UIO_USERSPACE truly is unsafe to use with busdma, then we need to
either make it safe or stop documenting it in the manpage.
Perhaps the bounce buffer logic could use copyin/copyout for userspace,
if in the right process?  Or just always use physcopy for non-KVA as in
the first suggestion?
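To sketch what I mean (hypothetical; assumes we can compare map->pmap against the current process's pmap, and copyout()'s error return is ignored for brevity), the POSTREAD handling for one bounce page might become:

```c
/*
 * Hypothetical POSTREAD handling for one bounce page: copy from the
 * bounce buffer back to the client buffer.
 */
if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
        /* Client buffer is in KVA: plain bcopy is safe anywhere. */
        bcopy((void *)bpage->vaddr, (void *)bpage->datavaddr,
            bpage->datacount);
} else if (bpage->datavaddr != 0 &&
    map->pmap == vmspace_pmap(curproc->p_vmspace)) {
        /* User VA, and we are in the owning process: copyout works. */
        copyout((void *)bpage->vaddr, (void *)bpage->datavaddr,
            bpage->datacount);
} else {
        /* Wrong context, or no VA recorded: copy via the physical page. */
        physcopyin((void *)bpage->vaddr, bpage->dataaddr,
            bpage->datacount);
}
```

The copyout() branch only helps when the sync runs in the owning process's context, of course; from an arbitrary kernel thread it would fall through to the physcopy path.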

It seems like, in general, it is too hard for drivers using busdma to deal
with usermode memory in a way that's both safe and efficient:
--bus_dmamap_load_uio + UIO_USERSPACE is apparently really unsafe
--if they go the other way and allocate in the kernel, then they had
better either be willing to do extra copying, or create and refcount
their own vm_objects and use d_mmap_single (I still haven't seen a good
example of that), or leak a bunch of memory (if they use d_mmap),
because the old device pager is also really unsafe.
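To illustrate my earlier point about moving the POSTREAD sync out of the interrupt handler, a driver's read routine could look roughly like this (hypothetical foo_* names; locking, map loading, and hardware start details omitted):

```c
/*
 * Sketch: sleep in d_read until the DMA-finished interrupt fires,
 * then do the POSTREAD sync here, in the buffer owner's context,
 * so any bounce copy can use a plain bcopy().
 */
static int
foo_read(struct cdev *dev, struct uio *uio, int ioflag)
{
        struct foo_softc *sc = dev->si_drv1;
        int error;

        /*
         * ... bus_dmamap_load_uio(sc->dmat, sc->map, uio, ...) and
         * start the hardware transfer ...
         */

        /* The interrupt handler does wakeup(&sc->dma_done). */
        error = tsleep(&sc->dma_done, PRIBIO | PCATCH, "foodma", hz);
        if (error != 0)
                return (error);

        /* Still in the owning process's context here. */
        bus_dmamap_sync(sc->dmat, sc->map, BUS_DMASYNC_POSTREAD);
        bus_dmamap_unload(sc->dmat, sc->map);
        return (0);
}
```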

>
>>
>> On Fri, Apr 24, 2015 at 8:13 AM, Svatopluk Kraus <onwahe at gmail.com> wrote:
>>
>>> DMA can be done on client buffer from user address space. For example,
>>> thru bus_dmamap_load_uio() when uio->uio_segflg is UIO_USERSPACE. Such
>>> client buffer can bounce and then, it must be copied to and from
>>> bounce buffer in bus_dmamap_sync().
>>>
>>> Current implementations (on all archs) do not take into account that
>>> bus_dmamap_sync() is asynchronous for POSTWRITE and POSTREAD in
>>> general.  It can be asynchronous for PREWRITE and PREREAD too, for
>>> example in driver implementations where operations on DMA client
>>> buffers are buffered.  In those cases, a simple bcopy() is not
>>> correct.
>>>
>>> A demonstration of the current implementation (x86) follows:
>>>
>>> -----------------------------
>>> struct bounce_page {
>>>     vm_offset_t vaddr;      /* kva of bounce buffer */
>>>     bus_addr_t  busaddr;    /* Physical address */
>>>     vm_offset_t datavaddr;  /* kva of client data */
>>>     bus_addr_t  dataaddr;   /* client physical address */
>>>     bus_size_t  datacount;  /* client data count */
>>>     STAILQ_ENTRY(bounce_page) links;
>>> };
>>>
>>>
>>> if ((op & BUS_DMASYNC_PREWRITE) != 0) {
>>>     while (bpage != NULL) {
>>>         if (bpage->datavaddr != 0) {
>>>             bcopy((void *)bpage->datavaddr,
>>>                 (void *)bpage->vaddr,
>>>                 bpage->datacount);
>>>         } else {
>>>             physcopyout(bpage->dataaddr,
>>>                 (void *)bpage->vaddr,
>>>                 bpage->datacount);
>>>         }
>>>         bpage = STAILQ_NEXT(bpage, links);
>>>     }
>>>     dmat->bounce_zone->total_bounced++;
>>> }
>>> -----------------------------
>>>
>>> There are two things:
>>>
>>> (1) datavaddr is not always a KVA of client data; sometimes it can
>>> be a UVA of client data.
>>> (2) bcopy() can be used only if datavaddr is a KVA, or when map->pmap
>>> is the current pmap.
>>>
>>> Note that bus_dmamap_load_uio() with uio->uio_segflg set to
>>> UIO_USERSPACE implies that the physical pages used are in-core and
>>> wired.  See "man bus_dma".
>>>
>>> There is no public interface to check whether map->pmap is the
>>> current pmap, so one solution is the following:
>>>
>>> if (bpage->datavaddr >= VM_MIN_KERNEL_ADDRESS) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> If there will be public pmap_is_current() then another solution is the
>>> following:
>>>
>>> if ((bpage->datavaddr != 0) && pmap_is_current(map->pmap)) {
>>>         bcopy();
>>> } else {
>>>         physcopy();
>>> }
>>>
>>> The second solution implies that a context switch must not happen
>>> during bus_dmamap_sync() when it is called from an interrupt routine.
>>> However, IMO, that is guaranteed.
>>>
>>> Note that map->pmap should be always kernel_pmap for datavaddr >=
>>> VM_MIN_KERNEL_ADDRESS.
>>>
>>> Comments, different solutions, or objections?
>>>
>>> Svatopluk Kraus
>>> _______________________________________________
>>> freebsd-arch at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>>>



