Call for testing and review, busdma changes

Mon Dec 24 00:22:57 UTC 2012

On Sun, 9 Dec 2012, Ian Lepore wrote:

> On Sun, 2012-12-09 at 08:48 -1000, Jeff Roberson wrote:
>>
>> On Sun, 9 Dec 2012, Ian Lepore wrote:
>>
>>> On Sat, 2012-12-08 at 08:51 -1000, Jeff Roberson wrote:
>>>> Hello,
>>>>
>>>> http://people.freebsd.org/~jeff/physbio.diff
>>>>
>>>> I have a relative large patch that reforms the busdma API so that new
>>>> types may be added without modifying every architecture's
>>>> busdma_machdep.c.  It does this by unifying the bus_dmamap_load_buffer()
>>>> routines so that they may be called from MI code.  The MD busdma is then
>>>> given a chance to do any final processing in the complete() callback.
>>>> This patch also contains cam changes to unify the bus_dmamap_load*
>>>> handling in cam drivers.
>>>>
>>>> The arm and mips implementations saw the largest changes since they have
>>>> to track virtual addresses for sync().  Previously this was done in a type
>>>> specific way.  Now it is done in a generic way by recording the list of
>>>> virtuals in the map.
>>>>
>>>> I have verified that this patch passes make universe which includes
>>>> several kernel builds from each architecture.  I suspect that if I broke
>>>> anything your machine simply won't boot or will throw I/O errors.  There
>>>> is little subtlety, it is mostly refactoring.
>>>>
>
> First an FYI, my replies to this thread aren't making it to the mailing
> lists, except when quoted by someone replying to them.  They're being
> held for moderator approval because they're going to too many addresses.
>
>>>
>>> Testing on armv4/v5 platforms, the patches applied and compiled cleanly,
>>> but fault-abort with a NULL deref on what appears to be the first call
>>> to bus_dmamap_load().  I'll dig deeper into debugging that after
>>> browsing the new code so I can figure out where to start debugging.
>>
>> Can you give me the file and line of the fault?
>>
>
> Better than that, I can give you patches (attached).  There were two
> little glitches... the allocated slist wasn't getting attached to the
> map, and when building the slist in _bus_dmamap_load_buffer() the slist
> needs to participate in the coalescing logic near the end of the loop.

I understand the first problem.  The second, I'm not sure about.  When 
we're going through bounce pages we use virtual to virtual.  Why do you 
need to flush the cache lines for the source addresses?  The device will 
only do io to the bounce pages so they are the only that need other 
synchronization.

>
> I'm not positive the attached quick-fix is perfect, but it allows one of
> my arm units here to boot and run and pass some basic IO smoke tests.
> I'll be trying it on other arm platforms later today.
>
>>>
>>> Just from an initial quick browsing of the new code for armv4, I notice
>>> these things...
>>>
>>> The new routine __bus_dmamap_mayblock() does (*flags) |= BUS_DMA_WAITOK;
>>> which is a no-op because BUS_DMA_WAITOK is zero.  I'm not sure what the
>>> intent was, but I think any modification of the WAIT-related flags would
>>> be conceptually wrong because it overrides the caller's instructions on
>>> whether or not to do a deferred load when resources aren't available.
>>
>> The intent of the original load function seems to be that it is always
>> blocking.  Most architectures force the flag.  The mbuf and uio load
>> routines don't support blocking at all because the callback lacks the
>> information needed to restart the operation.
>>
>
> The manpage states that bus_dmamap_load() never blocks for any reason.
> The mbuf and uio flavors say they are variations of bus_dmapmap_load();
> I take the implication of that to mean that they also are defined as
> never blocking.
>
> The docs then say that WAITOK vs. NOWAIT control the scheduling of a
> deferred load, and that NOWAIT is forced in the mbuf and uio cases.
> Your refactored code already handles that by adding NOWAIT to the
> caller-specified flags for mbufs and uio.
>
> So the code will do the right thing now (with your patches), but only
> because (*flags) |= BUS_DMA_WAITOK is a no-op.

This code is problematic.  We can only support WAITOK operations for 
bus_dmamap_load() because the callbacks from the swi when bounce pages are 
available can only remember one pointer.  To fix this for real we'd have 
to remember the original type and call the appropriate load function 
again.  This could be solved by having a { type, pointer, length } 
structure that is passed between the front and back-end.  This would solve 
your mbuf problem as well.  I think I will do this next.

By the way I have an mbuf branch that pushes data and mbuf structure into 
separate cachelines.  It's not only faster on arm and the like but on x86 
as well.  So someone should pursue this and you'd have way fewer partial 
line flushes.

>
>>>
>>> The way you've refactored the mbuf-related load routines, the MD code is
>>> no longer able to tell that it's an mbuf operation.  This comes back to
>>> what I said a few days ago... when you look at the current code on
>>> various platforms you see a lot of false similarities.  The code isn't
>>> nearly identical because it really all needs to be the same, it's
>>> because it was all copied from the same source, and it doesn't currently
>>> work correctly on arm and mips.  With your changes, the ability to
>>> correct the implementation on those platforms is now gone.  Perhaps this
>>> can be fixed easily by defining some new flags that the MI routines can
>>> OR-in to give the MD routines more info on what's happening. (Correcting
>>> the mbuf sync handling on arm and mips requires that we know at sync
>>> time that the buffers involved are mbufs, which means we need to know at
>>> map-load time so we can set a flag or a type code or something in the
>>> map that we can examine at sync time.)
>>
>> Why do you need to know it is an mbuf if you have a record of the virtual
>> addresses?  The only type specific information was used to find the
>> addresses.  See how arm's busdma_machdep-v6 ws implemented.
>>
>
> Because there is a special rule for mbufs on VIVT-cache architectures:
> the address of the data buffer is not aligned to a cacheline boundary,
> but the sync code is allowed to treat the buffer as if it is aligned and
> thus avoid invoking the partial cacheline flush algorithm.  That
> algorithm was shown a few months ago to be fatally flawed, and we're on
> a path (albeit in super-slow-motion) to eliminate it.

You can skip the sync because you know the information in the mbuf header 
is not necessary?

>
> I have patches to fix the mbuf partial sync and other things related to
> partial sync problems, and my next step will be to try to rework them to
> fit in with what you've done.  Based on what I've seen so far in your
> refactoring, I think just passing a flag to say it's an mbuf, from the
> MI to the MD mapping routine, will provide enough info.

I think I described a better approach above that will solve mutliple 
problems.  Let me know what you think.

Jeff

>
>>>
>>>> The next step is to allow for dma loading of physical addresses.  This
>>>> will permit unmapped I/O.  Which is a significant performance optimization
>>>> targeted for 10.0.
>>>
>>> This sounds scary for arm and mips, or more specifically for VIVT cache
>>> platforms that can only do a sync based on virtual addresses.  Can you
>>> talk some more about the plans for this?
>>
>> It will be for addresses which were never mapped into kva.  To support
>> unmaapped io.  There will only be a need for bounce pages, no syncing.
>>
>
> Can the pages be mapped into any non-kernel vmspaces?  If they aren't
> mapped into any vmspace at all, then I think there'll be no implications
> for VIVT cache architectures.  If they appear in any pmap at all then
> there may be problems.
>
> -- Ian
>
>