Call for testing and review, busdma changes
freebsd at damnhippie.dyndns.org
Sun Dec 9 20:34:55 UTC 2012
On Sun, 2012-12-09 at 08:48 -1000, Jeff Roberson wrote:
> On Sun, 9 Dec 2012, Ian Lepore wrote:
> > On Sat, 2012-12-08 at 08:51 -1000, Jeff Roberson wrote:
> >> Hello,
> >> http://people.freebsd.org/~jeff/physbio.diff
> >> I have a relatively large patch that reforms the busdma API so that new
> >> types may be added without modifying every architecture's
> >> busdma_machdep.c. It does this by unifying the bus_dmamap_load_buffer()
> >> routines so that they may be called from MI code. The MD busdma is then
> >> given a chance to do any final processing in the complete() callback.
> >> This patch also contains cam changes to unify the bus_dmamap_load*
> >> handling in cam drivers.
> >> The arm and mips implementations saw the largest changes since they have
> >> to track virtual addresses for sync(). Previously this was done in a type
> >> specific way. Now it is done in a generic way by recording the list of
> >> virtuals in the map.
> >> I have verified that this patch passes make universe which includes
> >> several kernel builds from each architecture. I suspect that if I broke
> >> anything your machine simply won't boot or will throw I/O errors. There
is little subtlety; it is mostly refactoring.
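The generic virtual-address tracking described above for arm and mips might be sketched roughly as follows. This is only an illustration of the idea of recording virtual ranges in the map for later sync(); the struct and function names here are hypothetical, not the actual patch's.

```c
/* Sketch: record each loaded segment's virtual range in the map so the
 * MD sync() code can later flush/invalidate by virtual address.
 * Names (struct sync_list, struct dmamap, record_sync) are illustrative. */
#include <stddef.h>

struct sync_list {
    void   *sl_vaddr;   /* CPU virtual address of the segment */
    size_t  sl_len;     /* length of the segment in bytes */
};

#define MAX_SYNC 32

struct dmamap {
    struct sync_list map_slist[MAX_SYNC];
    int              map_sync_count;
};

/* Called from the MI load path for each mapped chunk. */
static int
record_sync(struct dmamap *map, void *vaddr, size_t len)
{
    if (map->map_sync_count >= MAX_SYNC)
        return -1;              /* would need to grow the list or bounce */
    map->map_slist[map->map_sync_count].sl_vaddr = vaddr;
    map->map_slist[map->map_sync_count].sl_len = len;
    map->map_sync_count++;
    return 0;
}
```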
First an FYI, my replies to this thread aren't making it to the mailing
lists, except when quoted by someone replying to them. They're being
held for moderator approval because they're going to too many addresses.
> > Testing on armv4/v5 platforms, the patches applied and compiled cleanly,
> > but fault-abort with a NULL deref on what appears to be the first call
> > to bus_dmamap_load(). I'll dig deeper into debugging that after
> > browsing the new code so I can figure out where to start debugging.
> Can you give me the file and line of the fault?
Better than that, I can give you patches (attached). There were two
little glitches... the allocated slist wasn't getting attached to the
map, and when building the slist in _bus_dmamap_load_buffer() the slist
needs to participate in the coalescing logic near the end of the loop.
I'm not positive the attached quick-fix is perfect, but it allows one of
my arm units here to boot and run and pass some basic IO smoke tests.
I'll be trying it on other arm platforms later today.
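The second glitch can be illustrated with a small sketch: when a new chunk coalesces with the previous physical segment, the matching sync-list entry must be extended by the same amount rather than left stale. The struct and function names below are illustrative, not taken from the patch.

```c
/* Sketch of the coalescing rule: grow both the physical segment and its
 * sync-list entry together when the chunk is physically and virtually
 * contiguous with the previous one. */
#include <stddef.h>

struct phys_seg { unsigned long ds_addr; size_t ds_len; };
struct sync_ent { char *sl_vaddr; size_t sl_len; };

/* Returns 1 if the chunk was merged into the previous segment. */
static int
coalesce_chunk(struct phys_seg *seg, struct sync_ent *sl,
               unsigned long curaddr, char *vaddr, size_t sgsize)
{
    if (seg->ds_addr + seg->ds_len == curaddr &&
        sl->sl_vaddr + sl->sl_len == vaddr) {
        seg->ds_len += sgsize;
        sl->sl_len += sgsize;   /* keep the slist in step with the segment */
        return 1;
    }
    return 0;                   /* caller starts a new segment and entry */
}
```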
> > Just from an initial quick browsing of the new code for armv4, I notice
> > these things...
> > The new routine __bus_dmamap_mayblock() does (*flags) |= BUS_DMA_WAITOK;
> > which is a no-op because BUS_DMA_WAITOK is zero. I'm not sure what the
> > intent was, but I think any modification of the WAIT-related flags would
> > be conceptually wrong because it overrides the caller's instructions on
> > whether or not to do a deferred load when resources aren't available.
> The intent of the original load function seems to be that it is always
> blocking. Most architectures force the flag. The mbuf and uio load
> routines don't support blocking at all because the callback lacks the
> information needed to restart the operation.
The manpage states that bus_dmamap_load() never blocks for any reason.
The mbuf and uio flavors say they are variations of bus_dmamap_load();
I take the implication of that to mean that they also are defined as
never blocking. The docs then say that WAITOK vs. NOWAIT control the
scheduling of a
deferred load, and that NOWAIT is forced in the mbuf and uio cases.
Your refactored code already handles that by adding NOWAIT to the
caller-specified flags for mbufs and uio.
So the code will do the right thing now (with your patches), but only
because (*flags) |= BUS_DMA_WAITOK is a no-op.
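The no-op can be seen directly from the flag values in sys/sys/bus_dma.h, where BUS_DMA_WAITOK is defined as zero:

```c
/* Values as in sys/sys/bus_dma.h: WAITOK is the zero flag, so OR-ing it
 * into a flags word can never change anything. */
#define BUS_DMA_WAITOK 0x00
#define BUS_DMA_NOWAIT 0x01

/* Mimics the effect of (*flags) |= BUS_DMA_WAITOK. */
static int
mayblock(int flags)
{
    return flags | BUS_DMA_WAITOK;  /* identical to flags */
}
```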
> > The way you've refactored the mbuf-related load routines, the MD code is
> > no longer able to tell that it's an mbuf operation. This comes back to
> > what I said a few days ago... when you look at the current code on
> > various platforms you see a lot of false similarities. The code isn't
nearly identical because it really all needs to be the same; it's
> > because it was all copied from the same source, and it doesn't currently
> > work correctly on arm and mips. With your changes, the ability to
> > correct the implementation on those platforms is now gone. Perhaps this
> > can be fixed easily by defining some new flags that the MI routines can
> > OR-in to give the MD routines more info on what's happening. (Correcting
> > the mbuf sync handling on arm and mips requires that we know at sync
> > time that the buffers involved are mbufs, which means we need to know at
> > map-load time so we can set a flag or a type code or something in the
> > map that we can examine at sync time.)
> Why do you need to know it is an mbuf if you have a record of the virtual
> addresses? The only type specific information was used to find the
> addresses. See how arm's busdma_machdep-v6 was implemented.
Because there is a special rule for mbufs on VIVT-cache architectures:
the address of the data buffer is not aligned to a cacheline boundary,
but the sync code is allowed to treat the buffer as if it is aligned and
thus avoid invoking the partial cacheline flush algorithm. That
algorithm was shown a few months ago to be fatally flawed, and we're on
a path (albeit in super-slow-motion) to eliminate it.
I have patches to fix the mbuf partial sync and other things related to
partial sync problems, and my next step will be to try to rework them to
fit in with what you've done. Based on what I've seen so far in your
refactoring, I think just passing a flag to say it's an mbuf, from the
MI to the MD mapping routine, will provide enough info.
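The flag-based approach might look roughly like this. The flag name, map layout, and helper names below are hypothetical, purely to show how a mark set at load time lets the VIVT sync path apply the mbuf alignment rule.

```c
/* Hypothetical sketch: the MI mbuf load path marks the map, and the MD
 * sync code uses the mark to skip the partial-cacheline flush path.
 * DMAMAP_MBUF and both helpers are illustrative names, not the patch's. */
#define DMAMAP_MBUF   0x01      /* illustrative map flag */
#define CACHELINE     32UL      /* example VIVT cacheline size */

struct dmamap { int map_flags; };

/* MI mbuf loader records the buffer type in the map at load time. */
static void
mark_mbuf_map(struct dmamap *map)
{
    map->map_flags |= DMAMAP_MBUF;
}

/* MD sync: mbuf data may be treated as if cacheline-aligned, so the
 * partial-line flush algorithm can be avoided for those maps. */
static int
needs_partial_flush(const struct dmamap *map, unsigned long vaddr)
{
    if (map->map_flags & DMAMAP_MBUF)
        return 0;               /* special rule for mbufs on VIVT */
    return (vaddr & (CACHELINE - 1)) != 0;
}
```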
> >> The next step is to allow for dma loading of physical addresses. This
> >> will permit unmapped I/O. Which is a significant performance optimization
> >> targeted for 10.0.
> > This sounds scary for arm and mips, or more specifically for VIVT cache
> > platforms that can only do a sync based on virtual addresses. Can you
> > talk some more about the plans for this?
> It will be for addresses which were never mapped into kva, to support
> unmapped I/O. There will only be a need for bounce pages, no syncing.
Can the pages be mapped into any non-kernel vmspaces? If they aren't
mapped into any vmspace at all, then I think there'll be no implications
for VIVT cache architectures. If they appear in any pmap at all then
there may be problems.