svn commit: r260440 - head/sys/arm/conf
Nathan Whitehorn
nwhitehorn at freebsd.org
Wed Jan 8 15:50:12 UTC 2014
On 01/08/14 10:19, Ian Lepore wrote:
> On Tue, 2014-01-07 at 23:19 -0500, Nathan Whitehorn wrote:
>> On 01/07/14 22:40, Ian Lepore wrote:
>>> Author: ian
>>> Date: Wed Jan 8 03:40:18 2014
>>> New Revision: 260440
>>> URL: http://svnweb.freebsd.org/changeset/base/260440
>>>
>>> Log:
>>> Add option USB_HOST_ALIGN to configs that contain 'device usb'. Setting
>>> this to the cache line size is required to avoid data corruption on armv4
>>> and armv5, and improves performance on armv6, in both cases by avoiding
>>> partial cacheline flushes for USB IO.
>>>
>>> All these configs already exist in 10-stable. A few that don't (and
>>> thus can't be MFC'd yet) will be committed separately.
>>>
>> There has to be -- and I do not mean this as a criticism of your patch
>> -- a better solution to this problem than USB_HOST_ALIGN. Isn't busdma
>> supposed to handle this kind of thing? Why is USB different?
>> -Nathan
>>
> USB is different because it doesn't follow the busdma rules. It
> allocates one large buffer, then sub-divides it internally into bits
> that are used for DMA IO and adjacent bits that are accessed by the cpu
> concurrently with the DMA. If it doesn't do that subdividing with an
> awareness of the cache line boundaries, it ends up with concurrent CPU
> and DMA access to data in the same cache line, and there's no way a
> software-assisted cache coherency scheme can reliably do busdma sync ops
> that don't corrupt either the CPU data or the DMA data.
>
> On armv6 we now automatically bounce IO that's not sized and aligned on
> cache line boundaries. The overhead for doing so is non-trivial, doubly
> so in the case of USB, because it's the only consumer of busdma in the
> system that requires that the offset-within-page for a bounced IO be the
> same as the offset in the original page (so a pool of small bounce
> buffers for small unaligned IOs is not an option; it must allocate full
> bounce pages for every IO).
>
> It used to be (on armv4) that when you used the busdma alloc functions
> to allocate small DMA buffers (a few bytes) the implementation allocated
> entire pages, which is pretty inefficient and can add up to a lot of
> allocation overhead. That was cited as a reason not to change USB's
> "allocate big then subdivide" scheme. I wrote new busdma allocators
> that use UMA pools to efficiently handle small aligned buffers of both
> normal and uncacheable (BUS_DMA_COHERENT) memory, so that's not a
> roadblock anymore. (Arm uses the new allocator, mips never got
> converted.)
>
> So, since we keep getting reports on arm@ of data corruption that shows
> up as 32-byte chunks of bad data, and it costs real time and resources
> to try to debug each case, I figured we should just go with the fix that
> nobody likes but that actually works.
>
> -- Ian
>
>
Thanks for the explanation, the debugging, and the fix. This seems like
a straightforward bug in the USB stack. Can it be fixed, or are there
architectural reasons why it is the way it is?
-Nathan
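
To illustrate the cache-line sharing Ian describes above, here is a minimal C sketch. It is not the USB stack's actual code. It assumes a 32-byte cache line (matching the 32-byte chunks of bad data mentioned in the thread), and the names `CACHE_LINE`, `round_up`, `shares_cache_line`, and `next_offset` are hypothetical, made up for this example. It shows how carving sub-buffers out of one large allocation without cache-line alignment leaves a DMA region and a CPU-accessed region in the same line, and how rounding each offset up to the line size avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; real hardware varies. */
#define CACHE_LINE 32u

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t
round_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((n + align - 1) & ~(align - 1));
}

/*
 * Return 1 if the byte ranges [a, a + alen) and [b, b + blen) touch at
 * least one common cache line, 0 otherwise.  Both lengths must be > 0.
 */
static int
shares_cache_line(size_t a, size_t alen, size_t b, size_t blen)
{
	size_t a_first = a / CACHE_LINE, a_last = (a + alen - 1) / CACHE_LINE;
	size_t b_first = b / CACHE_LINE, b_last = (b + blen - 1) / CACHE_LINE;

	return (a_first <= b_last && b_first <= a_last);
}

/*
 * Offset of the next sub-buffer when subdividing one large allocation:
 * the end of the previous item, rounded up to the requested alignment.
 */
static size_t
next_offset(size_t off, size_t len, size_t align)
{
	return (round_up(off + len, align));
}
```

With an alignment of 1, a 20-byte DMA buffer at offset 0 and an 8-byte CPU-accessed field carved right after it both land in cache line 0, so a sync operation on one can clobber the other. With the alignment set to `CACHE_LINE`, the second item starts at offset 32 and no line is shared, which is roughly the padding that setting USB_HOST_ALIGN to the cache line size provides.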