svn commit: r279236 - head/sys/netinet

Tue Feb 24 19:26:57 UTC 2015

Ian Lepore wrote this message on Tue, Feb 24, 2015 at 11:59 -0700:
> On Tue, 2015-02-24 at 09:34 -0800, John-Mark Gurney wrote:
> > Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +0000:
> > > Author: zbb
> > > Date: Tue Feb 24 12:57:03 2015
> > > New Revision: 279236
> > > URL: https://svnweb.freebsd.org/changeset/base/279236
> > > 
> > > Log:
> > >   Change struct attribute to avoid aligned operations mismatch
> > >   
> > >   Previous __alignment(4) allowed compiler to assume that operations are
> > >   performed on aligned region. On ARM processor, this led to alignment fault
> > >   as shown below:
> > >   trapframe: 0xda9e5b10
> > >   FSR=00000001, FAR=a67b680e, spsr=60000113
> > >   r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
> > >   r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
> > >   r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
> > >   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
> > >   <...>
> > >   udp_input+0x264: ldmia r5, {r0-r3, r6}
> > >   udp_input+0x268: stmia r12, {r0-r3, r6}
> > >   
> > >   This was due to instructions which do not support unaligned access,
> > >   whereas for __alignment(2) compiler replaced ldmia/stmia with some
> > >   logically equivalent memcpy operations.
> > >   In fact, the assumption that 'struct ip' is always 4-byte aligned
> > >   is definitely false, as we have no impact on data alignment of packet
> > >   stream received.
> > 
> > So, the whole point of ETHER_ALIGN is to make struct ip aligned on
> > 4 byte offsets...  This will probably impact performance on arm for
> > properly aligned struct ip...
> > 
> 
> ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
> arbitrary boundary.  Of course, if you're on such a platform it can
> probably just access the word-sized values on halfword boundaries
> anyway.  For arm, the only solution at the driver level is to memcpy()
> every incoming packet to another buffer to realign it.  If you think
> that makes receive performance really bad, you'd be right.

Having working on the TS-7200 which had this restriction, I know very
well the issues...

> Many arm platforms can only DMA on a cacheline boundary.  The size of an

Are you sure?  Last time I looked at the various drivers, about a year
or two ago, I think I found only one ARM driver that didn't do at
least 2 byte DMA alignment..

Either that, or they haven't been declaring their restrictions properly
in bus_dma_tag_create...

> mbuf header is like 24 or 28 or something, definitely not cache aligned.
> So in addition to the extra copying the drivers do for ETHER_ALIGN,
> there could also be bounce-buffer copying involved due to the alignment
> in the busdma tag.

Can you list a few of these drivers for me?  I'd be curious to look
at them again...

> The latter issue could be fixed with an MD padding field at the end of
> the mbuf header to make the data portion start on a cache line boundary.
> When I experimented with that concept on imx6 I gained 10 MB/sec
> performance from the reduced copying.

ffec allows byte aligned rx dma, but it looks like 16 byte TX align...
If the ffec allows segments that are not a multiple of 16, there is/was
talk about supporting this, and then you'd only have to bounce a few
bytes instead of the entire packet, but I'm not sure if we can do that
w/ the existing bus_dma framework..

I can't find the define/var right now, but I remeber there being
an offset that is processed to try to put the packet at the correct
byte offset...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."