svn commit: r279236 - head/sys/netinet

Tue Feb 24 19:09:19 UTC 2015

On Tue, 2015-02-24 at 09:34 -0800, John-Mark Gurney wrote:
> Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +0000:
> > Author: zbb
> > Date: Tue Feb 24 12:57:03 2015
> > New Revision: 279236
> > URL: https://svnweb.freebsd.org/changeset/base/279236
> > 
> > Log:
> >   Change struct attribute to avoid aligned operations mismatch
> >   
> >   Previous __alignment(4) allowed compiler to assume that operations are
> >   performed on aligned region. On ARM processor, this led to alignment fault
> >   as shown below:
> >   trapframe: 0xda9e5b10
> >   FSR=00000001, FAR=a67b680e, spsr=60000113
> >   r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
> >   r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
> >   r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
> >   r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
> >   <...>
> >   udp_input+0x264: ldmia r5, {r0-r3, r6}
> >   udp_input+0x268: stmia r12, {r0-r3, r6}
> >   
> >   This was due to instructions which do not support unaligned access,
> >   whereas for __alignment(2) compiler replaced ldmia/stmia with some
> >   logically equivalent memcpy operations.
> >   In fact, the assumption that 'struct ip' is always 4-byte aligned
> >   is definitely false, as we have no impact on data alignment of packet
> >   stream received.
> 
> So, the whole point of ETHER_ALIGN is to make struct ip aligned on
> 4 byte offsets...  This will probably impact performance on arm for
> properly aligned struct ip...
> 
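
For reference, the attribute change in the quoted log can be
illustrated roughly as below.  This is only a sketch: struct hdr20 is a
made-up 20-byte stand-in for struct ip, and hdr20_copy()/hdr20_src() are
hypothetical helpers, not anything in the tree.  It assumes the
GCC/Clang attribute syntax.  With aligned(2) the compiler may only
assume halfword alignment, and going through memcpy() makes no
alignment assumption about the source at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 20-byte header standing in for struct ip. */
struct hdr20 {
	uint8_t  vhl;
	uint8_t  tos;
	uint16_t len;
	uint16_t id;
	uint16_t off;
	uint8_t  ttl;
	uint8_t  proto;
	uint16_t sum;
	uint32_t src;
	uint32_t dst;
} __attribute__((packed, aligned(2)));	/* promise 2-byte alignment only */

/* Copy a header out of a packet buffer of unknown alignment. */
static void
hdr20_copy(struct hdr20 *dst, const void *pkt)
{
	memcpy(dst, pkt, sizeof(*dst));
}

/* Read the 32-bit src field without assuming 4-byte alignment. */
static uint32_t
hdr20_src(const void *pkt)
{
	uint32_t v;

	memcpy(&v, (const uint8_t *)pkt + offsetof(struct hdr20, src),
	    sizeof(v));
	return (v);
}
```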

ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
arbitrary boundary.  Of course, such a platform can probably just
access word-sized values on halfword boundaries anyway.  For arm, the
only solution at the driver level is to memcpy() every incoming packet
to another buffer to realign it.  If you think that makes receive
performance really bad, you'd be right.
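
To make the arithmetic concrete, here is a rough sketch of what the
realigning copy amounts to.  The 14-byte ethernet header means a frame
that starts on a 4-byte boundary leaves the IP header 2 bytes off;
starting the frame at offset 2 (ETHER_ALIGN) fixes that.  The helper
names ip_hdr_misalign() and realign_frame() are invented for
illustration and are not driver code from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_HDR_LEN	14	/* dst MAC + src MAC + ethertype */
#define ETHER_ALIGN	2	/* 2 + 14 = 16, a multiple of 4 */

/* Misalignment (0..3) of the IP header for a frame at frame_start. */
static unsigned
ip_hdr_misalign(uintptr_t frame_start)
{
	return ((unsigned)((frame_start + ETHER_HDR_LEN) & 3));
}

/*
 * Copy a received frame into dst so that the IP header ends up 4-byte
 * aligned.  dst must be 4-byte aligned and have room for
 * len + ETHER_ALIGN bytes.  Returns the start of the frame within dst.
 */
static uint8_t *
realign_frame(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint8_t *frame;

	assert(((uintptr_t)dst & 3) == 0);
	frame = dst + ETHER_ALIGN;
	memcpy(frame, src, len);	/* the per-packet cost */
	return (frame);
}
```

The memcpy() of the whole frame is the cost being complained about
here; it is paid on every received packet.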

Many arm platforms can only DMA on a cacheline boundary.  The mbuf
header is something like 24 or 28 bytes, definitely not a multiple of
the cacheline size, so the data area behind it is not cache aligned.
So in addition to the extra copying the drivers do for ETHER_ALIGN,
there could also be bounce-buffer copying involved due to the alignment
requirement in the busdma tag.

The latter issue could be fixed with an MD (machine-dependent) padding
field at the end of the mbuf header to make the data portion start on a
cache line boundary.  When I experimented with that concept on imx6 I
gained 10 MB/sec of throughput from the reduced copying.
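
The padding idea can be sketched like this.  Everything here is
assumed for illustration: the 32-byte line size, the 24-byte header,
and the fake_mhdr/fake_mbuf layouts are stand-ins, not the real mbuf
definitions, and the sketch assumes the mbuf itself starts on a cache
line boundary.

```c
#include <stddef.h>

#define CACHE_LINE	32	/* assumed line size */
#define FAKE_HDR_LEN	24	/* assumed header size */

/* Padding needed after hdr_len bytes to reach the next line boundary. */
static size_t
hdr_pad(size_t hdr_len, size_t line)
{
	return ((line - hdr_len % line) % line);
}

/* Hypothetical header with an MD pad field at the end. */
struct fake_mhdr {
	char	fields[FAKE_HDR_LEN];
	char	md_pad[(CACHE_LINE - FAKE_HDR_LEN % CACHE_LINE) % CACHE_LINE];
};

/* Hypothetical 256-byte mbuf; data now starts on a line boundary. */
struct fake_mbuf {
	struct fake_mhdr hdr;
	char	data[256 - sizeof(struct fake_mhdr)];
};
```

With those assumed numbers the pad is 8 bytes, which comes straight out
of the space available for data in each mbuf.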

-- Ian