svn commit: r279236 - head/sys/netinet
John-Mark Gurney
jmg at funkthat.com
Tue Feb 24 19:26:57 UTC 2015
Ian Lepore wrote this message on Tue, Feb 24, 2015 at 11:59 -0700:
> On Tue, 2015-02-24 at 09:34 -0800, John-Mark Gurney wrote:
> > Zbigniew Bodek wrote this message on Tue, Feb 24, 2015 at 12:57 +0000:
> > > Author: zbb
> > > Date: Tue Feb 24 12:57:03 2015
> > > New Revision: 279236
> > > URL: https://svnweb.freebsd.org/changeset/base/279236
> > >
> > > Log:
> > > Change struct attribute to avoid aligned operations mismatch
> > >
> > > Previous __alignment(4) allowed compiler to assume that operations are
> > > performed on aligned region. On ARM processor, this led to alignment fault
> > > as shown below:
> > > trapframe: 0xda9e5b10
> > > FSR=00000001, FAR=a67b680e, spsr=60000113
> > > r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
> > > r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
> > > r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
> > > r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
> > > <...>
> > > udp_input+0x264: ldmia r5, {r0-r3, r6}
> > > udp_input+0x268: stmia r12, {r0-r3, r6}
> > >
> > > This was due to instructions which do not support unaligned access,
> > > whereas for __alignment(2) compiler replaced ldmia/stmia with some
> > > logically equivalent memcpy operations.
> > > In fact, the assumption that 'struct ip' is always 4-byte aligned
> > > is definitely false, as we have no impact on data alignment of packet
> > > stream received.
> >
> > So, the whole point of ETHER_ALIGN is to make struct ip aligned on
> > 4 byte offsets... This will probably impact performance on arm for
> > properly aligned struct ip...
> >
>
> ETHER_ALIGN is wonderful... if you're on a platform that can DMA to an
> arbitrary boundary. Of course, if you're on such a platform it can
> probably just access the word-sized values on halfword boundaries
> anyway. For arm, the only solution at the driver level is to memcpy()
> every incoming packet to another buffer to realign it. If you think
> that makes receive performance really bad, you'd be right.
Having worked on the TS-7200, which had this restriction, I know the
issues very well...
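To illustrate the kind of access the commit log describes: a word load
through a pointer the compiler believes is 4-byte aligned may be emitted
as an instruction such as ldmia, which can fault on an unaligned address,
whereas a copy through memcpy() makes no alignment assumption. A minimal
sketch, assuming a hypothetical helper load32_unaligned() (illustrative
only, not taken from the FreeBSD tree):

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration): read a 32-bit
 * value from a pointer that may not be 4-byte aligned. memcpy() lets
 * the compiler choose byte or halfword loads where the target needs
 * them, instead of a single word load that requires alignment.
 */
static uint32_t
load32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return (v);
}
```

The value comes back in the byte order it was stored in, so a field
taken from a packet would still need the usual network-to-host
conversion.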
> Many arm platforms can only DMA on a cacheline boundary. The size of an
Are you sure? Last time I looked at the various drivers, about a year
or two ago, I think I found only one ARM driver that didn't do at
least 2-byte DMA alignment..
Either that, or they haven't been declaring their restrictions properly
in bus_dma_tag_create...
> mbuf header is like 24 or 28 or something, definitely not cache aligned.
> So in addition to the extra copying the drivers do for ETHER_ALIGN,
> there could also be bounce-buffer copying involved due to the alignment
> in the busdma tag.
Can you list a few of these drivers for me? I'd be curious to look
at them again...
> The latter issue could be fixed with an MD padding field at the end of
> the mbuf header to make the data portion start on a cache line boundary.
> When I experimented with that concept on imx6 I gained 10 MB/sec
> performance from the reduced copying.
ffec allows byte-aligned RX DMA, but it looks like it needs 16-byte TX alignment...
If the ffec allows segments that are not a multiple of 16, there is/was
talk about supporting this, and then you'd only have to bounce a few
bytes instead of the entire packet, but I'm not sure if we can do that
w/ the existing bus_dma framework..
I can't find the define/var right now, but I remember there being
an offset that is processed to try to put the packet at the correct
byte offset...
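The arithmetic behind that kind of offset can be sketched as follows.
The names and values are assumptions for illustration (a 14-byte Ethernet
header and a 2-byte pad, the value ETHER_ALIGN conventionally has), and
are not necessarily the define referred to above:

```c
/* Assumed values, for illustration only. */
enum {
	SKETCH_ETHER_HDR_LEN = 14,	/* dst MAC + src MAC + ethertype */
	SKETCH_ETHER_ALIGN = 2		/* pad placed before the frame */
};

/*
 * Offset of the IP header from the start of a 4-byte-aligned receive
 * buffer when the frame itself starts `pad` bytes into that buffer.
 */
static unsigned
sketch_ip_offset(unsigned pad)
{
	return (pad + SKETCH_ETHER_HDR_LEN);
}

/* Nonzero if the IP header would land on a 4-byte boundary. */
static int
sketch_ip_is_aligned(unsigned pad)
{
	return (sketch_ip_offset(pad) % 4 == 0);
}
```

With no pad the IP header sits at offset 14, which is only 2-byte
aligned; with a 2-byte pad it sits at offset 16. This only helps if the
hardware can DMA to the padded address, which is the restriction
discussed in this thread.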
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."