RX checksum offloading problem

Michael Tuexen Michael.Tuexen at lurchi.franken.de
Mon May 12 11:22:08 UTC 2014


On 12 May 2014, at 03:36, Yonghyeon PYUN <pyunyh at gmail.com> wrote:

> On Fri, May 09, 2014 at 12:46:48PM +0200, Michael Tuexen wrote:
>> On 09 May 2014, at 03:35, Yonghyeon PYUN <pyunyh at gmail.com> wrote:
>> 
>>> On Thu, May 08, 2014 at 08:40:22PM +0200, Michael Tuexen wrote:
>>>> On 07 May 2014, at 10:37, Yonghyeon PYUN <pyunyh at gmail.com> wrote:
>>>> 
>>>>> On Wed, May 07, 2014 at 10:07:09AM +0200, Michael Tuexen wrote:
>>>>>> On 07 May 2014, at 09:56, Yonghyeon PYUN <pyunyh at gmail.com> wrote:
>>>>>> 
>>>>>>> On Sat, May 03, 2014 at 11:52:47AM +0200, Michael Tuexen wrote:
>>>>>>>> On 02 May 2014, at 16:02, Bjoern A. Zeeb <bz at FreeBSD.org> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 02 May 2014, at 10:22 , Michael Tuexen <Michael.Tuexen at lurchi.franken.de> wrote:
>>>>>>>>> 
>>>>>>>>>> Dear all,
>>>>>>>>>> 
>>>>>>>>>> during testing I found that FreeBSD head (on a raspberry pi) accepts SCTP packet
>>>>>>>>>> with bad checksums. After debugging this I figured out that this is a problem with
>>>>>>>>>> the csum_flags defined in mbuf.h.
>>>>>>>>>> 
>>>>>>>>>> The SCTP code on its input path checks for CSUM_SCTP_VALID, which is defined in mbuf.h:
>>>>>>>>>> #define CSUM_SCTP_VALID         CSUM_L4_VALID
>>>>>>>>>> This makes sense: If CSUM_SCTP_VALID is set in csum_flags, the packet is considered
>>>>>>>>>> to have a correct checksum.
>>>>>>>>>> 
>>>>>>>>>> For UDP and TCP some drivers calculate the UDP/TCP checksum and set CSUM_DATA_VALID in
>>>>>>>>>> csum_flags to indicate that the UDP/TCP should consider csum_data to figure out if
>>>>>>>>>> the packet has a correct checksum. The problem is that CSUM_DATA_VALID is defined as
>>>>>>>>>> #define CSUM_DATA_VALID         CSUM_L4_VALID
>>>>>>>>>> In this case the semantic is not that the packet has a valid checksum, but the csum_data
>>>>>>>>>> field contains information.
>>>>>>>>>> 
>>>>>>>>>> Now the following happens (on the raspberry pi the driver used is
>>>>>>>>>> dev/usb/net/if_smsc.c
>>>>>>>>>> 
>>>>>>>>>> 1. A packet is received and if it is not too short, the checksum computed
>>>>>>>>>> is stored in csum_data and the flag CSUM_DATA_VALID is set. This happens
>>>>>>>>>> for all IP packets, not only for UDP and TCP packets.
>>>>>>>>>> 2. In case of SCTP packets, the SCTP interprets CSUM_DATA_VALID as CSUM_SCTP_VALID
>>>>>>>>>> and accepts the packet. So no SCTP checksum check ever happened.
>>>>>>>>>> 
>>>>>>>>>> Alternatives to fix this:
>>>>>>>>>> 
>>>>>>>>>> 1. Change all drivers to set CSUM_DATA_VALID only in case of UDP or TCP packets, since
>>>>>>>>>> it only makes sense in these cases.
>>>>>>>>> 
>>>>>>>>> Wait, or for SCTP in cad the crc32 (I think it was)  was actually checked but not otherwise.   This is how it should be imho.  It seems like a driver bug.
>>>>>>>> I went through the list of drivers and you are right, it seems to be a bug
>>>>>>>> in if_smsc.c. Most of the other drivers check for UDP/TCP, a small set I can't tell.
>>>>>>>> 
>>>>>>> 
>>>>>>> I'm not sure how the controller computes TCP/UDP checksum values.
>>>>>>> It seems the publicly available data sheet was highly sanitized so
>>>>>>> it was useless to me.  The comment in the driver says that the
>>>>>> Same for me...
>>>>>>> controller computes RX checksum after the IPv4 header to the end of
>>>>>>> ethernet frame. After seeing that comment, three questions popped
>>>>>>> up:
>>>>>>> 
>>>> OK, I did some testing. It looks like the card is just computing the
>>>> checksum over the IP payload taking the correct IP header length into account.
>>>>>>> 1. Is the controller smart enough to skip IP options header in
>>>>>>> TCP/UDP checksum offloading?
>>>> Yes, I can send fragmented and un-fragmented UDP packets with IP options
>>>> and they are handled correctly. Even if the last fragment is too short.
>>> 
>>> I'm assuming you're taking about receiving fragmented UDP packets
>>> with RX checksum offloading, right?
>> Correct.
>>> 
>>>>>>> 2. How controller handles UDP checksum value 0x0000(i.e. sender
>>>>>>> didn't compute UDP checksum)?
>>>> This case isn't handled. However, udp_input() looks first for zero checksums
>>>> and only after that in the csum_flags. So it doesn't result in any problems.
>>>> Would you prefer not to set CSUM_DATA_VALID in this case?
>>> 
>>> At least, it correctly updates UDP stats of netstat(1).
>> Let me double check that...
>>> 
>>>>>>> 3. How the controller can compute TCP checksum of fragmented
>>>>>>> packets?
>>>> At least it does it right for UDP...
>>>> 
>>> 
>>> Hmm, CSUM_DATA_VALID indicates driver computed RX TCP/UDP checksum
>>> without pseudo header. As you know, controller can't compute
>>> TCP/UDP checksum until all its fragmented payload are read from
>>> wire.  Packets may arrive out of order and may be mixed with other
>> I'm not sure I understand this... Please note that the pseudo header
>> is not taken into account. So the card can compute the checksum over
>> the payload of IP for each fragment. This is stored in the csum_data
>> field. During reassembly the csum_data fields of the fragments are
>> combined in
>> http://svnweb.freebsd.org/base/head/sys/netinet/ip_input.c?view=markup#l1075
>> This looks OK to me. I'm not sure why you think the cards needs
>> to keep state for this. I understand it needs state if it wants
>> to take the pseudoheader into account, but this is not done here.
> 
> Oops, sorry. You're right.  Probably I was confused with old memory
> when I worked on that area.  I've quickly read IP reassembly code
> again and as you said, it should work.  However it seems there is a
> checksumming bug here.
> 
> 	/*
> 	 * In order to do checksumming faster we do 'end-around carry' here
> 	 * (and not in for{} loop), though it implies we are not going to
> 	 * reassemble more than 64k fragments.
> 	 */
> 	m->m_pkthdr.csum_data =
> 	    (m->m_pkthdr.csum_data & 0xffff) + (m->m_pkthdr.csum_data >> 16);
> 
> I guess the line above didn't account possible carry happened after
> the computation.  Probably it could be rewritten as the following.
> 
> 	while (m->m_pkthdr.csum_data & 0xffff0000)
> 		m->m_pkthdr.csum_data = (m->m_pkthdr.csum_data & 0xffff) +
> 		    (m->m_pkthdr.csum_data >> 16);
I think you are right here. Good catch. Will you fix it?

Best regards
Michael
> 
>>> packets. Advanced controllers with enough memory may be able to
>>> compute TCP/UDP checksums by tracking each connections(e.g LRO) but
>>> low-end controllers may be not. It seems the controller does not
>>> even support RX TCP/UDP pseudo header checksum offloading so I
>>> wonder how this controller can support RX TCP/UDP checksum
>>> offloading for fragmented TCP/UDP packets.
>> I'm not sure what I'm missing here... You compute the sum over
>> the parts and add the sums of the parts. That should work for
>> the UDP/TCP checksum.
>>> 
>>> Some controllers maintain two bits for TCP/UDP checksum offloading
>>> result in status word.  One bit is used to indicate whether
>>> controller performed TCP/UDP checksum offloading and the other bit
>>> is used to indicate whether the computed checksum is correct or
>> For SCTP we only get a bit that the checksum was computed and is correct...
>>> not. For UDP checksum value 0x0000 and fragmented TCP/UDP packets,
>>> these controllers do not attempt to compute TCP/UDP checksum.
>> I think it "just" computes it and leaves it up to the upper layer
>> to use the result or not...
>> 
> 
> Yes, I stand corrected. Thanks. :-)
> 
>> Best regards
>> Michael
>>> 
>>>> Best regards
>>>> Michael
>>>>>>> 
>>>>>>> Since you have the controller I guess it's easy to verify all
>>>>>>> cases.  For case 3, I believe the controller can't handle
>>>>>>> fragmented frames so driver should have to explicitly check ip_off
>>>>>>> field of IPv4 header.  See how gem(4)/sk(4)/hme(4) and fxp(4)
>>>>>>> handle it.
>>>>>> Let me check this. Is there a tool to send UDP/TCP with IP level options
>>>>>> or do I need to write a small test program myself?
>>>>>> 
>>>>> 
>>>>> I recall I used buggy ipsend of ipfilter package in the past but it
>>>>> would be more easy to write a simple test program or patch driver
>>>>> to generate those frames.
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 



More information about the freebsd-net mailing list