svn commit: r271946 - in head/sys: dev/oce dev/vmware/vmxnet3 dev/xen/netfront kern net netinet ofed/drivers/net/mlx4 sys

Tue Oct 14 09:11:19 UTC 2014

Hi Lawrence,

First of all, thanks for feedback on this patch.

On 10/13/14 16:19, Lawrence Stewart wrote:
> Hi Hans,
>
> I have questions and feedback regarding this patch that I was hoping to
> work through with you. Some general points are below and then context
> specific points are inline with the patch further down.
>
> - Is QinQ support affected by this change?

In what way affected?

>
> - There are some style(9) nits throughout that might be tweaked if parts
> of this patch end up being reworked e.g. new lines in
> if_hw_tsomax_common() and if_hw_tsomax_update()

Ok.

>>
>>   #if __FreeBSD_version >= 1000000
>> -	sc->ifp->if_hw_tsomax = OCE_MAX_TSO_SIZE;
>> +	sc->ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
>> +	sc->ifp->if_hw_tsomaxsegcount = OCE_MAX_TX_ELEMENTS;
>> +	sc->ifp->if_hw_tsomaxsegsize = 4096;
>>   #endif
>>
>>   	ether_ifattach(sc->ifp, sc->macaddr.mac_addr);
>>
>
> I don't like the use of the 65536 magic number here and throughout the
> driver changes. Also, should it be 65536 or IP_MAXPACKET (65535)?

It should be 65536. As far as I know, this is a hardwired limit inside 
the NFS layer, which it uses when creating mbuf chains, to avoid 
fragmentation of data. "Rick Macklem" CC'ed can explain more.

>> +		/*
>> +		 * If the "if_hw_tsomax" limit is set, check if it is
>> +		 * too small:
>> +		 */
>> +		KASSERT(ifp->if_hw_tsomax == 0 ||
>> +		    ifp->if_hw_tsomax >= (IP_MAXPACKET / 8),
>> +		    ("%s: if_hw_tsomax is outside of range", __func__));
>
> I don't understand the second condition of the KASSERT i.e.
> "ifp->if_hw_tsomax >= (IP_MAXPACKET / 8)"

I think this second KASSERT can now safely be removed. From what I 
guess, it is there to prevent that if you set the MTU to be more than 
the TSO limit, the old code would run into a KASSERT in tcp_output.c . 
The new code switches TSO off and uses regular sending if it cannot fit 
the IP packet(s) within the TSO restrictions.

>>
>>
>> Modified: head/sys/netinet/tcp_output.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_output.c	Mon Sep 22 07:59:25 2014	(r271945)
>> +++ head/sys/netinet/tcp_output.c	Mon Sep 22 08:27:27 2014	(r271946)
>> @@ -767,28 +767,113 @@ send:
>>   		flags &= ~TH_FIN;
>>
>>   		if (tso) {
>> +			u_int if_hw_tsomax;
>> +			u_int if_hw_tsomaxsegcount;
>> +			u_int if_hw_tsomaxsegsize;
>> +			struct mbuf *mb;
>> +			u_int moff;
>> +			int max_len;
>> +
>> +			/* extract TSO information */
>> +			if_hw_tsomax = tp->t_tsomax;
>> +			if_hw_tsomaxsegcount = tp->t_tsomaxsegcount;
>> +			if_hw_tsomaxsegsize = tp->t_tsomaxsegsize;
>> +
>> +			/*
>> +			 * Limit a TSO burst to prevent it from
>> +			 * overflowing or exceeding the maximum length
>> +			 * allowed by the network interface:
>> +			 */
>
> This appears to have been moved from just below for no apparent reason?

Correct.

>
>>   			KASSERT(ipoptlen == 0,
>>   			    ("%s: TSO can't do IP options", __func__));
>>
>>   			/*
>> -			 * Limit a burst to t_tsomax minus IP,
>> -			 * TCP and options length to keep ip->ip_len
>> -			 * from overflowing or exceeding the maximum
>> -			 * length allowed by the network interface.
>> +			 * Check if we should limit by maximum payload
>> +			 * length:
>>   			 */
>> -			if (len > tp->t_tsomax - hdrlen) {
>> -				len = tp->t_tsomax - hdrlen;
>> -				sendalot = 1;
>> +			if (if_hw_tsomax != 0) {
>> +				/* compute maximum TSO length */
>> +				max_len = (if_hw_tsomax - hdrlen);
>> +				if (max_len <= 0) {
>> +					len = 0;
>
> Is the "max_len < 0" check useful? If "if_hw_tsomax - hdrlen" is leq 0,
> then TSO is unusable on the interface is it not?

 From what I can see there are no checks or asserts anywhere else, 
except the KASSERT() in the network adapter attach, which compares this 
limit to IP_MAXPACKET/8 . You are right that the TSO is unusable if too 
low, and the problem is that drivers change the TSO enable/disable/limit 
flags/values after initial xxx_ifattach(), so we have no way to really 
stop invalid parameters, except here.

>
>> +				} else if (len > (u_int)max_len) {
>> +					sendalot = 1;
>> +					len = (u_int)max_len;
>> +				}
>
> Why is max_len cast to u_int for comparison/assignment with/to len here
> and elsewhere?

Because "max_len" is an "int" and we compare to below zero. "max_len" 
can be changed into "unsigned", if the checks are updated.

>
>> +			}
>> +
>> +			/*
>> +			 * Check if we should limit by maximum segment
>> +			 * size and count:
>> +			 */
>> +			if (if_hw_tsomaxsegcount != 0 && if_hw_tsomaxsegsize != 0) {
>
>
> Is it conceivable a driver may want to limit by maxsegcount or
> maxsegsize, but not by both?

Technically speaking, yes. The input for me was that some pieces of 
hardware can do more than 65535 bytes of TSO packets, while others not. 
The "maxsegcount * maxsegsize" limit might then indicate a too large 
maximum TSO segment limit, because the TSO payload length typically is 
written into a 16-bit field, while the maximum segment list can indicate 
more data can be sent then actually will be.

>
> In the event if_hw_tsomax, if_hw_tsomaxsegcount and if_hw_tsomaxsegsize
> are non-zero, the calculation of len related to if_hw_tsomax will be
> overridden to a potentially incorrect value in this block.

Can you tell which line?

There is already a check at the beginning of the function, that the 
payload length must be greater than a certain limit. And we are not 
overriding beyond that limit.

>
>> +				max_len = 0;
>> +				mb = sbsndmbuf(&so->so_snd, off, &moff);
>> +
>> +				while (mb != NULL && (u_int)max_len < len) {
>> +					u_int cur_length;
>> +					u_int cur_frags;
>> +
>> +					/*
>> +					 * Get length of mbuf fragment
>> +					 * and how many hardware
>> +					 * frags, rounded up, it would
>> +					 * use:
>> +					 */
>> +					cur_length = (mb->m_len - moff);
>> +					cur_frags = (cur_length + if_hw_tsomaxsegsize -
>> +					    1) / if_hw_tsomaxsegsize;
>> +
>> +					/* Handle special case: Zero Length Mbuf */
>> +					if (cur_frags == 0)
>> +						cur_frags = 1;
>> +
>> +					/*
>> +					 * Check if the fragment limit
>> +					 * will be reached or
>> +					 * exceeded:
>> +					 */
>> +					if (cur_frags >= if_hw_tsomaxsegcount) {
>> +						max_len += min(cur_length,
>> +						    if_hw_tsomaxsegcount *
>> +						    if_hw_tsomaxsegsize);
>> +						break;
>> +					}
>> +					max_len += cur_length;
>> +					if_hw_tsomaxsegcount -= cur_frags;
>> +					moff = 0;
>> +					mb = mb->m_next;
>> +				}
>> +				if (max_len <= 0) {
>> +					len = 0;
>> +				} else if (len > (u_int)max_len) {
>> +					sendalot = 1;
>> +					len = (u_int)max_len;
>> +				}
>>   			}
>>
>>   			/*
>>   			 * Prevent the last segment from being
>> -			 * fractional unless the send sockbuf can
>> -			 * be emptied.
>> +			 * fractional unless the send sockbuf can be
>> +			 * emptied:
>> +			 */
>> +			max_len = (tp->t_maxopd - optlen);
>> +			if ((off + len) < so->so_snd.sb_cc) {
>> +				moff = len % (u_int)max_len;
>> +				if (moff != 0) {
>> +					len -= moff;
>> +					sendalot = 1;
>> +				}
>> +			}
>> +
>> +			/*
>> +			 * In case there are too many small fragments
>> +			 * don't use TSO:
>>   			 */
>> -			if (sendalot && off + len < so->so_snd.sb_cc) {
>> -				len -= len % (tp->t_maxopd - optlen);
>> +			if (len <= (u_int)max_len) {
>> +				len = (u_int)max_len;
>>   				sendalot = 1;
>> +				tso = 0;
>>   			}
>
> I don't quite understand the "too many small fragements" check.

For example: If the maximum number of segments is "1" for example, then 
only one mbuf will will fit. If that mbuf is an ethernet TCPI/IP-header, 
then we cannot use TSO for this case. Also, if the maximum segment count 
is greater than "1" and such that we end up sending less than MTU, we 
should not use TSO. Instead the ethernet drivers forward such special 
packets to m_defrag(), before outputting them.

>>
>> Modified: head/sys/netinet/tcp_var.h
>> ==============================================================================
>> --- head/sys/netinet/tcp_var.h	Mon Sep 22 07:59:25 2014	(r271945)
>> +++ head/sys/netinet/tcp_var.h	Mon Sep 22 08:27:27 2014	(r271946)
>> @@ -199,9 +199,12 @@ struct tcpcb {
>>   	u_int	t_keepintvl;		/* interval between keepalives */
>>   	u_int	t_keepcnt;		/* number of keepalives before close */
>>
>> -	u_int	t_tsomax;		/* tso burst length limit */
>> +	u_int	t_tsomax;		/* TSO total burst length limit in bytes */
>> +
>> +	uint32_t t_ispare[6];		/* 5 UTO, 1 TBD */
>> +	uint32_t t_tsomaxsegcount;	/* TSO maximum segment count */
>> +	uint32_t t_tsomaxsegsize;	/* TSO maximum segment size in bytes */
>>
>> -	uint32_t t_ispare[8];		/* 5 UTO, 3 TBD */
>>   	void	*t_pspare2[4];		/* 1 TCP_SIGNATURE, 3 TBD */
>>   	uint64_t _pad[6];		/* 6 TBD (1-2 CC/RTT?) */
>>   };
>
> I don't think the patch for head should be consuming the spares in
> struct tcpcb. We generally only consume spares when MFCing to stable
> branches, leaving the spares intact on the head branch.

It was done this way to facilitate an MFC to 10- . Can you suggest how 
the structure should look after restoring the spares?

>
>
> That's the initial list of things I noticed.

Thanks!

--HPS