bizarre em + TSO + MSS issue in RELENG_7
Pyun YongHyeon
pyunyh at gmail.com
Sat Nov 17 21:46:11 PST 2007
On Sat, Nov 17, 2007 at 11:18:34PM -0500, Mike Andrews wrote:
> Kip Macy wrote:
> >On Nov 17, 2007 5:28 PM, Mike Andrews <mandrews at bit0.com> wrote:
> >>Kip Macy wrote:
> >>>On Nov 17, 2007 3:23 PM, Mike Andrews <mandrews at bit0.com> wrote:
> >>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>
> >>>>>On Nov 17, 2007 2:33 PM, Mike Andrews <mandrews at bit0.com> wrote:
> >>>>>>On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>
> >>>>>>>On Nov 17, 2007 10:33 AM, Denis Shaposhnikov <dsh at vlink.ru> wrote:
> >>>>>>>>On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> >>>>>>>>Mike Andrews <mandrews at bit0.com> wrote:
> >>>>>>>>
> >>>>>>>>>Has anyone run into problems with MSS not being respected when using
> >>>>>>>>>TSO, specifically on em cards?
> >>>>>>>>Yes, I wrote about this problem on the beginning of 2007, see
> >>>>>>>>
> >>>>>>>> http://tinyurl.com/3e5ak5
> >>>>>>>>
> >>>>>>>if_em.c:3502
> >>>>>>> /*
> >>>>>>> * Payload size per packet w/o any headers.
> >>>>>>> * Length of all headers up to payload.
> >>>>>>> */
> >>>>>>> TXD->tcp_seg_setup.fields.mss =
> >>>>>>> htole16(mp->m_pkthdr.tso_segsz);
> >>>>>>> TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> >>>>>>>
> >>>>>>>
> >>>>>>>Please print out the value of tso_segsz here. It appears to be being
> >>>>>>>set correctly. The only thing I can think of is that t_maxopd is not
> >>>>>>>correct. As tso_segsz is correct here:
> >>>>>>It repeatedly prints 1368 during a 1 meg file transfer over a connection
> >>>>>>with a 1380 MSS. Any other printf's I can add? I'm working on a web page
> >>>>>>with tcpdump / firewall log output illustrating the issue...
> >>>>>Mike -
> >>>>>Denis' tcpdump output doesn't show oversized segments, something else
> >>>>>appears to be happening there. Can you post your tcpdump output
> >>>>>somewhere?
> >>>>URL sent off-list.
> >>> if (tso) {
> >>> m->m_pkthdr.csum_flags = CSUM_TSO;
> >>> m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>> }
> >>>
> >>>
> >>>Please print the value of maxopd and optlen under "if (tso)" in
> >>>tcp_output. I think the calculated optlen may be too small.
> >>
> >>maxopt=1380 - optlen=12 = tso_segsz=1368
> >>
> >>Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> >>to start getting the firewall to log any drops. Transfer rate was
> >>around 240KB/sec before the firewall started to drop, then it went down
> >>to about 64KB/sec during the 5th copy, and stayed there for subsequent
> >>copies. The actual packet size the firewall said it was dropping was
> >>varying all over the place still, yet the maxopt/optlen/tso_segsz values
> >>stayed constant. But it's interesting that it didn't start dropping
> >>immediately after the reboot -- though the transfer rate was still
> >>sub-optimal.
> >
> >Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> >tcpdump. I'm wondering if that means you're sending down more than
> >64k. Can you please print out the value of mp->m_pkthdr.len around the
> >same place that you printed out tso_segsz? 64k is the generally
> >accepted limit for TSO, I'm wondering if the card firmware does
> >something weird if you give it more.
>
> OK. In that last message, where I said it took 5 times to start
> reproducing the problem... this time it took until I actually toggled
> TSO back off and back on again, and then it started acting up again. I
> don't know what the actual trigger is... it's very weird.
>
> Initially, w/ TSO on and it wasn't dropping yet (but was still
> transferring slow)...
>
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=8306
> (etc, always 8306)
>
> After toggling off/on which caused the drops to start (and the speed to
> drop even further):
>
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=7507
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3053
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1677
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=3037
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2264
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1656
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1902
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1888
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1640
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1871
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2461
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=1849
> BIT0 DEBUG: tso_segsz=1368 hdr_len=66 mp->m_pkthdr.len=2092
>
> and so on, with more seemingly random lengths... but none of them ever
> over 8306, much less 64K.
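For reference, the numbers quoted above can be checked with a little
arithmetic. This is only a user-space sketch: the helper names are made up
for illustration and are not driver functions, and it assumes hdr_len covers
every header in front of the TCP payload (66 would be consistent with
14 Ethernet + 20 IP + 20 TCP + 12 option bytes). Under those assumptions an
8306-byte request should leave the card as 7 frames, none larger than
1434 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Per-segment payload: t_maxopd minus the TCP option bytes. */
static size_t
seg_payload(size_t maxopd, size_t optlen)
{
	assert(maxopd > optlen);
	return (maxopd - optlen);
}

/* Number of frames the NIC should emit for one TSO request. */
static size_t
seg_count(size_t pkt_len, size_t hdr_len, size_t segsz)
{
	size_t payload;

	assert(segsz > 0 && pkt_len >= hdr_len);
	payload = pkt_len - hdr_len;
	return ((payload + segsz - 1) / segsz);	/* round up */
}

/* Payload bytes carried by the final frame of the request. */
static size_t
last_seg_payload(size_t pkt_len, size_t hdr_len, size_t segsz)
{
	size_t payload, rem;

	assert(segsz > 0 && pkt_len >= hdr_len);
	payload = pkt_len - hdr_len;
	rem = payload % segsz;
	if (rem != 0)
		return (rem);
	return (payload != 0 ? segsz : 0);
}

/* Largest frame expected on the wire: one full segment plus headers. */
static size_t
max_frame(size_t hdr_len, size_t segsz)
{
	return (hdr_len + segsz);
}
```

With the logged values, seg_payload(1380, 12) gives 1368, and
seg_count(8306, 66, 1368) gives 7 with a 32-byte tail. Anything the firewall
sees above max_frame(66, 1368) would mean the card is not honoring the mss
field it was given.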
It seems that em_tso_setup() doesn't clear txd_upper/txd_lower in its
failure path, so uninitialized values could be used in the subsequent
Tx descriptor setup.
How about clearing those variables? (Patch attached.)
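To illustrate the kind of failure path I mean, here is a minimal user-space
sketch. It is not the driver code: fake_tso_setup() and encap() are
hypothetical stand-ins, the flag values are made up, and the sketch assumes
the setup routine writes its output arguments only when it succeeds. The
'stale' argument models whatever happened to be in the locals on entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for em_tso_setup(): returns 1 and fills the
 * outputs for a TSO packet, returns 0 and leaves them untouched otherwise.
 */
static int
fake_tso_setup(int is_tso, uint32_t *txd_upper, uint32_t *txd_lower)
{
	assert(txd_upper != NULL && txd_lower != NULL);
	if (!is_tso)
		return (0);	/* failure path: outputs are not written */
	*txd_upper = 0x100;	/* made-up descriptor bits */
	*txd_lower = 0x200;
	return (1);
}

/*
 * Returns the bits that would end up in the Tx descriptor.
 * clear_first selects the proposed fix of zeroing before the call.
 */
static uint32_t
encap(int is_tso, uint32_t stale, int clear_first)
{
	uint32_t txd_upper = stale, txd_lower = stale;

	if (clear_first)
		txd_upper = txd_lower = 0;
	(void)fake_tso_setup(is_tso, &txd_upper, &txd_lower);
	return (txd_upper | txd_lower);
}
```

Without the clearing step, encap(0, 0xdead, 0) hands back the stale 0xdead
bits; with it, encap(0, 0xdead, 1) returns 0. The TSO case is unaffected
either way.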
It seems that em(4) uses EM_TSO_SIZE (64K) to create the DMA tag. A packet
can have a 64K payload under TSO, so the maximum size of the mbuf
chain would be 64K + sizeof(link layer). So I guess EM_TSO_SIZE
should be increased to also hold sizeof(link layer).
It has been a long time since I looked into em(4), so I'm not sure.
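The size concern can be written down as arithmetic. The constants below are
assumptions for illustration only (65535 for the tag size and for the largest
packet handed down under TSO, 14 bytes of Ethernet header, 4 more for a VLAN
tag); the real values should be checked against if_em.h. The SKETCH_* names
and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TSO_SIZE		65535u	/* assumed DMA tag maximum */
#define SKETCH_PKT_MAX		65535u	/* assumed largest TSO packet */
#define SKETCH_ETHER_HDR	14u	/* plain Ethernet header */
#define SKETCH_VLAN_TAG		4u	/* extra bytes for an 802.1Q tag */

/* Total mbuf chain length once the link-layer header is prepended. */
static size_t
chain_len(size_t pkt_len, size_t link_hdr)
{
	return (pkt_len + link_hdr);
}

/* Nonzero if a chain of that length fits within the DMA tag maximum. */
static int
fits_tag(size_t tag_max, size_t pkt_len, size_t link_hdr)
{
	assert(tag_max > 0);
	return (chain_len(pkt_len, link_hdr) <= tag_max);
}
```

Under these assumptions a full-sized request is 65549 bytes with the Ethernet
header, so fits_tag(SKETCH_TSO_SIZE, SKETCH_PKT_MAX, SKETCH_ETHER_HDR) is
false, while a tag enlarged by the header size (including the VLAN case)
would hold it.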
--
Regards,
Pyun YongHyeon
-------------- next part --------------
Index: if_em.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.184
diff -u -r1.184 if_em.c
--- if_em.c 10 Sep 2007 21:50:40 -0000 1.184
+++ if_em.c 18 Nov 2007 05:42:35 -0000
@@ -1791,6 +1791,7 @@
 	m_head = *m_headp;
 	/* Do hardware assists */
+	txd_upper = txd_lower = 0;
 	if (em_tso_setup(adapter, m_head, &txd_upper, &txd_lower))
 		/* we need to make a final sentinel transmit desc */
 		tso_desc = TRUE;