9.2 ixgbe tx queue hang
Rick Macklem
rmacklem at uoguelph.ca
Mon Mar 24 22:47:48 UTC 2014
Christopher Forgeron wrote:
> I'm going to split this into different posts to focus on each topic.
> This is about setting IP_MAXPACKET to 65495.
>
> Update on Last Night's Run:
>
> (Last night's run is a kernel with IP_MAXPACKET = 65495)
>
> - Uptime on this run: 10:53AM up 13:21, 5 users, load averages: 1.98, 2.09, 2.13
> - Ping logger records no ping errors for the entire run.
> - At Mar 24th 10:57 I did a grep through the night's log for 'before'
> (which is the printf logging that Rick suggested a few days ago), and
> saved it to before_total.txt
> - With wc -l on before_total.txt I can see that we have 504 lines, thus
> 504 incidents of the packet being above IP_MAXPACKET during this run.
> - I did tr -c '[:alnum:]' '[\n*]' < before_total.txt | sort | uniq -c | sort -nr | head -50
> to list the most common words. Ignoring the non-pklen output, the
> relevant output is:
>
> 344 65498 (3)
> 330 65506 (11)
> 330 65502 (7)
>
This makes sense to me, since tp->t_tsomax is used in tcp_output() for
the TCP/IP packet, which does not include the link level (ethernet)
header. When that is added, I would expect the length to be up to 14
(or maybe 18 for vlan cases) greater than IP_MAXPACKET. Since none of
these are greater than 65509, this looks fine to me.
So, unless you get ones greater than (65495 + 18 = 65513), this makes
sense and does not indicate a problem.
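As a quick check of the arithmetic above, here is a minimal Python sketch.
The header sizes (14 bytes for plain ethernet, 18 with a vlan tag) and the
three lengths come from this thread; the constant and function names are
only for illustration and are not kernel identifiers.

```python
# Sanity check of the link level header arithmetic discussed above.
# Names here are illustrative, not taken from the FreeBSD sources.
ETHER_HDR_LEN = 14        # plain ethernet header
ETHER_VLAN_HDR_LEN = 18   # ethernet header plus vlan tag


def max_frame_len(tsomax, vlan=False):
    """Largest length expected once the link level header is added."""
    return tsomax + (ETHER_VLAN_HDR_LEN if vlan else ETHER_HDR_LEN)


tsomax = 65495                     # IP_MAXPACKET in the test kernel
observed = [65498, 65506, 65502]   # lengths reported in the log

limit = max_frame_len(tsomax, vlan=True)
suspicious = [n for n in observed if n > limit]
print(max_frame_len(tsomax), limit, suspicious)   # 65509 65513 []
```

An empty list at the end means none of the logged lengths exceed what the
added header alone would explain.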
In another post, you indicate that having the driver set if_hw_tsomax
didn't set tp->t_tsomax to the same value.
--> I believe that is a bug and would mean my ixgbe.patch would not
fix the problem, because it is tp->t_tsomax that must be decreased
to no more than (65536 - 18 = 65518).
--> Now, have you tried a case between 65495 and 65518 and seen
any EFBIG errors?
If so, then I don't understand why 65518 isn't small enough.
rick
> - First # being the # of times. (Each pklen is printed twice on the
> log, thus 2x the total line count).
> - Last (#) being the byte overrun from 65495
> - A fairly even distribution of each type of packet overrun.
>
> You will recall that my IP_MAXPACKET is 65495, so each of these packet
> lengths represents an overshoot.
>
> The fact that we have only 3 different types of overrun is good - It
> suggests a non-random event, more like a broken 'if' statement for a
> particular case.
>
I think it just means that your load happens to do only 3 sizes of I/O
that are a little less than 65536.
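For anyone wanting to redo the tally without the shell pipeline, a rough
Python equivalent might look like the sketch below. It splits each line on
non-alphanumeric characters and counts the tokens, then keeps the numeric
ones above the limit. The sample log lines are made up, since the real
printf output format is not shown in this thread, and the function names
are hypothetical.

```python
import re
from collections import Counter


def tally_tokens(lines):
    # Roughly what the tr | sort | uniq -c pipeline does, minus empty tokens.
    counts = Counter()
    for line in lines:
        counts.update(tok for tok in re.split(r"[^A-Za-z0-9]+", line) if tok)
    return counts


def overruns(counts, limit=65495):
    # Map each numeric token above limit to (occurrences, bytes over limit).
    return {int(tok): (n, int(tok) - limit)
            for tok, n in counts.items()
            if tok.isdigit() and int(tok) > limit}


# Made-up sample lines; the real log format is an assumption.
sample = ["before: pklen 65498 len 65498",
          "before: pklen 65506 len 65506"]
for length, (n, over) in sorted(overruns(tally_tokens(sample)).items()):
    print(n, length, "(%d)" % over)
```

The printed columns follow the same layout as the quoted output: count,
length, and the overrun from 65495 in parentheses.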
> If IP_MAXPACKET was set to 65535 as it normally is, I would have had 504
> incidents of errors, with a chance that any one of them could have
> blocked the queue for a considerable time.
>
If tp->t_tsomax hasn't been set to a smaller value than 65535, the
ixgbe.patch didn't do what I thought it would.
> Question: Should there be logic that discards packets that are over
> IP_MAXPACKET to ensure that we don't end up in a blocked queue
> situation again?
>
>
> Moving forward, I am doing two things:
>
> 1) I'm running a longer test with TSO disabled on my ix0 adapter. I want
> to make sure that over, say, 4 hours I don't have even 1 packet over
> 65495. This will at least narrow the issue down to TSO-related code.
>
> 2) I have tcpdump running, to see if I can capture the packets over
> 65495. Here is my command. Any suggestions on additional switches I
> should include?
>
> tcpdump -ennvvXS greater 65495
>
> I'll report in on this again once I have new info.
>
> Thanks for reading.
>
> On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron
> <csforgeron at gmail.com> wrote:
>
> > Hi,
> >
> > I'll follow up more tomorrow, as it's late and I don't have time for
> > detail.
> >
> > The basic TSO patch didn't work, as packets were still going over
> > 65535 by a fair amount. I thought I wrote that earlier, but I am
> > dumping a lot of info into a few threads, so I apologize if I'm not as
> > concise as I could be.
> >
> > However, setting IP_MAXPACKET did. 4 hours of continuous run-time, no
> > issues. No lost pings, no issues. Of course this isn't a fix - but it
> > helps isolate the problem.
> > > what the story is a few months down the road.
> > >
> > >
> > > Thanks for the patches, will have to start giving them code-names so
> > > we can keep them straight. :-) I guess we have printf, tsomax, and
> > > this one.
> > >
> > >
> >
> >