TCP Reassembly Issues

Jeremy Chadwick freebsd at jdc.parodius.com
Sat Nov 26 08:01:55 UTC 2011


On Fri, Nov 25, 2011 at 11:56:47PM -0800, Jeremy Chadwick wrote:
> On Sat, Nov 26, 2011 at 12:49:24AM -0600, Kris Bauer wrote:
> > On Fri, Nov 25, 2011 at 11:23 PM, Lawrence Stewart <lstewart at freebsd.org> wrote:
> > 
> > > On 11/25/11 13:01, Lawrence Stewart wrote:
> > >
> > >> On 11/24/11 18:02, Kris Bauer wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I am currently experiencing an issue with FreeBSD 9.0-RC2 r227852 where
> > >>> the net.inet.tcp.reass.cursegments value is constantly increasing (and
> > >>> not decreasing when there is nominal traffic with the box). It is
> > >>> causing TCP slowdowns as described in kern/155407:
> > >>>
> > >>> Exhausted net.inet.tcp.reass.maxsegments block recovering tcp session
> > >>> (for this socket and any other socket waiting for retransmitted
> > >>> packets). After exhausted net.inet.tcp.reass.maxsegments allocation new
> > >>> entry in tcp_reass failed (for this socket and any other socket waiting
> > >>> for retransmitted packets).
> > >>>
> > >>> I have increased the reass.maxsegments value to 16384 to temporarily
> > >>> avoid the problem, but the cursegments number keeps rising and it seems
> > >>> it will occur again.
> > >>>
> > >>> Is this an issue that anyone else has seen? I can provide more
> > >>> information if need be.
> > >>>
> > >>
> > >> Thanks Kris, Raul and Stefan for the reports, I'll look into this.
> > >>
> > >
> > > > I think I've got it - a stupid one-line logic bug. My apologies for missing
> > > it when I reviewed the patch which introduced the bug (patch was committed
> > > to head as r226113, MFCed to stable/9 as r226228).
> > >
> > > > Due to some miscommunication, the initial patch was committed to head and
> > > > MFCed much later in the 9.0 release cycle than it should have been.
> > > > Instead of being included in the BETAs, it didn't make it in until 9.0-RC1,
> > > > I believe, i.e. only RC1 and RC2 should be experiencing the issue.
> > >
> > > Could those who have reported the bug and are able to recompile their
> > > kernel to test a patch please try the following and report back to the list:
> > >
> > >
> > > http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass_plugzoneleak_10.x.r227986.patch
> > >
> > > The patch is against head r227986 but will apply and work correctly for
> > > 9.0 as well.
> > >
> > > Cheers,
> > > Lawrence
> > >
> > 
> > I have patched, recompiled, and rebooted.  net.inet.tcp.reass.cursegments
> > is no longer incrementing, and connectivity is holding steady.  If anything
> > changes over the next couple of hours, I'll be sure to report it; but all
> > preliminary signs of the problem are gone.
> > 
> > Thanks for all the help!
> 
> Let's not be hasty in concluding everything is fixed.  I'm a bit on
> edge about this because I took the time to find the CVS commits that
> induced this issue in the first place, and it seems there is some history.
> 
> The commit that caused this problem to begin with was supposedly a fix
> for a different problem:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.375
> 
> A week later, that commit went from HEAD/MAIN into RELENG_9:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_reass.c#rev1.374.2.2
> 
> Be sure to read the description of the problem that was being fixed in
> the first place.  I've also CC'd the original problem reporter, Steven
> Hartland, because we're going to need him to try the above patch from
> Lawrence to make sure there aren't other problems.  Meaning: for all we
> know, the above fix might work great for Kris but cause problems for
> Steve.
> 
> This entire situation leads me to believe very few people are doing
> quality testing of RELENG_9, yet we're already into 9.0-RC2.  Please
> don't tell me "that's exactly why you should be running RELENG_9!"; that
> is completely backwards and I refuse to get into a flame war about it,
> because it's this simple: 90%+ of those running FreeBSD on servers need
> something that's stable; we can't risk wonkiness (especially of this
> degree!) on systems taking production traffic.  Did no one actually test
> the change *thoroughly*?  Imagine if this had lain dormant until 9.0-RELEASE.
> 
> Lawrence: please don't take my comments personally or to mean "you broke
> it and caused this mess!"  It's meant to read more along the lines of
> "you committed a fix for something that broke other bits badly, but
> nobody noticed this, including the original reporter of a different
> problem?  How/why?"  You get the idea.

Re-sending, because the "Tested by" line in the commit message contained an
address with the "@" character replaced by "-at-", so my mail client assumed
the email address was on my local machine.  Sorry about that, folks.
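
For anyone who wants to keep an eye on the counter after applying the
patch, rather than checking it by hand, here is a rough sketch.  It is only
an illustration and not part of Lawrence's patch: the helper names, the
sampling interval and the "never decreases" heuristic are all my own
assumptions.  The only things taken from this thread are the sysctl name
and the symptom (cursegments rising and never falling).

```python
import subprocess
import time

# Counter discussed in this thread.
SYSCTL_NAME = "net.inet.tcp.reass.cursegments"


def read_cursegments():
    """Read the current counter via sysctl(8); only works on a FreeBSD host."""
    out = subprocess.check_output(["sysctl", "-n", SYSCTL_NAME], text=True)
    return int(out.strip())


def looks_like_leak(samples):
    """Hypothetical heuristic: the counter never dropped and ended higher
    than it started.  A busy but healthy box may trip this over a short
    window, so treat a True result as a hint, not a diagnosis."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


def watch(count=10, interval=60.0, reader=read_cursegments):
    """Collect `count` samples, `interval` seconds apart, and apply the
    heuristic.  `reader` can be swapped out so the logic is testable
    without a FreeBSD machine."""
    samples = []
    for i in range(count):
        samples.append(reader())
        if i < count - 1:
            time.sleep(interval)
    return samples, looks_like_leak(samples)
```

Calling watch() on an affected RC1/RC2 box should, if the report above is
accurate, show the samples climbing steadily; on a patched kernel the value
would be expected to fall back once the out-of-order segments are consumed.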

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



More information about the freebsd-stable mailing list