cvs commit: src/sys/netinet ip_icmp.c tcp.h tcp_input.c tcp_subr.c tcp_usrreq.c tcp_var.h

Andre Oppermann andre at freebsd.org
Fri Jan 9 12:30:35 PST 2004


Nate Lawson wrote:
> 
> On Fri, 9 Jan 2004, Andre Oppermann wrote:
> > silby at silby.com wrote:
> > >
> > > > David Xu wrote:
> > > >>
> > > >> I got following messages when I am running mysql stress test suite,
> > > >> and the test can not be completed.
> > > >>
> > > >> "too many small tcp packets from 128.0.0.1:20672, av. 91byte/packet,
> > > >> dropping connection"
> > > >
> > > > You can set net.inet.tcp.minmssoverload to a higher value than the
> > > > default of 1,000.  I suggest trying with 2,000 as next step and see if
> > > > it still overloads.
> > > >
> > > > Appearently my default of 1,000 pps is fine for normal use but too low
> > > > for some edge cases.
> > > >
> > > > Could you check the MySQL source code if it has a setsockopt() setting
> > > > the TCP_NODELAY option?  That would help to explain a lot.
> > >
> > > This might nerf the protection a bit, but could reduce the packet counter
> > > once for each socket write the local machine does?  That should protect
> > > chatty applications, but still detect those that are just flooding data to
> > > a bulk service like ftp or smtp.
> 
> This is exactly what I was worried about.  I know of several applications
> that send/receive lots of small packets as a control connection,
> especially over localhost.  Most are a sort of RPC mechanism where
> TCP_NODELAY is set to make sure the request gets to the server
> immediately and is not queued according to Nagle.

I'm not saying that TCP_NODELAY is bad or wrong.  Not at all.  However,
sometimes there seems to be a misunderstanding about what exactly it does.

> > It doesn't help in this case as we don't have any control over the sender
> > and thus don't know whether he has set TCP_NODELAY.
> 
> Perhaps you didn't understand Mike?  You don't care if TCP_NODELAY is set
> on their side, all you care about is the packet equilibrium.  If you send
> data in response to receiving a segment, the net equilibrium is preserved.
> The real behavior you want to detect is someone sending a lot of small
> chunks of data that the application could process as larger chunks.  If
> the application waits until it has a full "record" before responding, you
> can distinguish the degenerate case by the application's response rate.

Ok, that is a very useful distinction.  I could modify the detection
logic to take *sent* packets into account (except ACKs of course).
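
As a rough illustration only (this is a toy model in Python, not the
tcp_input.c code; the class name, the one-window accounting and the
216-byte average cutoff are assumptions made up for the example), the
modified accounting could look something like this: every received
segment counts against the limit, and every data segment we send back
earns one credit, while pure ACKs earn nothing.

```python
class SmallPacketDetector:
    """Toy per-connection model of the small-packet overload check."""

    def __init__(self, limit_pps=1000, min_avg_bytes=216):
        # limit_pps mirrors the default of net.inet.tcp.minmssoverload;
        # min_avg_bytes is an assumed cutoff for the example.
        self.limit_pps = limit_pps
        self.min_avg_bytes = min_avg_bytes
        self.reset()

    def reset(self):
        """Start a new one-second accounting window."""
        self.received = 0
        self.received_bytes = 0
        self.sent_data = 0

    def on_receive(self, payload_len):
        self.received += 1
        self.received_bytes += payload_len

    def on_send(self, payload_len):
        # Pure ACKs carry no payload and earn no credit.
        if payload_len > 0:
            self.sent_data += 1

    def overloaded(self):
        # Net count: received segments minus data segments sent in reply.
        net = max(self.received - self.sent_data, 0)
        if self.received == 0 or net <= self.limit_pps:
            return False
        return self.received_bytes / self.received < self.min_avg_bytes


# Chatty RPC-style peer: every small request is answered with data.
rpc = SmallPacketDetector()
for _ in range(2000):
    rpc.on_receive(91)
    rpc.on_send(500)

# Flooding peer: small segments arrive, we only ever ACK them.
flood = SmallPacketDetector()
for _ in range(2000):
    flood.on_receive(91)
    flood.on_send(0)

print(rpc.overloaded(), flood.overloaded())
```

With these made-up numbers the request/response connection stays under
the limit because its net count is zero, while the one-way stream of
91-byte segments still trips the check.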

> > I suspect that the database(s) are setting TCP_NODELAY and do a write()
> > for every record they have retrieved from the query result.  Yet one more
> > who has been fooled by the name "TCP_NODELAY".  The database would be
> > better off and have more performance not using nodelay and let Nagle do
> > its work.
> 
> In the case above, the small packets are coming from an ephemeral port so
> they are likely the query packets, not the response.  But if you
> subtracted one from the counter each time the database responded with
> data, it's likely the request/response rate would be roughly constant.

Maybe, probably.

> The database did not set TCP_NODELAY, the client did.  Since the query is
> a small request and you need a response before you can send the next
> request (assuming it's not doing transaction logging), you do want
> TCP_NODELAY on the client.

At least Oracle seems to set it on the server side too.  That is how
I came to my conclusion.  Methinks something like fsync() for TCP
sockets might be useful.  If the database does a write for every row
it gets from the query result with TCP_NODELAY set, it is quite
inefficient.  On the other hand, if there is only one row you don't
want to Nagle-delay the answer.  With a TCP fsync() the database
could simply do its writes as before and the socket buffer would queue
them until a max MSS packet is filled.  When the last row is written it
would issue an fsync() and whatever is in the buffer would get sent out
immediately.  This would make the database communication way more
efficient because the database client usually does not process the
answers until the full query result is received.  Even then, we don't
add any delay; we just get fully used packets.
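
To make the idea concrete, here is a small user-space model in Python.
It is only a sketch under stated assumptions: no such socket call
exists as described, the CoalescingWriter class and its flush() method
are hypothetical names standing in for the proposed fsync(), the send
callable stands in for "one segment goes on the wire", and the MSS of
1460 bytes is just a typical Ethernet value.

```python
class CoalescingWriter:
    """User-space model of the proposed 'fsync for TCP' behaviour."""

    def __init__(self, send, mss=1460):
        self.send = send          # called once per emitted segment
        self.mss = mss            # assumed maximum segment size
        self.pending = bytearray()

    def write(self, data):
        """Queue data; emit only completely filled segments."""
        self.pending += data
        while len(self.pending) >= self.mss:
            self.send(bytes(self.pending[:self.mss]))
            del self.pending[:self.mss]

    def flush(self):
        """Hypothetical fsync(): push out whatever is still queued."""
        if self.pending:
            self.send(bytes(self.pending))
            self.pending.clear()


# A query result of 100 rows, 91 bytes each, one write() per row.
row = b"x" * 91
segments = []
writer = CoalescingWriter(segments.append)
for _ in range(100):
    writer.write(row)
writer.flush()                    # last row written, send the remainder

# A single-row answer goes out at once, with no Nagle-style wait.
single = []
one = CoalescingWriter(single.append)
one.write(row)
one.flush()

print([len(s) for s in segments], [len(s) for s in single])
```

In this model the 100 small writes leave as six full 1460-byte
segments plus one 340-byte remainder instead of 100 tiny packets,
and the single-row case is sent as soon as flush() is called.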

Taking this further, telnet and SSH could benefit from this too.  When
you ^C a large listing or the like, it takes ages to take effect because
there are so many small packets in flight.  So with bulk data you again
get the best of both worlds: large packets and fast responses.

What do you think?  (I know this is off-topic to the minmss settings)

-- 
Andre

