Listen queue overflow: N already in queue awaiting acceptance

Luigi Rizzo rizzo at iet.unipi.it
Thu Jul 11 15:06:13 UTC 2013


On Thu, Jul 11, 2013 at 4:52 PM, Gleb Smirnoff <glebius at freebsd.org> wrote:
> On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
> L> >> IMO, this should be a single counter accessible via sysctl, with no
> L> >> printf(). Those, who need details on whether this is micro-burst or
> L> >> persistent condition, can run monitoring software that draws plots.
> L> >
> L> >
> L> > The single counter wouldn't tell you anything because it misses which
> L> > socket/accept queue is affected by the overflow.  The inpcb pointer
> L> > can be cross-referenced with netstat -a.
> L> >
> L> > Andriy for example would never have found out about this problem other
> L> > than receiving vague user complaints about aborted connection attempts.
> L> > Maybe after spending many hours searching for the cause he may have
> L> > inferred from endless scrolling in Wireshark that something wasn't
> L> > right and blame syncache first.  Only later it would emerge that he's
> L> > either receiving too many connections or his application is too slow
> L> > dealing with incoming connections.
> L> >
> L> > If you can recommend a suitable and general sysadmin friendly monitoring
> L> > software that will point out this problem I'm all ears.
> L>
> L> the problem with these non-throttled messages is that they often
> L> cause thrashing -- you become slightly slow, messages start being
> L> generated and your system becomes a lot slower, making it hard
> L> to recover.
> L>
> L> What i usually do is throttle (in the kernel) and count the number of
> L> messages suppressed. Something like this (in a macro):
> L>
> L> static int ctr, last_tick;
> L> if (ticks - last_tick > suppression_delay) {
> L>     printf("got this error ... (%d times)\n", ... , ctr);
> L>     ctr = 0;
> L>     last_tick = ticks;
> L> } else {
> L>     ctr++;
> L> }
> L>
> L> the errors may not be exactly the same, the counter is race_prone
> L> (you can make it atomic if you really feel like) but the whole point is
> L> to get the idea that something is very wrong, not the exact count
> L> or pointer
>
> btw, there is a ready-made function for that: ppsratecheck(), already
> utilized for suppressing some error messages.

yes, i think i saw it before. To me, the convenience of the macro is that
it can also wrap the declaration of the static variables and the printf.
I basically have macros like this (see sys/dev/netmap/netmap_kern.h)

     RD(max_pps, "printf format ", arguments....) // rate-limited printf

    ND(same arguments as above) // compiles to no-op

so i can quickly add the messages or disable them by simply changing
the macro name
FWIW the macro in netmap_kern.h does not have the counter of suppressed
messages (I just thought about it, but I should probably add it as a feature)
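
To make the idea concrete, here is a minimal userspace sketch of a
rate-limited printf that also reports how many messages were suppressed.
It is not the code from netmap_kern.h: the names RL_PRINTF, rl_state and
rl_check are hypothetical, and the caller passes the current tick value
explicitly (standing in for the kernel's `ticks`) so the behaviour is easy
to reason about. Like the snippet quoted above, the counter is not atomic.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-call-site state (illustrative, not from netmap). */
struct rl_state {
        long last_tick;   /* tick at which a message was last printed */
        int  suppressed;  /* messages dropped since that print */
};

/*
 * Returns 1 if the caller should print now, 0 if the message is
 * suppressed. On a 1 return, *dropped holds the number of messages
 * suppressed since the previous print and the counter is reset.
 */
static int
rl_check(struct rl_state *st, long now, long delay, int *dropped)
{
        assert(st != NULL && dropped != NULL);
        if (now - st->last_tick > delay) {
                *dropped = st->suppressed;
                st->suppressed = 0;
                st->last_tick = now;
                return 1;
        }
        st->suppressed++;       /* race-prone by design, as in the thread */
        return 0;
}

/* Wraps the static state and the printf, one state per call site. */
#define RL_PRINTF(now, delay, ...) do {                                 \
        static struct rl_state rl_st_;                                  \
        int rl_dropped_;                                                \
        if (rl_check(&rl_st_, (now), (delay), &rl_dropped_)) {          \
                printf(__VA_ARGS__);                                    \
                printf(" (%d suppressed)\n", rl_dropped_);              \
        }                                                               \
} while (0)
```

A call site would then look like
RL_PRINTF(now, 100, "listen queue overflow on %p", ptr);
and renaming the macro to a no-op variant disables the message, which is
the convenience described above.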

cheers
luigi

> --
> Totus tuus, Glebius.



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------

