[Bug 192889] accept4 socket hangs in CLOSED (memcached)

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Sun Feb 15 06:43:28 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=192889

mp39590 at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mp39590 at gmail.com

--- Comment #15 from mp39590 at gmail.com ---
The cause of this bug lies not in the network stack, but in the
capabilities subsystem.

Memcached consists of a dispatcher thread and several worker threads, which
communicate through pipes. For example, when a new connection is accepted, the
dispatcher writes 'c' to the pipe of a selected worker thread (workers are
chosen in round-robin manner); the worker thread then pops the connection from
its queue and serves it.
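
The handoff described above can be sketched roughly as follows. This is not
memcached's code: struct worker, dispatch(), worker_handle_one() and
handoff_demo() are hypothetical names, the queue is a plain array, and the
dispatcher and workers run in a single thread to keep the example
self-contained.

```c
#include <assert.h>
#include <unistd.h>

#define NWORKERS 2
#define QLEN 16

/* Hypothetical per-worker state: a notification pipe and a small queue. */
struct worker {
    int notify_rd;      /* worker reads wake-up bytes here */
    int notify_wr;      /* dispatcher writes wake-up bytes here */
    int queue[QLEN];    /* pending connection descriptors */
    int head, tail;
    int served;
};

static int worker_init(struct worker *w)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    w->notify_rd = fds[0];
    w->notify_wr = fds[1];
    w->head = w->tail = w->served = 0;
    return 0;
}

/* Dispatcher side: queue the connection, then wake the chosen worker. */
static int dispatch(struct worker *workers, int *next, int conn_fd)
{
    struct worker *w = &workers[*next];

    *next = (*next + 1) % NWORKERS;     /* round-robin selection */
    assert(w->tail < QLEN);
    w->queue[w->tail++] = conn_fd;
    return write(w->notify_wr, "c", 1) == 1 ? 0 : -1;
}

/* Worker side: one 'c' byte means one queued connection to pop. */
static int worker_handle_one(struct worker *w)
{
    char cmd;

    if (read(w->notify_rd, &cmd, 1) != 1 || cmd != 'c')
        return -1;
    w->served++;
    return w->queue[w->head++];
}

/* Dispatch n fake descriptors, let the workers drain them, and return
 * how many were served in total (-1 on error). */
int handoff_demo(int n)
{
    struct worker workers[NWORKERS];
    int next = 0, total = 0;

    if (n < 0 || n > QLEN)
        return -1;
    for (int i = 0; i < NWORKERS; i++)
        if (worker_init(&workers[i]) != 0)
            return -1;
    for (int i = 0; i < n; i++)
        if (dispatch(workers, &next, 100 + i) != 0)
            return -1;
    for (int i = 0; i < NWORKERS; i++) {
        struct worker *w = &workers[i];

        while (w->head < w->tail)
            if (worker_handle_one(w) < 0)
                return -1;
        total += w->served;
        close(w->notify_rd);
        close(w->notify_wr);
    }
    return total;
}
```

The point of the design is that the pipe carries only a wake-up byte; the
connection itself travels through the queue, so a worker that stops reading
its pipe also stops popping its queue.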

Due to a slight race condition in capabilities, the kevent() mechanism may
sometimes return spurious ENOTCAPABLE errors for descriptors. This makes
libevent abort the loop that handles the connections and return. Memcached
doesn't expect this to happen, so the worker thread silently returns[1] and
dies. You can see it with the procstat command by comparing the thread count
in the normal and the failing situation: the failing process is one thread
short.

The dispatcher is not aware of this catastrophic event, and therefore continues
to write 'c's for new connections to the pipe of that already dead thread, but
of course no one will serve those connections and they are left hanging.
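
One reason the dispatcher cannot notice is that descriptors belong to the
process, not to the thread: the read end of the pipe stays open after the
worker exits, so write() keeps succeeding until the pipe buffer fills. A
minimal sketch of that effect, with a hypothetical function name and a pipe
that simply has no reader standing in for the dead worker:

```c
#include <fcntl.h>
#include <unistd.h>

/* Write nconns notifications into a pipe nobody reads and return how many
 * bytes are left stranded in it (-1 on error). */
int stranded_notifications(int nconns)
{
    int fds[2];
    char buf[64];
    int pending = 0;
    ssize_t n;

    /* Keep well below any pipe capacity so write() never blocks here. */
    if (nconns < 0 || nconns > 512)
        return -1;
    if (pipe(fds) != 0)
        return -1;

    /* fds[0] is never read while we write: the "worker" is gone. */
    for (int i = 0; i < nconns; i++) {
        if (write(fds[1], "c", 1) != 1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }

    /* Every write succeeded. Now count the unread bytes without blocking. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        pending += (int)n;

    close(fds[0]);
    close(fds[1]);
    return pending;
}
```

Calling stranded_notifications(10) reports 10 unread bytes even though none
of the ten writes failed, which is the situation the dispatcher is in.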

The reason you see this as a massive number of CLOSED/CLOSE_WAIT connections
is simply that the client, by timeout or for any other reason, decides to
close() its connection. The network stack receives the FIN packet and expects
our application to issue close() on the descriptor, but since the thread is
already dead, this never happens.
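
The server-side view of such a connection can be illustrated with a loopback
socket pair. This is only a sketch under assumptions: peer_close_demo() is a
hypothetical name, and unlike the dead worker it does read from the socket,
purely to show that the peer's FIN has arrived.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns what read() reports on the accepted socket after the peer has
 * closed: 0 means EOF, i.e. the FIN arrived. Returns -1 on setup failure. */
int peer_close_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char byte;
    int result = -1;
    int afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
        goto out;

    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    close(cfd);                         /* the client gives up: FIN is sent */
    cfd = -1;
    result = (int)read(afd, &byte, 1);  /* blocks until the FIN arrives */

    /* Until close(afd) below, the server side sits in CLOSE_WAIT. A dead
     * worker never reaches that call, so the state persists. */
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```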

This bug was addressed by Mateusz in r273137[2].

[1] - https://github.com/memcached/memcached/blob/master/thread.c#L369
[2] - https://svnweb.freebsd.org/base?view=revision&revision=273137


