Sockets stuck in SYN_RCVD (re(4), RELENG_7, i386)
olli at lurza.secnetix.de
Tue Nov 20 02:16:37 PST 2007
I'm seeing a very strange problem here. There are two
machines with almost identical hardware (see dmesg and
pciconf output at the bottom). They also run identical
sources: RELENG_7 (i386) as of October-18. I know it's
a few weeks old, but I haven't seen any changes in the
CVS that might be related to the following problem.
On the first machine, I see a slow, but constant increase
of the number of sockets in state SYN_RCVD in "netstat -an"
output. The number of those sockets is the same as sysctl
net.inet.tcp.syncache.count. This does not happen on the
second machine at all (count is zero).
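
For anyone who wants to track the same number from a script, here is a minimal Python sketch that counts the SYN_RCVD lines in "netstat -an" output. The function name and the sample text are made up for illustration (the addresses come from the documentation ranges, not from my machines), and it assumes the state is the last column on each tcp line, as it is in the FreeBSD output I'm looking at.

```python
def count_syn_rcvd(netstat_output: str) -> int:
    """Count TCP sockets whose state column reads SYN_RCVD."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines start with tcp4/tcp6; the state is the last column.
        if fields and fields[0].startswith("tcp") and fields[-1] == "SYN_RCVD":
            count += 1
    return count


# Hypothetical sample in the style of "netstat -an"; not real data.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.0.2.10.25          198.51.100.7.40312     SYN_RCVD
tcp4       0      0  192.0.2.10.80          203.0.113.99.51877     SYN_RCVD
tcp4       0      0  192.0.2.10.22          198.51.100.20.60022    ESTABLISHED
tcp4       0      0  *.25                   *.*                    LISTEN
udp4       0      0  *.53                   *.*
"""

print(count_syn_rcvd(SAMPLE))
```

The result can then be compared by hand against the value of net.inet.tcp.syncache.count taken at the same moment.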
At the moment, the count on the first machine is 702. We
first noticed it three days ago when the count was 330,
which leads to the assumption that the problem started
about six days ago. However, the machine has an uptime of
32 days. So something must have triggered the problem
after about 26 days of uptime.
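
The estimate above is a simple linear extrapolation. As a rough sketch in Python (the function name is just for illustration, and it assumes the count started at zero and has grown at a constant rate ever since):

```python
def estimate_onset(count_now, count_before, days_between):
    """Return (growth per day, days since the count was zero).

    Assumes constant linear growth starting from zero.
    """
    rate = (count_now - count_before) / days_between
    days_since_onset = count_now / rate
    return rate, days_since_onset


# Figures from this report: 702 now, 330 three days ago, uptime 32 days.
rate, days_since_onset = estimate_onset(702, 330, 3)
uptime_days = 32
print(f"growth: {rate:.0f} sockets/day")
print(f"started about {days_since_onset:.1f} days ago")
print(f"i.e. after about {uptime_days - days_since_onset:.1f} days of uptime")
```

That gives roughly 124 new sockets per day and an onset a little under six days ago, which is where the "about 26 days of uptime" figure comes from.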
The port numbers and remote IPs of the SYN_RCVD sockets
seem to be completely random. Most of the local ports
are port 25, but a few are also port 80 or port 53.
These are the ports most often used on the machine; all
other ports are blocked in IPFW. In very rare cases a
socket leaves the SYN_RCVD state. For example, yesterday
I watched a socket with local destination port 80 that
was in state SYN_RCVD for about 40 minutes and then
finally left that state.
Both machines are only very lightly loaded. In fact they
are pretty much 100% idle most of the time. They run
sendmail, apache, BIND and a few minor things, but they
really don't do much.
There's nothing in the logs. Both machines have an re(4)
interface. However, one interesting difference is that
the first machine runs in GigE mode, while the second
runs only at 100 Mbps. I don't know if the speed changed;
the machines are colocated and I have no idea what kind
of switch ports they are connected to.
It could well be that the first machine's port was changed
from 100M to GigE six days ago. I'm reluctant to change
the speed manually to 100M, because I might lose the link
if the switch is fixed at GigE. I would have to initiate
a remote reboot in that case.
Another thing worth noting is the fact that the second
machine only has an uptime of 21 days. I'm curious if
it will start collecting SYN_RCVD sockets when it reaches
26 days, too. :-)
By the way, the problem does not seem to affect normal
operation, so I'm not too worried at the moment. I can
connect to the machine's services (ssh, http, smtp, dns)
without any problems.
A few data:
$ sysctl net.inet.tcp.syncache
$ netstat -s | sed -n '/sync/,/rec/p'
395637 syncache entries added
0 bucket overflow
0 cache overflow
0 zone failures
395637 cookies sent
175 cookies received
Output from dmesg and pciconf of the first machine is here:
For comparison, this is the second machine which does _not_
exhibit the problem:
Please let me know if I should provide more information.
The next thing I would try is to reboot the machine, so
I can see whether the problem occurs immediately or only
after some uptime.
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd
"Python tricks" is a tough one, cuz the language is so clean. E.g.,
C makes an art of confusing pointers with arrays and strings, which
leads to lotsa neat pointer tricks; APL mistakes everything for an
array, leading to neat one-liners; and Perl confuses everything
period, making each line a joyous adventure <wink>.
-- Tim Peters
More information about the freebsd-current mailing list