Socket leak
Mikolaj Golub
to.my.trociny at gmail.com
Wed May 14 10:42:26 UTC 2008
On Tue, 13 May 2008 19:37:29 -0400 Mark Saad wrote:
MS> I started logging the values of kern.ipc.numopensockets and I noticed
MS> that something is leaking sockets. Here is a sample of the log
MS> 2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
MS> 2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
MS> 2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
MS> 2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710
MS> This continues until kern.ipc.maxsockets its reached or the box is
MS> rebooted.
MS> The other thing we looked at was the output from vmstat -z
MS> The first thing was the high amount of malloc 128 bucket failures
MS> 128 Bucket: 524, 0, 2489, 80, 8364, 23055239
MS> I also logged the mbuf clusters, we never reached the max mbuf clusters
MS> Its almost like there are stale sockets. Here is a snapshot of the server now
MS> ewr# sockstat -4u |wc -l
MS> 139
MS> ewr# sysctl kern.ipc.numopensockets
MS> kern.ipc.numopensockets: 13935
MS> ewr# uptime
MS> 7:30PM up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17
We had the same problem on one of our hosts running 6.2-RELEASE-p11. The
situation was complicated by the fact that I didn't have root access to the
host, so it was hard to get more debugging output or to run tcpdump.
Eventually it turned out that the problem was caused by proftpd. One of our
clients connected to the ftp server every five minutes looking for a new file
to download. When there was a file, everything was fine. But when there
wasn't, some strange things happened. In the proftpd logs we had:
FTP session opened.
mod_delay/0.5: delaying for 28 usecs
user fake authenticated by mod_auth_pam.c
USER fake: Login successful.
Preparing to chroot to directory '/var/ftp/fake'
Environment successfully chroot()ed.
mod_delay/0.5: delaying for 621 usecs
Entering Passive Mode (XX,YY,ZZ,213,241,70).
FTP session closed.
i.e. the client connected to the server, logged in successfully, opened a new
DATA connection in passive mode and then exited. But although proftpd reported
that the session was closed and the proftpd process had exited, we still had
this orphaned connection in the system, reported by netstat in the ESTABLISHED
state. sockstat did not display these connections. Some of them could be in
the CLOSE_WAIT state instead of ESTABLISHED. Such a connection was shown by
netstat for several hours and then disappeared, but I suspect that the socket
buffer was not freed and the numopensockets counter did not decrease.
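To find the data connection from a given session in the netstat output, it
helps to know which port the server announced. In a passive-mode reply the
last two numbers encode the port as p1*256 + p2 (RFC 959). A minimal sketch
in Python; the address octets below are placeholders, since the real ones are
masked in the log above, and the function name is only illustrative:

```python
import re


def parse_pasv(reply):
    """Return (host, port) from the (h1,h2,h3,h4,p1,p2) tuple of a PASV reply."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if m is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) tuple found")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    # The port is sent as two bytes, high byte first.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


if __name__ == "__main__":
    # 192,0,2 stands in for the masked XX,YY,ZZ octets (placeholder values).
    print(parse_pasv("Entering Passive Mode (192,0,2,213,241,70)."))
```

For the reply in the log above, the last two numbers (241,70) give local port
61766, which is the port to look for in the netstat listing.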
Unfortunately, I did not manage to persuade the admin to increase DebugLevel
in proftpd.conf and run tcpdump to investigate further what was going on. It
turned out that we had proftpd built for FREEBSD5_4:
Compile-time Settings:
Version: 1.3.0
Platform: FREEBSD5 (FREEBSD5_4)
Built With:
configure --localstatedir=/var/run --sysconfdir=/usr/local/etc --disable-sendfile --disable-ipv6 --with-modules=mod_ratio:mod_readme:mod_rewrite:mod_wrap:mod_ifsession --prefix=/usr/local i386-portbld-freebsd5.4
Upgrading to a more recent proftpd built for the proper platform resolved the
problem.
So I would recommend looking for the process that could be causing this leak.
In my case, careful examination of the netstat output history and comparison
with the sockstat output helped to find the culprit. It may also help to
restart daemons one by one and see whether sockets are freed.
You can certainly increase kern.ipc.maxsockets as a workaround until you
identify what causes the problem.
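To judge how much time a larger limit buys, the leak rate can be estimated
from the numopensockets log. A rough sketch, assuming the timestamp format in
the quoted log is YYYY-MM-DD--HH:MM.SS and that growth stays roughly linear;
the limit of 25600 is a made-up value, not the real setting on that box:

```python
import re
from datetime import datetime

SAMPLE = re.compile(r"^(\S+)\s+.*kern\.ipc\.numopensockets:\s*(\d+)")


def parse_samples(text):
    """Return a list of (timestamp, count) pairs from the log text."""
    samples = []
    for line in text.splitlines():
        m = SAMPLE.match(line.strip())
        if m:
            # Assumed timestamp layout: date, two dashes, HH:MM.SS
            when = datetime.strptime(m.group(1), "%Y-%m-%d--%H:%M.%S")
            samples.append((when, int(m.group(2))))
    return samples


def sockets_per_hour(samples):
    """Average growth between the first and the last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    return (n1 - n0) / ((t1 - t0).total_seconds() / 3600.0)


def hours_until(limit, samples):
    """Hours left before the counter reaches limit, or None if not growing."""
    rate = sockets_per_hour(samples)
    if rate <= 0:
        return None
    return (limit - samples[-1][1]) / rate


# The four samples from the quoted log.
LOG = """\
2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710
"""

if __name__ == "__main__":
    samples = parse_samples(LOG)
    print(round(sockets_per_hour(samples), 1), "sockets/hour")
    print(hours_until(25600, samples), "hours to a hypothetical limit of 25600")
```

On the four quoted samples this gives roughly 70 leaked sockets per hour, so
raising the limit only postpones the problem by a predictable amount.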
--
Mikolaj Golub
More information about the freebsd-hackers
mailing list