Socket leak

Mikolaj Golub to.my.trociny at gmail.com
Wed May 14 10:42:26 UTC 2008


On Tue, 13 May 2008 19:37:29 -0400 Mark Saad wrote:

 MS> I started logging the values of kern.ipc.numopensockets and I noticed
 MS> that something is leaking sockets. Here is a sample of the log

 MS> 2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
 MS> 2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
 MS> 2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
 MS> 2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710

 MS> This continues until kern.ipc.maxsockets its reached or the box is
 MS> rebooted.

 MS> The other thing we looked at was the output from vmstat -z
 MS> The first thing was the high amount of malloc 128 bucket failures

 MS> 128 Bucket:    524,        0,     2489,       80,     8364, 23055239

 MS> I also logged the mbuf clusters, we never reached the max mbuf clusters

 MS> Its almost like there are stale sockets. Here is a snapshot of the server now

 MS> ewr# sockstat -4u |wc -l
 MS>     139
 MS> ewr# sysctl kern.ipc.numopensockets
 MS> kern.ipc.numopensockets: 13935

 MS> ewr# uptime
 MS> 7:30PM  up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17

We had the same problem on one of our hosts running 6.2-RELEASE-p11. The
situation was complicated by the fact that I didn't have root access to the
host, so it was difficult to get more debugging output or to run tcpdump.

Eventually, it turned out that the problem was caused by proftpd. One of our
clients connected to the FTP server every five minutes looking for a new file
to download. When there was a file, everything was fine. But when there
wasn't, some strange things happened. In the proftpd logs we had:

FTP session opened.
mod_delay/0.5: delaying for 28 usecs
user fake authenticated by mod_auth_pam.c
USER fake: Login successful.
Preparing to chroot to directory '/var/ftp/fake'
Environment successfully chroot()ed.
mod_delay/0.5: delaying for 621 usecs
Entering Passive Mode (XX,YY,ZZ,213,241,70).
FTP session closed.

i.e. the client connected to the server, logged in successfully, opened a new
DATA connection in passive mode and then exited. But although proftpd reported
that the session was closed and the proftpd process had exited, we still had
this orphaned connection in our system, reported by netstat in the ESTABLISHED
state. sockstat did not display these connections. Some of these connections
could be in the CLOSE_WAIT state instead of ESTABLISHED. Such a connection was
visible in netstat for several hours and then disappeared, but I suspect that
the socket buffer was not freed and the numopensockets counter did not
decrease.

Unfortunately, I did not manage to persuade the admin to increase DebugLevel
in proftpd.conf and run tcpdump to investigate further what was going on. It
turned out that we had proftpd built for FREEBSD5_4:

Compile-time Settings:
  Version: 1.3.0
  Platform: FREEBSD5 (FREEBSD5_4)
  Built With:
    configure --localstatedir=/var/run --sysconfdir=/usr/local/etc --disable-sendfile --disable-ipv6 --with-modules=mod_ratio:mod_readme:mod_rewrite:mod_wrap:mod_ifsession --prefix=/usr/local i386-portbld-freebsd5.4

Upgrading to a more recent proftpd built for the proper platform resolved the
problem.

So I would recommend looking for the process that could be causing this leak.
In my case, careful investigation of the netstat output history and comparing
it with the sockstat output helped to find the culprit. It may also help to
restart daemons one by one and see whether the sockets are freed.

You can of course increase kern.ipc.maxsockets as a workaround until you
identify what is causing the problem.
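
To judge how much time a given kern.ipc.maxsockets value buys, the leak rate
can be estimated from a log like the one quoted above. This is only a rough
sketch: it assumes the timestamp format is date--HH:MM.SS, that growth is
roughly linear, and the maxsockets value used in the example is hypothetical.

```python
from datetime import datetime


def parse_sample(line):
    """Return (timestamp, socket count) from one log line."""
    fields = line.split()
    # Assumed timestamp layout: 2008-04-29--15:04.10 (date--HH:MM.SS)
    stamp = datetime.strptime(fields[0], "%Y-%m-%d--%H:%M.%S")
    return stamp, int(fields[-1])


def leak_rate_per_hour(lines):
    """Average growth in open sockets per hour between first and last sample."""
    samples = [parse_sample(line) for line in lines if line.strip()]
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (n1 - n0) / hours


def hours_until_full(current, maxsockets, rate):
    """Hours left before the counter reaches maxsockets at a constant rate."""
    return (maxsockets - current) / rate


# The four samples from the quoted log, without the quoting prefix.
LOG_SAMPLE = """\
2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710
"""

if __name__ == "__main__":
    rate = leak_rate_per_hour(LOG_SAMPLE.splitlines())
    # 25600 is a hypothetical limit; read the real one with
    # `sysctl kern.ipc.maxsockets`.
    print(rate, hours_until_full(1710, 25600, rate))
```

For the four quoted samples this works out to roughly 70 sockets per hour.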

-- 
Mikolaj Golub


More information about the freebsd-hackers mailing list