Socket leak (Was: Re: What triggers "No Buffer Space Available"?)
Marc G. Fournier
scrappy at freebsd.org
Tue May 8 03:32:07 UTC 2007
--On Monday, May 07, 2007 19:01:02 +0200 Oliver Fromme <olli at lurza.secnetix.de> wrote:
> If I remember correctly, you wrote that 11k sockets are
> in use with 90 jails. That's about 120 sockets per jail,
> which isn't out of the ordinary. Of course it depends on
> what is running in those jails, but my guess is that you
> just need to increase the limit on the number of sockets
> (i.e. kern.ipc.maxsockets).
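For reference, on the FreeBSD releases of that era kern.ipc.maxsockets is typically a boot-time loader tunable rather than a writable runtime sysctl, so raising it means an entry in /boot/loader.conf plus a reboot; a minimal sketch (the value 25000 is just an illustrative choice, roughly double the ceiling shown below):

```
# /boot/loader.conf
kern.ipc.maxsockets=25000
```

Current usage versus the ceiling can be checked at any time with `sysctl kern.ipc.numopensockets kern.ipc.maxsockets`.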
The problem is that if I compare it to another server running 2/3 as many
jails, I find it's using 1/4 as many sockets, even after over 60 days of uptime:
kern.ipc.numopensockets: 3929
kern.ipc.maxsockets: 12328
But, let's try what I think Matt suggested ... right now I'm at just over 11k
sockets on that machine, so I'm going to shut down everything except a
bare-minimum set of services (all jails stopped) and see where the socket count
drops to after that ...
I'm down to ~7400 sockets:
kern.ipc.numopensockets: 7400
kern.ipc.maxsockets: 12328
ps looks like:
mars# ps aux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 10 99.0 0.0 0 8 ?? RL Sat12PM 2527:55.02 [idle: cpu1]
root 11 99.0 0.0 0 8 ?? RL Sat12PM 2816:58.21 [idle: cpu0]
root 0 0.0 0.0 0 0 ?? WLs Sat12PM 0:00.00 [swapper]
root 1 0.0 0.0 768 232 ?? ILs Sat12PM 3:22.01 /sbin/init --
root 2 0.0 0.0 0 8 ?? DL Sat12PM 0:31.14 [g_event]
root 3 0.0 0.0 0 8 ?? DL Sat12PM 12:02.57 [g_up]
root 4 0.0 0.0 0 8 ?? DL Sat12PM 17:20.73 [g_down]
root 5 0.0 0.0 0 8 ?? DL Sat12PM 0:00.35 [thread taskq]
root 6 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [xpt_thrd]
root 7 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [kqueue taskq]
root 8 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [aic_recovery0]
root 9 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [aic_recovery0]
root 12 0.0 0.0 0 8 ?? WL Sat12PM 12:11.84 [swi1: net]
root 13 0.0 0.0 0 8 ?? WL Sat12PM 15:31.57 [swi4: clock]
root 14 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [swi3: vm]
root 15 0.0 0.0 0 8 ?? DL Sat12PM 1:10.54 [yarrow]
root 16 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [swi6: task queue]
root 17 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [swi6: Giant taskq]
root 18 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [swi5: +]
root 19 0.0 0.0 0 8 ?? WL Sat12PM 11:50.45 [swi2: cambio]
root 20 0.0 0.0 0 8 ?? WL Sat12PM 8:28.94 [irq20: fxp0]
root 21 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [irq21: fxp1]
root 22 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [irq25: ahc0]
root 23 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [aic_recovery1]
root 24 0.0 0.0 0 8 ?? WL Sat12PM 7:53.11 [irq26: ahc1]
root 25 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [aic_recovery1]
root 26 0.0 0.0 0 8 ?? WL Sat12PM 0:00.00 [irq1: atkbd0]
root 27 0.0 0.0 0 8 ?? DL Sat12PM 0:32.19 [pagedaemon]
root 28 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [vmdaemon]
root 29 0.0 0.0 0 8 ?? DL Sat12PM 38:04.73 [pagezero]
root 30 0.0 0.0 0 8 ?? DL Sat12PM 0:30.43 [bufdaemon]
root 31 0.0 0.0 0 8 ?? DL Sat12PM 11:38.76 [syncer]
root 32 0.0 0.0 0 8 ?? DL Sat12PM 0:57.76 [vnlru]
root 33 0.0 0.0 0 8 ?? DL Sat12PM 1:21.24 [softdepflush]
root 34 0.0 0.0 0 8 ?? DL Sat12PM 6:00.16 [schedcpu]
root 35 0.0 0.0 0 8 ?? DL Sat12PM 6:26.10 [g_mirror md1]
root 36 0.0 0.0 0 8 ?? DL Sat12PM 6:10.56 [g_mirror md2]
root 37 0.0 0.0 0 8 ?? DL Sat12PM 0:00.00 [g_mirror vm]
root 480 0.0 0.0 528 244 ?? Is Sat12PM 0:04.32 /sbin/devd
root 539 0.0 0.0 1388 848 ?? Ss Sat12PM 0:07.21 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -s -s
daemon 708 0.0 0.0 1316 748 ?? Ss Sat12PM 0:02.49 /usr/sbin/rwhod
root 749 0.0 0.0 3532 1824 ?? Is Sat12PM 0:07.60 /usr/sbin/sshd
root 768 0.0 0.0 1412 920 ?? Is Sat12PM 0:02.23 /usr/sbin/cron -s
root 2087 0.0 0.0 2132 1360 ?? Ss Sat01PM 0:04.73 screen -R
root 88103 0.0 0.1 6276 2600 ?? Ss 11:41PM 0:00.62 sshd: root@ttyp0 (sshd)
root 91218 0.0 0.1 6276 2664 ?? Ss 11:49PM 0:00.24 sshd: root@ttyp4 (sshd)
root 813 0.0 0.0 1352 748 v0 Is+ Sat12PM 0:00.00 /usr/libexec/getty Pc ttyv0
root 88106 0.0 0.1 5160 2516 p0 Ss 11:41PM 0:00.20 -tcsh (tcsh)
root 97563 0.0 0.0 1468 804 p0 R+ 12:17AM 0:00.00 ps aux
root 2088 0.0 0.1 5352 2368 p2 Is+ Sat01PM 0:00.03 /bin/tcsh
root 2112 0.0 0.1 5220 2360 p3 Ss+ Sat01PM 0:00.04 /bin/tcsh
root 91221 0.0 0.1 5140 2440 p4 Ss+ 11:49PM 0:00.12 -tcsh (tcsh)
And netstat -n -f unix shows 7355 lines similar to:
d05f1000 stream 0 0 0 d05f1090 0 0
d05f1090 stream 0 0 0 d05f1000 0 0
cf1be000 stream 0 0 0 cf1bdea0 0 0
cf1bdea0 stream 0 0 0 cf1be000 0 0
cec42bd0 stream 0 0 0 cf2ac480 0 0
cf2ac480 stream 0 0 0 cec42bd0 0 0
with the final few associated with running processes:
c95ad000 stream 0 0 c95aa000 0 0 0 /var/run/devd.pipe
c95aca20 dgram 0 0 0 c95ace10 0 0
c95accf0 dgram 0 0 c95c7110 0 0 0 /var/named/var/run/log
c95acd80 dgram 0 0 c95c7330 0 0 0 /var/run/log
c95ace10 dgram 0 0 c95c7440 0 c95aca20 0 /var/run/logpriv
c95acea0 dgram 0 0 c95c7550 0 0 0 /var/run/log
So, over 7000 sockets with pretty much all processes shut down ...
Shouldn't the garbage collector be cutting in somewhere here?
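Those orphaned pairs can be tallied mechanically; a sketch (the column layout is assumed from the netstat listing above: column 1 is the socket address, column 6 is the connected peer's address, or 0 when unconnected):

```shell
# Count unix-domain stream sockets and how many of them form mutually
# connected pairs, reading `netstat -n -f unix` style output on stdin.
count_pairs() {
    awk '
        $2 == "stream" {
            total++
            peer[$1] = $6          # column 6: address of the connected peer
        }
        END {
            paired = 0
            for (a in peer)
                if (peer[a] != "0" && (peer[a] in peer) && peer[peer[a]] == a)
                    paired++
            printf "stream sockets: %d, in connected pairs: %d\n", total, paired
        }'
}

# Sample lines taken from the listing above:
count_pairs <<'EOF'
d05f1000 stream 0 0 0 d05f1090 0 0
d05f1090 stream 0 0 0 d05f1000 0 0
cf1be000 stream 0 0 0 cf1bdea0 0 0
cf1bdea0 stream 0 0 0 cf1be000 0 0
EOF
# → stream sockets: 4, in connected pairs: 4
```

On the live box it would be fed with `netstat -n -f unix | count_pairs`; a large count of mutually connected but process-less pairs would support the leak theory.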
I'm willing to shut everything down like this again the next time it happens
(in 2-3 days) if someone has some other command whose output they'd like me to
provide.
And I have the following outputs as of the above, where everything is shut down
and it's running on minimal processes:
# ls -lt
total 532
-rw-r--r-- 1 root wheel 11142 May 8 00:20 fstat.out
-rw-r--r-- 1 root wheel 742 May 8 00:20 netstat_m.out
-rw-r--r-- 1 root wheel 486047 May 8 00:20 netstat_na.out
-rw-r--r-- 1 root wheel 735 May 8 00:20 sockstat.out
-rw-r--r-- 1 root wheel 6266 May 8 00:20 vmstat_m.out
-rw-r--r-- 1 root wheel 5376 May 8 00:20 vmstat_z.out
-rw-r--r-- 1 root wheel 4910 May 8 00:20 ps.out
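Those snapshot files can be regenerated in one pass; a sketch (the command-to-file mapping is inferred from the filenames in the listing, and tools that don't exist on a given system are skipped rather than failing the run):

```shell
#!/bin/sh
# Snapshot socket-related state into the .out files shown in the listing.
snapshot() {
    dir=${1:-.}
    # Pairs of "command" "output file", inferred from the filenames above.
    set -- \
        "fstat"       "fstat.out" \
        "netstat -m"  "netstat_m.out" \
        "netstat -na" "netstat_na.out" \
        "sockstat"    "sockstat.out" \
        "vmstat -m"   "vmstat_m.out" \
        "vmstat -z"   "vmstat_z.out" \
        "ps aux"      "ps.out"
    while [ $# -ge 2 ]; do
        cmd=$1; out=$2; shift 2
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            $cmd > "$dir/$out" 2>&1 || true   # keep going even if one fails
            echo "wrote $out"
        else
            echo "skipped $out (${cmd%% *} not found)"
        fi
    done
}

snapshot /tmp
```

Running it right before and right after stopping the jails would give directly comparable pairs of files.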
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy at hub.org MSN . scrappy at hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
More information about the freebsd-stable mailing list