Shared Memory?

Thu Jun 1 13:46:11 PDT 2006

Hello All,

I'm not a programmer, nor do I play one in real life. :)

I recently set up a DansGuardian box for someone and had some 
interesting things happen.

When the box got under load (500+ simultaneous connections), the 
CPU load would climb:

last pid: 69931;  load averages:  4.73,  3.56,  3.32  up 5+11:10:58  09:56:31
49 processes:  8 running, 41 sleeping

Mem: 157M Active, 202M Inact, 106M Wired, 20M Cache, 60M Buf, 8168K Free
Swap: 2048M Total, 32K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
49814 guardian    1 120    0 85868K 85160K RUN      0:01 14.87% dansguardian
30132 guardian    1 120    0 85868K 85180K RUN      0:22 14.11% dansguardian
52245 guardian    1 119    0 85860K 85168K RUN      0:06 13.94% dansguardian
23445 guardian    1 120    0 85896K 85208K RUN      0:22 13.87% dansguardian

At this time there were 10 dansguardian processes running. The default 
config suggests starting with 120 (doing that crashed the box in 
about 5 minutes).

I found one thing that seemed to help:
kern.ipc.shm_use_phys=1

This is from the tuning(7) man page.

After setting the sysctl value, the system now looks like this:
last pid: 40265;  load averages:  0.29,  0.29,  0.27  up 7+17:55:46  16:41:47
34 processes:  1 running, 33 sleeping
CPU states:  0.0% user,  0.0% nice,  0.7% system,  1.5% interrupt, 97.8% idle
Mem: 125M Active, 249M Inact, 98M Wired, 16M Cache, 60M Buf, 4392K Free
Swap: 2048M Total, 36K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 6266 guardian    1  96    0 76116K 18004K select   0:05 12.54% dansguardian
  696 guardian    1  96    0 76112K 16960K select   0:01  0.81% dansguardian
 8969 guardian    1  96    0 76112K  6036K select   0:00  0.12% dansguardian
21017 squid       1  96    0 31228K 26684K select  41:52  0.00% squid

After searching I can't seem to find out when it's appropriate (or not) 
to set this and if anything else should be set in conjunction with it.

Beyond the fact that this helped, can anyone point me in a 
direction or tell me why it helped?

collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
collecting pv entries -- suggest increasing PMAP_SHPGPERPROC

This error is what led me to this discovery. The message itself 
suggests recompiling the kernel with that value changed. 
NOTES tells me that the value is now 201; Google turns up people using 
numbers all over the place, and I still can't figure out why 
they chose them.
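
For anyone wanting to judge how often that warning fires before and after a change, here is a minimal sketch. It assumes the kernel messages end up in a plain-text log such as /var/log/messages; the function name count_pv_warnings is just something I made up for illustration.

```sh
#!/bin/sh
# Count the lines in a log file that contain the pv-entry warning.
# $1 is the log file to scan (hypothetical helper, not a system tool).
# grep -c prints 0 when nothing matches.
count_pv_warnings() {
    grep -c 'suggest increasing PMAP_SHPGPERPROC' "$1"
}
```

Usage would be something like `count_pv_warnings /var/log/messages`, run before and after a tuning change to compare the counts.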

egrep -v "#" /etc/sysctl.conf

security.bsd.see_other_uids=0
net.inet.ip.forwarding=1
net.inet.ip.random_id=1
kern.randompid=10000
kern.coredump=0

kern.ipc.shmmax=536870912
kern.ipc.shm_use_phys=1

This is a stock 6.1 GENERIC kernel

The box is a router for internet traffic that passes several gigs of 
data for 2500+ users.

It's a small 866 w/ 512M of RAM and, as previously stated, is running 
DansGuardian (www/dansguardian) and squid (www/squid).

I've asked a few times for information on the DG list, but I guess it's 
mainly a Linux-only crowd, as I did not hear anything back from anyone.

netstat -m
260/2155/2415 mbufs in use (current/cache/total)
258/1264/1522/17088 mbuf clusters in use (current/cache/total/max)
258/1210 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
581K/3066K/3647K bytes allocated to network (current/cache/total)
56061/494261/470674 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/9/4528 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
12 requests for I/O initiated by sendfile
328 calls to protocol drain routines
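
The "requests for mbufs denied" line is the one that stands out to me in that output. A small sketch for pulling those three counters out so they can be watched over time; the function name mbuf_denied is hypothetical, and it assumes the line format shown above.

```sh
#!/bin/sh
# Read `netstat -m` output on stdin and print the three
# "requests for mbufs denied" counters, space separated, in the
# order mbufs, clusters, mbuf+clusters.
mbuf_denied() {
    awk '/requests for mbufs denied/ {
        split($1, n, "/")      # first field looks like 56061/494261/470674
        print n[1], n[2], n[3]
    }'
}
```

Usage would be `netstat -m | mbuf_denied`; if the numbers keep growing between runs, requests are still being denied.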

They want me to move it to a larger box just for the sake of putting it on 
a larger box (2.2G Xeon w/ 2G RAM), but I'd like to tune it better, as 
opposed to just throwing hardware at it and hoping for the best.

All data/packets pass over lo0:

lo0   16384 127           127.0.0.1         57055828     - 33798613     -     -

The box has been up for 7 days so far.

Any information helping me understand this beast would be greatly 
appreciated.

- Brian