%cpu in system - squid performance in FreeBSD 5.3

Robert Watson rwatson at freebsd.org
Sat Dec 25 04:40:18 PST 2004


On Thu, 23 Dec 2004, Jeff Behl wrote:

> As a follow up to the below (original message at the very bottom), I
> installed a load balancer in front of the machines which terminates the
> tcp connections from clients and opens up a few, persistent connections
> to each server over which requests are pipelined.  In this scenario
> everything is copasetic: 

I'm not very familiar with Squid's architecture, but I would anticipate
that what you're seeing is that the cost of additional connections served
in parallel is pretty high due to the use of processes.  Specifically: if
each TCP connection being served gets its own process, and there are a lot
of TCP connections, you'll be doing a lot of process forking, context
switching, exceeding cache sizes, etc.  With just a couple of connections,
even if they're doing the same "work", the overhead is much lower. 
Depending on how much time you're willing to invest in this, we can
probably do quite a bit to diagnose where the cost is coming from and look
for any specific problems or areas we could optimize.

I might start by turning on kernel profiling and doing a profile dump
under load.  Be aware that turning on profiling uses up a lot of CPU
itself, so it will reduce the capacity of the system.  There's probably
documentation elsewhere, but the process I use to set up profiling is
here:

  http://www.watson.org/~robert/freebsd/netperf/profile/

Note that the page warns that some results may be incorrect on SMP.  I
think it's worth giving it a try anyway just to see if we get something
useful.

The next thing that would be interesting is using mutex profiling to
measure contention on mutexes.  The instructions in MUTEX_PROFILING(9) are
pretty decent for this purpose.  On an SMP system, time spent contending a
mutex in active use will be spent spinning, which means wasted CPU.  You
can cause the kernel to block threads instead using options
NO_ADAPTIVE_MUTEXES, but measurement in the past has shown that the
overhead of blocking and restarting a thread is generally higher than just
spinning.
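
From memory, the recipe boils down to a kernel option plus a few sysctls;
MUTEX_PROFILING(9) has the exact names and caveats, so treat this as a
sketch:

  # in the kernel configuration, then rebuild and reboot:
  options         MUTEX_PROFILING

  # at runtime, while the workload is running:
  sysctl debug.mutex.prof.reset=1      # clear old counters
  sysctl debug.mutex.prof.enable=1     # start collecting
  sleep 120
  sysctl debug.mutex.prof.enable=0     # stop collecting
  sysctl debug.mutex.prof.stats        # per-mutex acquisition/contention data

Mutexes that show a lot of contended acquisitions relative to total
acquisitions are the interesting ones.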

It would be useful to see the output of dmesg at boot to see if any
performance options are obviously out of place.  Likewise, the output of a
couple of stats commands while the system is active would be useful -- for
example, a couple of snapshots of "systat -vmstat 1", "netstat -mb",
"vmstat -i", "top -S", and "iostat".

As a final question: other than CPU consumption, do you have a reliable
way to measure how efficiently the system is operating -- in particular,
how fast it can serve data?  Having some sort of performance metric is
very useful when optimizing, since it tells us whether incremental changes
are actually helping even while the system is still saturated and CPU
consumption alone doesn't show progress.  A typical form would be a web
benchmark of some sort.  If you have one, it would be interesting to
compare the performance of the following configurations:

- UP kernel (no SMP compiled in)
- SMP kernel but SMP disabled using the appropriate tunable (sketched below)
- SMP kernel with SMP enabled
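
For the second configuration, the tunable I have in mind is set from the
loader (name from memory, so double-check that it exists on your build):

  # /boot/loader.conf -- boot the SMP kernel with only one CPU in use
  kern.smp.disabled="1"

You can confirm which mode you actually booted into afterwards with
"sysctl kern.smp.active".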

Finally, I'm not sure if the box has HTT on it, and if so, if HTT is
enabled, but you might want to try disabling it, as it has proven to be
relatively ineffective in improving performance in the application tests
I've run, while at the same time increasing operating overhead.
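
If the BIOS doesn't expose a switch for it, there should be a machdep
tunable that parks the logical CPUs instead (again, the name here is from
memory -- verify it with "sysctl machdep" before relying on it):

  # /boot/loader.conf -- halt the HTT logical CPUs so only physical
  # CPUs run threads
  machdep.hlt_logical_cpus="1"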

Another variable that might be interesting to look at is net.isr.enable. 
To do this, you want to be running 5-STABLE rather than 5.3-RELEASE, as I
merged at least one significant bug fix that affects its operation.  By
default, net.isr.enable is 0, meaning that all inbound network traffic is
processed in the netisr thread.  When this variable is set to 1, inbound
network traffic will be, where possible, directly dispatched in the device
driver ithread.  This has a couple of impacts, but the main ones are that
there are substantially fewer context switches being done, and that
parallelism is possible between the netisr and each interface card.  This
is an experimental feature, so be on the lookout for any resulting nits.
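
For reference, it's an ordinary sysctl, so trying it is cheap and easy to
reverse:

  # enable direct dispatch of inbound packets in the driver ithread
  sysctl net.isr.enable=1

  # to make it persistent across reboots, add to /etc/sysctl.conf:
  net.isr.enable=1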

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org      Principal Research Scientist, McAfee Research


> 
> last pid:  3377;  load averages:  0.12,  0.09,  0.08   up 0+17:24:53  10:02:13
> 31 processes:  1 running, 30 sleeping
> CPU states:  5.1% user,  0.0% nice,  1.8% system,  1.2% interrupt, 92.0% idle
> Mem: 75M Active, 187M Inact, 168M Wired, 40K Cache, 214M Buf, 1482M Free
> Swap: 4069M Total, 4069M Free
> 
>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>   474 squid     96    0 68276K 62480K select 0  53:38 16.80% 16.80% squid
>   311 bind      20    0 10628K  6016K kserel 0  12:28  0.00%  0.00% named
> 
> 
> 
> It's actually so good that one machine can now handle all traffic
> (around 180 Mb/s) at < 50% CPU utilization.  Seems like something in the
> network stack is responsible for the high %system CPU utilization...
> 
> jeff
> 
> 
> -----Original Message-----
> From: owner-freebsd-performance at freebsd.org
> [mailto:owner-freebsd-performance at freebsd.org] On Behalf Of Jeff Behl
> Sent: Tuesday, December 07, 2004 9:17 AM
> To: Sean Chittenden
> Cc: freebsd-performance at freebsd.org
> Subject: Re: %cpu in system - squid performance in FreeBSD 5.3
> 
> I upgraded to STABLE but  most cpu time is still being spent in system.
> 
> This system is doing ~20Mb/s total with all content being grabbed out of
> memory.  I see similar results when running MySQL (a lot of time being
> spent in system)
> 
> Any ideas on what updates to be on the lookout for that might help with
> this?  Am I right in guessing that this is a SMP issue and doesn't have
> anything to do with AMD architecture?
> 
> thx
> 
> 
> 
> FreeBSD www2 5.3-STABLE FreeBSD 5.3-STABLE #2: Sun Dec  5 21:06:14 PST 
> 2004     root at www2.cdn.sjc:/usr/obj/usr/src/sys/SMP  amd64
> 
> 
> last pid: 15702;  load averages:  0.15,  0.31,  0.31   up 0+19:55:14  09:09:28
> 38 processes:  2 running, 36 sleeping
> CPU states:  5.4% user,  0.0% nice, 12.7% system,  3.4% interrupt, 78.4% idle
> Mem: 163M Active, 284M Inact, 193M Wired, 72K Cache, 214M Buf, 1245M Free
> Swap: 4069M Total, 4069M Free
> 
>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>   486 squid     96    0 79820K 73996K CPU1   1 110:00 15.04% 15.04% squid
>   480 squid     96    0 75804K 70012K select 0 105:56 14.89% 14.89% squid
> 
> 
> 
> 
> 
> Sean Chittenden wrote:
> 
> >> but the % system time can fluctuate up to 60 at times.  My question 
> >> is if this is about the type of performance I could expect, or if 
> >> people have seen better.
> >
> >
> > I don't know about other people, but I suspect you're running into 
> > lock contention.  Try using a post 5.3 snapshot (something from
> > RELENG_5) since alc@ has set debug.mpsafevm=1, which lets many calls 
> > to the VM run without GIANT, which I suspect is your problem and why 
> > the system usage is all over the place.  -sc
> >
> 
> _______________________________________________
> freebsd-performance at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe at freebsd.org"
> 
> 
> 
> howdy,
> 
> I've got a dual proc AMD64 (2gHz) FreeBSD 5.3R system running two squid
> processes (to take advantage of both CPUs). Each process is doing
> around 195 req/s, and the total bandwidth is ~40Mb/s (gig nic via bge
> driver). Squid is being used exclusively as a reverse proxy, with all
> content being served out of memory (very little disk activity).
> 
> Top shows:
> 
> CPU states: 16.0% user,  0.0% nice, 42.7% system,  7.6% interrupt, 33.6% idle
> Mem: 898M Active, 569M Inact, 179M Wired, 214M Buf, 171M Free
> Swap: 4069M Total, 4069M Free
> 
>   PID USERNAME PRI NICE  SIZE   RES STATE  C   TIME   WCPU    CPU COMMAND
> 14598 squid    108    0  463M  459M select 0  39.2H 59.96% 59.96% squid
> 14605 squid    105    0  421M  416M CPU0   1  38.4H 49.95% 49.95% squid
> 
> but the % system time can fluctuate up to 60 at times. My question is
> if this is about the type of performance I could expect, or if people
> have seen better. I was expecting to see much better performance,
> seeing how everything is being served out of memory, but maybe I'm
> asking too much? 400 reqs/s from RAM doesn't seem like much. Is this a
> FreeBSD issue (anybody else with similar experience)? A majority of the
> CPU time being spent in system would seem to indicate such. What is
> all the system load? How can I tell?
> 
> Any help/pointers/remarks appreciated
> 
> thanks,
> jeff
> _______________________________________________
> freebsd-performance at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to "freebsd-performance-unsubscribe at freebsd.org"
> 


