How can this 'top' command output make sense? Load over 7 and total CPU use ~5%

Sun May 24 07:46:50 UTC 2009

Yuri wrote:
> Look below: load over 7 and no processes take much CPU.
> 
> Yuri
> 
> 7.2-PRERELEASE, 32-bit on i7-920.
> 
> 
> 
> ------------------------------------------------------------
> last pid: 93192;  load averages:  7.68,  6.27,  4.61    up 2+03:11:29  20:25:24
> 204 processes: 9 running, 193 sleeping, 1 stopped, 1 zombie
> CPU:  5.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 94.7% idle
> Mem: 867M Active, 1684M Inact, 279M Wired, 65M Cache, 112M Buf, 92M Free
> Swap: 16G Total, 142M Used, 16G Free
> 
>  PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> 60032 yuri          1  46    0   285M   183M select 0  41:15  0.59% Xorg
> 60400 yuri          1   4    0 12576K  9144K kqread 4  29:44  0.00% wineserver
> 92982 yuri          1  44    0 53012K 16800K CPU3   3  18:50  0.00% kdeinit4
> 92986 yuri          1  44    0 53012K 16800K CPU7   7  18:48  0.00% kdeinit4
> 92988 yuri          1 107    0 53012K 16840K CPU6   6  17:22  0.00% kdeinit4
> 60104 yuri          1  44    0   132M 45860K select 0  16:58  0.00% kwin
> 92984 yuri          1 117    0 53012K 16800K RUN    5  14:56  0.00% kdeinit4
> 60096 yuri          1  44    0 89732K 30040K select 4  10:10  0.00% kded4
> 93141 yuri          1  53    0 53012K 16800K CPU5   5   3:52  0.00% kdeinit4
> 93139 yuri          1  44    0 53012K 16800K CPU1   1   3:30  0.00% kdeinit4
> 60174 yuri          1  44    0  3168K  1400K select 0   1:28  0.00% ksysguardd
>  450 root          1   4    0  3128K   800K select 4   0:44  0.00% dhclient
> 1131 messagebus    1   4    0  3344K  1384K select 4   0:40  0.00% dbus-daemon

Sure. This is not an uncommon occurrence really.  The load average is
the number of processes in the queue for a CPU time slice, averaged over
1, 5 and 15 minutes.  On FreeBSD the LA is not scaled by the number of
cores, so on an N-core system it takes a LA of about N before all cores
have active processes pretty much continually.
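
As a rough illustration of that averaging (a sketch only, not the actual
kernel code): the run queue length is sampled periodically and folded into
an exponentially damped average.  The function name, the 5 second sampling
interval and the 5 minute window below are assumptions chosen for the example.

```python
import math


def damped_average(run_queue_samples, sample_interval=5.0, window=300.0):
    """Exponentially damped average of run-queue length.

    run_queue_samples: number of runnable processes seen at each sample.
    sample_interval:   seconds between samples (assumed value).
    window:            averaging period in seconds (assumed: 5 minutes).
    """
    # Weight given to the old average at each step; newer samples get the rest.
    decay = math.exp(-sample_interval / window)
    avg = 0.0
    for n in run_queue_samples:
        avg = avg * decay + n * (1.0 - decay)
    return avg


if __name__ == "__main__":
    # Eight runnable processes at every sample for roughly 2.8 hours.
    print(damped_average([8] * 2000))
```

A steady run queue of 8 held for long enough converges on a LA of 8, while
a single busy sample barely moves the figure.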

Now, you might think that an active process will take the CPU utilisation
to 100%, but that is not necessarily so.  Some numerical applications can
do that, but purely CPU bound processes are relatively uncommon in everyday
usage.  In actuality what happens is that the processor will need to retrieve
data from somewhere to operate on.  There's a hierarchy of data stores of
various speeds (latency, rather than bandwidth):

   L1 Cache > L2 Cache > L3 Cache > Main RAM > Disk > Network

Where the L1 Cache is accessible in a few clock ticks (nanoseconds), Main
RAM takes of the order of 100 nanoseconds to access, disk can take
milliseconds to access, and Network can take 10s to 1000s of milliseconds.

Or in other words, about 9 orders of magnitude difference.  So when the data
you need to process is too big to fit in the fastest caches, or when it comes
from a particularly slow location, or when you have a lot of active processes
causing context switches, the CPU core will be making frequent IO requests
and spending time waiting for them to be fulfilled.
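
To put a number on that spread, here is a small sketch.  The latencies are
round-number assumptions for illustration (1 ns for L1 cache, 10 ms for a
disk, 1 s for a slow network round trip), not measurements.

```python
import math

# Illustrative round-number latencies in seconds (assumptions, not measurements).
LATENCY_S = {
    "L1 cache": 1e-9,
    "disk": 1e-2,
    "slow network": 1.0,
}


def orders_of_magnitude(fast_s, slow_s):
    """Return how many powers of ten separate two latencies."""
    return math.log10(slow_s / fast_s)


if __name__ == "__main__":
    spread = orders_of_magnitude(LATENCY_S["L1 cache"], LATENCY_S["slow network"])
    print(f"L1 cache to slow network: about {spread:.0f} orders of magnitude")
```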

Now, for sources like disks and network, where the retrieval is much slower
than the typical timescale of events on the CPU, the process will yield the
CPU to something else and only get a new time slice once the IO request has
been fulfilled.  For an access to main RAM, however, that form of yielding is
less likely.  Consequently the CPU can end up waiting for 100s of clock cycles
until it gets some bytes to process.  In the meantime, other processes are also
sitting in the queue wanting CPU time slices -- hence the high LA with low CPU
utilisation.
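
For a feel of what such a stall costs, a back-of-envelope sketch: the
2.66 GHz clock is the nominal speed of the i7-920 mentioned above, and the
100 ns main RAM latency is an assumed round figure, not a measurement.

```python
def stall_cycles(latency_s, clock_hz):
    """Clock cycles that elapse while a core waits latency_s seconds for data."""
    return round(latency_s * clock_hz)


if __name__ == "__main__":
    clock_hz = 2.66e9        # nominal i7-920 clock speed
    ram_latency_s = 100e-9   # assumed main RAM latency
    print(stall_cycles(ram_latency_s, clock_hz), "cycles per main RAM access")
```

With those assumed figures a single trip to main RAM costs a few hundred
cycles, which is where the "100s of clock cycles" comes from.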

Scheduling CPU time slices to make maximum use of available resources is the
difference between a really performant OS and a disaster.  A good scheduler
is the critical central piece of code around which the rest of an OS can be
constructed.  Add the complexity of having multiple cores, where threads of
execution sometimes have to be moved to different cores and at other times
need to stick to the same core in order to make best use of resources, and
you will start to appreciate quite how hard it is to write a good scheduler.
Unsurprisingly, the design of such things is a matter of fairly impassioned
debate amongst the rarefied circle of people capable of writing them.  That
sort of argument was the genesis of the FreeBSD / DragonFly BSD fork a few
years back.  You can rest assured though that FreeBSD certainly does have one
of the very best schedulers currently available, and it is specifically
targeted at getting the best out of the sort of multicore CPUs available
nowadays.

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                  Kent, CT11 9PW
