ULE scheduling oddity
Barney Cordoba
barney_cordoba at yahoo.com
Thu Jul 17 16:12:46 UTC 2008
--- On Wed, 7/16/08, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> From: Steve Kargl <sgk at troutmask.apl.washington.edu>
> Subject: Re: ULE scheduling oddity
> To: "Barney Cordoba" <barney_cordoba at yahoo.com>
> Cc: current at freebsd.org
> Date: Wednesday, July 16, 2008, 5:13 PM
> On Wed, Jul 16, 2008 at 07:49:03AM -0700, Barney Cordoba wrote:
> > --- On Tue, 7/15/08, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> > > last pid:  3874;  load averages:  9.99,  9.76,  9.43   up 0+19:54:44  10:51:18
> > > 41 processes:  11 running, 30 sleeping
> > > CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
> > > Mem: 5706M Active, 8816K Inact, 169M Wired, 84K Cache, 108M Buf, 25G Free
> > > Swap: 4096M Total, 4096M Free
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> > >  3836 kargl       1 118    0   577M   572M CPU7   7   6:37 100.00% kzk90
> > >  3839 kargl       1 118    0   577M   572M CPU2   2   6:36 100.00% kzk90
> > >  3849 kargl       1 118    0   577M   572M CPU3   3   6:33 100.00% kzk90
> > >  3852 kargl       1 118    0   577M   572M CPU0   0   6:25 100.00% kzk90
> > >  3864 kargl       1 118    0   577M   572M RUN    1   6:24 100.00% kzk90
> > >  3858 kargl       1 112    0   577M   572M RUN    5   4:10  78.47% kzk90
> > >  3855 kargl       1 110    0   577M   572M CPU5   5   4:29  67.97% kzk90
> > >  3842 kargl       1 110    0   577M   572M CPU4   4   4:24  66.70% kzk90
> > >  3846 kargl       1 107    0   577M   572M RUN    6   3:22  53.96% kzk90
> > >  3861 kargl       1 107    0   577M   572M CPU6   6   3:15  53.37% kzk90
> > >
> > > I would have expected to see a more evenly distributed WCPU of
> > > around 80% for each process.
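(For reference, that 80% figure is just the fair-share arithmetic for this
workload: 10 CPU-bound kzk90 processes sharing 8 cores gives

    8 cores / 10 processes = 0.8 of a core per process, i.e. roughly 80% WCPU

if the scheduler spread the load perfectly evenly.)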
> >
> > I don't see why "equal" distribution is or should be a goal, as
> > that does not guarantee optimization.
>
> The above images may be parts of an MPI application.  Synchronization
> problems simply kill performance.  The PIDs with 100% WCPU could be
> spinning in a loop waiting for PID 3861 to send a message after
> completing a computation.  The factor of 2 difference in TIME for
> PID 3836 and 3861 was still observed after more than an hour of
> accumulated time for 3836.  It appears as if the algorithm for
> cpu affinity is punishing 3846 and 3861.
>
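For illustration only (this is not the kzk90 code and not a real MPI call):
a minimal C sketch of the kind of spin-wait Steve describes. The waiting
process burns a full core polling for a message, so top(1) reports close to
100% WCPU even though it is making no progress; many MPI implementations
poll the same way inside their blocking receives. The names (message_ready,
worker) are hypothetical.

/*
 * Hypothetical spin-wait sketch: the main thread polls a flag at full
 * speed while the "peer" finishes its computation, so it shows up in
 * top(1) at ~100% WCPU despite doing no useful work.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile int message_ready = 0;  /* stands in for "peer sent data" */

static void *worker(void *arg)
{
    (void)arg;
    sleep(3);                /* pretend to finish a long computation */
    message_ready = 1;       /* "send" the message */
    return NULL;
}

int main(void)
{
    pthread_t t;
    unsigned long spins = 0;

    pthread_create(&t, NULL, worker, NULL);

    while (!message_ready)   /* busy-wait: 100% CPU, zero progress */
        spins++;

    pthread_join(t, NULL);
    printf("spun %lu times while waiting\n", spins);
    return 0;
}

Whether that is what kzk90 actually does is a guess; the point is only that
100% WCPU does not necessarily mean useful work is being done.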
> > Given that the cache is shared between only 2 CPUs, it might very
> > well be more efficient to run on 2 CPUs when the 3rd or 4th isn't
> > needed.
> >
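As an aside, one way to test that claim on a -CURRENT kernel that already
has cpuset_setaffinity(2) is to pin the job to a pair of CPUs that share a
cache and compare throughput. The sketch below is hypothetical; CPUs 4 and
5 are an assumption made for illustration, so check the real topology
before pinning anything.

/*
 * Hypothetical sketch: pin the current process to two CPUs assumed to
 * share a cache, using cpuset_setaffinity(2).
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>

int main(void)
{
    cpuset_t mask;

    CPU_ZERO(&mask);
    CPU_SET(4, &mask);       /* CPU numbers are an assumption; verify */
    CPU_SET(5, &mask);       /* them against the machine's topology   */

    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                           sizeof(mask), &mask) != 0)
        err(1, "cpuset_setaffinity");

    /* ... run the CPU-bound work here; it stays on CPUs 4 and 5 ... */
    return 0;
}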
> > It works pretty darn well, IMO.  It's not like your little app is
> > the only thing going on in the system.
>
> Actually, 10 copies of the little app are the only things running
> except top(1) and a few sleeping system services (e.g., nfsd and
> sshd).  Apparently, you missed the "41 processes: 11 running, 30
> sleeping" line above.
>
> --
> Steve
Your apparent argument that every CPU cycle can somehow be sliced equally and automagically is as silly as expecting a first-generation scheduler to exhibit 100% efficiency across 8 CPUs. It's just as likely an inefficiency in the application as in the kernel.