ULE scheduling oddity
Barney Cordoba
barney_cordoba at yahoo.com
Thu Jul 17 16:12:46 UTC 2008
--- On Wed, 7/16/08, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> From: Steve Kargl <sgk at troutmask.apl.washington.edu>
> Subject: Re: ULE scheduling oddity
> To: "Barney Cordoba" <barney_cordoba at yahoo.com>
> Cc: current at freebsd.org
> Date: Wednesday, July 16, 2008, 5:13 PM
> On Wed, Jul 16, 2008 at 07:49:03AM -0700, Barney Cordoba wrote:
> > --- On Tue, 7/15/08, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> > > last pid:  3874;  load averages:  9.99,  9.76,  9.43   up 0+19:54:44  10:51:18
> > > 41 processes:  11 running, 30 sleeping
> > > CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
> > > Mem: 5706M Active, 8816K Inact, 169M Wired, 84K Cache, 108M Buf, 25G Free
> > > Swap: 4096M Total, 4096M Free
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> > >  3836 kargl       1 118    0   577M   572M CPU7   7   6:37 100.00% kzk90
> > >  3839 kargl       1 118    0   577M   572M CPU2   2   6:36 100.00% kzk90
> > >  3849 kargl       1 118    0   577M   572M CPU3   3   6:33 100.00% kzk90
> > >  3852 kargl       1 118    0   577M   572M CPU0   0   6:25 100.00% kzk90
> > >  3864 kargl       1 118    0   577M   572M RUN    1   6:24 100.00% kzk90
> > >  3858 kargl       1 112    0   577M   572M RUN    5   4:10  78.47% kzk90
> > >  3855 kargl       1 110    0   577M   572M CPU5   5   4:29  67.97% kzk90
> > >  3842 kargl       1 110    0   577M   572M CPU4   4   4:24  66.70% kzk90
> > >  3846 kargl       1 107    0   577M   572M RUN    6   3:22  53.96% kzk90
> > >  3861 kargl       1 107    0   577M   572M CPU6   6   3:15  53.37% kzk90
> > >
> > > I would have expected to see a more evenly distributed WCPU of
> > > around 80% for each process.
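(For reference, that 80% figure is just the fair-share arithmetic for this
workload: 10 CPU-bound kzk90 processes sharing 8 cores gives

    8 cores / 10 processes = 0.8 of a core per process, i.e. roughly 80% WCPU

if the scheduler spread the load perfectly evenly.)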
> >
> > I don't see why "equal" distribution is or should be a goal, as
> > that does not guarantee optimization.
>
> The above images may be parts of an MPI application.  Synchronization
> problems simply kill performance.  The PIDs with 100% WCPU could be
> spinning in a loop waiting for PID 3861 to send a message after
> completing a computation.  The factor of 2 difference in TIME for
> PID 3836 and 3861 was still observed after more than an hour of
> accumulated time for 3836.  It appears as if the algorithm for
> cpu affinity is punishing 3846 and 3861.
>
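For illustration only (this is not the kzk90 code and not a real MPI call):
a minimal C sketch of the kind of spin-wait Steve describes. The waiting
process burns a full core polling for a message, so top(1) reports close to
100% WCPU even though it is making no progress; many MPI implementations
poll the same way inside their blocking receives. The names (message_ready,
worker) are hypothetical.

/*
 * Hypothetical spin-wait sketch: the main thread polls a flag at full
 * speed while the "peer" finishes its computation, so it shows up in
 * top(1) at ~100% WCPU despite doing no useful work.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile int message_ready = 0;  /* stands in for "peer sent data" */

static void *worker(void *arg)
{
    (void)arg;
    sleep(3);                /* pretend to finish a long computation */
    message_ready = 1;       /* "send" the message */
    return NULL;
}

int main(void)
{
    pthread_t t;
    unsigned long spins = 0;

    pthread_create(&t, NULL, worker, NULL);

    while (!message_ready)   /* busy-wait: 100% CPU, zero progress */
        spins++;

    pthread_join(t, NULL);
    printf("spun %lu times while waiting\n", spins);
    return 0;
}

Whether that is what kzk90 actually does is a guess; the point is only that
100% WCPU does not necessarily mean useful work is being done.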
> > Given that the cache is shared between only 2 CPUs, it might very
> > well be more efficient to run on 2 CPUs when the 3rd or 4th isn't
> > needed.
> >
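As an aside, one way to test that claim on a -CURRENT kernel that already
has cpuset_setaffinity(2) is to pin the job to a pair of CPUs that share a
cache and compare throughput. The sketch below is hypothetical; CPUs 4 and
5 are an assumption made for illustration, so check the real topology
before pinning anything.

/*
 * Hypothetical sketch: pin the current process to two CPUs assumed to
 * share a cache, using cpuset_setaffinity(2).
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>

int main(void)
{
    cpuset_t mask;

    CPU_ZERO(&mask);
    CPU_SET(4, &mask);       /* CPU numbers are an assumption; verify */
    CPU_SET(5, &mask);       /* them against the machine's topology   */

    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                           sizeof(mask), &mask) != 0)
        err(1, "cpuset_setaffinity");

    /* ... run the CPU-bound work here; it stays on CPUs 4 and 5 ... */
    return 0;
}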
> > It works pretty darn well, IMO.  It's not like your little app is
> > the only thing going on in the system.
>
> Actually, 10 copies of the little app are the only things running
> except top(1) and a few sleeping system services (e.g., nfsd and
> sshd).  Apparently, you missed the "41 processes: 11 running, 30
> sleeping" line above.
>
> --
> Steve
Your apparent argument that every CPU cycle can somehow be sliced equally and automagically is as silly as expecting a first-generation scheduler to exhibit 100% efficiency across 8 CPUs. It's just as likely an inefficiency in the application as in the kernel.