[RFT][patch] Scheduling for HTT and not only
Ivan Klymenko
fidaj at ukr.net
Sat Mar 3 15:26:15 UTC 2012
On Sat, 03 Mar 2012 14:54:17 +0200,
Alexander Motin <mav at FreeBSD.org> wrote:
> On 03/03/12 11:12, Alexander Motin wrote:
> > On 03/03/12 10:59, Adrian Chadd wrote:
> >> Right. Is this written up in a PR somewhere explaining the problem
> >> in as much depth as you just have?
> >
> > I have no idea. I am new to this area and haven't looked at the PRs
> > yet.
> >
> >> And thanks for this, it's great to see some further explanation of
> >> the current issues the scheduler faces.
> >
> > By the way, I've just reproduced the problem with compilation. On a
> > dual-core system, a net/mpd5 compilation in one stream takes 17
> > seconds. But with two low-priority, non-interactive CPU-burning
> > threads running, it takes 127 seconds. I'll try to analyze it
> > further now. I have a feeling that there could be more factors
> > causing priority violations than the ones I've described below.
>
> On closer look, my test turned out to be not so clean, but instead
> much more interesting. Because of NFS use, there are not just
> context switches between make, cc and as, which are possibly
> optimized a bit now, but also many short sleeps during which a
> background process gets to run. As a result, at some moments I see
> such wonderful traces for cc:
>
> wait on runq for 81ms,
> run for 37us,
> wait NFS for 202us,
> wait on runq for 92ms,
> run for 30us,
> wait NFS for 245us,
> wait on runq for 53ms,
> run for 142us,
>
> That is about 0.05% CPU time for a process that is supposed to be
> CPU-bound. And while a process with such a small run/sleep time
> ratio could be nominated for interactivity, with such small absolute
> sleep times it will need ages to compensate for the 5 seconds of
> "batch" run history recorded before.
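To make that concrete: a back-of-the-envelope sketch in plain C, using
the run/sleep numbers from the trace above (the "sleep outweighs run"
rule is my simplification for illustration, not the actual sched_ule.c
condition):

    /*
     * Rough model: assume a thread turns "interactive" once its
     * accumulated voluntary sleep time outweighs its run history.
     * The values below come from the cc trace quoted above.
     */
    #include <stdio.h>

    int main(void)
    {
        long run_hist_us = 5 * 1000 * 1000; /* 5 s of "batch" run history */
        long sleep_us    = 250;             /* ~250 us NFS sleep per wakeup */
        long wakeups_sec = 10;              /* ~one 90 ms runq wait per run */

        long wakeups = run_hist_us / sleep_us;
        printf("%ld wakeups, i.e. ~%ld s, to balance the history\n",
               wakeups, wakeups / wakeups_sec);
        return 0;
    }

At ~10 wakeups per second that is on the order of half an hour, which
matches the "ages" above.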
>
> >> On 2 March 2012 23:40, Alexander Motin<mav at freebsd.org> wrote:
> >>> On 03/03/12 05:24, Adrian Chadd wrote:
> >>>>
> >>>> mav@, can you please take a look at George's traces and see if
> >>>> there's anything obviously silly going on?
> >>>> He's reporting that your ULE work hasn't improved his (very)
> >>>> degenerate case.
> >>>
> >>>
> >>> As far as I can see, my patch has nothing to do with the problem.
> >>> My patch improves SMP load balancing, while in this case the
> >>> problem is different. In some cases, when not all CPUs are busy,
> >>> my patch could mask the problem by using more CPUs, but not in
> >>> this case, where dnets consumes all available CPUs.
> >>>
> >>> I still do not feel very comfortable with the ULE math, but as I
> >>> understand it, in both illustrated cases there is a conflict
> >>> between clearly CPU-bound dnets threads, which consume all
> >>> available CPU and never make voluntary context switches, and the
> >>> other, more or less interactive threads. If the other threads were
> >>> detected as "interactive" in ULE terms, they would preempt the
> >>> dnets threads and everything would be fine. But "batch" (in ULE
> >>> terms) threads never preempt each other, switching context only
> >>> about 10 times per second, as hardcoded in the sched_slice
> >>> variable. A kernel build by definition consumes too much CPU time
> >>> to be marked "interactive". The exo-helper-1 thread in
> >>> interact.out could potentially be marked "interactive", but
> >>> possibly, once it has consumed some CPU and become "batch", it is
> >>> difficult for it to get back, as waiting on a runq is not counted
> >>> as sleep, and each time it gets to run it has some new work to do,
> >>> so it remains "batch". Maybe if CPU time accounting were more
> >>> precise it would work better (by accounting for those short
> >>> periods when a thread really does sleep voluntarily), but not with
> >>> the present sampled logic with its 1 ms granularity. As a result,
> >>> while the dnets threads each time consume their full 100 ms time
> >>> slices, the other threads are starving, getting to run only 10
> >>> times per second, only to voluntarily switch out within just a few
> >>> milliseconds.
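For illustration, here is a toy scorer in plain C along the lines mav
describes; the formula and the threshold are simplified assumptions,
not the real sched_ule.c logic. Run-queue wait time counts on neither
side, so a few seconds of run history pin a thread deep in the "batch"
range:

    /*
     * Toy interactivity score, loosely modelled on the description
     * above (the real code is in sys/kern/sched_ule.c and differs in
     * detail).  Low score = interactive, high score = batch.
     */
    #include <stdio.h>

    #define SCORE_MAX   100
    #define SCORE_HALF  (SCORE_MAX / 2)
    #define INTERACTIVE 30              /* assumed cut-off */

    static int
    score(long run_us, long sleep_us)   /* runq wait counts in neither */
    {
        if (run_us > sleep_us)          /* run-heavy: batch half */
            return (SCORE_HALF + SCORE_HALF * (run_us - sleep_us) / run_us);
        if (sleep_us > run_us)          /* sleep-heavy: interactive half */
            return (SCORE_HALF * run_us / sleep_us);
        return (SCORE_HALF);
    }

    int main(void)
    {
        /* cc above: 5 s of old run history vs. ~750 us of real sleep */
        printf("cc: score %d (interactive below %d)\n",
               score(5000000, 750), INTERACTIVE);
        /* a thread that mostly sleeps would easily qualify */
        printf("sleeper: score %d\n", score(5000, 500000));
        return 0;
    }

This is also where the point about 1 ms sampling bites: cc's real
sleeps of ~30-250 us round down to nothing, so the sleep side of the
score never grows.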
> >>>
> >>>
> >>>> On 2 March 2012 16:14, George Mitchell<george+freebsd at m5p.com>
> >>>> wrote:
> >>>>>
> >>>>> On 03/02/12 18:06, Adrian Chadd wrote:
> >>>>>>
> >>>>>>
> >>>>>> Hi George,
> >>>>>>
> >>>>>> Have you thought about providing schedgraph traces with your
> >>>>>> particular workload?
> >>>>>>
> >>>>>> I'm sure that'll help out the scheduler hackers quite a bit.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>
> >>>>>> Adrian
> >>>>>>
> >>>>>
> >>>>> I posted a couple back in December but I haven't created any
> >>>>> more recently:
> >>>>>
> >>>>> http://www.m5p.com/~george/ktr-ule-problem.out
> >>>>> http://www.m5p.com/~george/ktr-ule-interact.out
> >>>>>
> >>>>> To the best of my knowledge, no one ever examined them. --
> >>>>> George
> >>>
> >>> --
> >>> Alexander Motin
>
>
I have FreeBSD 10.0-CURRENT #0 r232253M.
The patch in r232454 broke my DRM.
My system is patched with http://people.freebsd.org/~kib/drm/all.13.5.patch
After building the kernel with only the r232454 patch, the Xorg log contains:
...
[ 504.865] [drm] failed to load kernel module "i915"
[ 504.865] (EE) intel(0): [drm] Failed to open DRM device for pci:0000:00:02.0: File exists
[ 504.865] (EE) intel(0): Failed to become DRM master.
[ 504.865] (**) intel(0): Depth 24, (--) framebuffer bpp 32
[ 504.865] (==) intel(0): RGB weight 888
[ 504.865] (==) intel(0): Default visual is TrueColor
[ 504.865] (**) intel(0): Option "DRI" "True"
[ 504.865] (**) intel(0): Option "TripleBuffer" "True"
[ 504.865] (II) intel(0): Integrated Graphics Chipset: Intel(R) Sandybridge Mobile (GT2)
[ 504.865] (--) intel(0): Chipset: "Sandybridge Mobile (GT2)"
and then a black screen...
I do not even know why it happened... :(