rusage breakdown and cpu limits.

Jeff Roberson jroberson at chesapeake.net
Tue May 29 20:25:58 UTC 2007


On Wed, 30 May 2007, Bruce Evans wrote:

> On Tue, 29 May 2007, Jeff Roberson wrote:
>
>> I'm working with Attilio to break down rusage further to be per-thread in 
>> places where it is protected by the global scheduler lock.  To support 
>> this, I am interested in moving the rlimit cpulimit check into userret(), 
>> or perhaps ast().  Is there any reason why we need to check this on every 
>> context switch?  Any objections to moving it?  Eventually it will require a 
>> different lock from the one we obtain to call mi_switch().
>
> Checking it on every userret() would be a good pessimization, and
> scheduling ast()'s to check it occasionally wouldn't be far behind.
> Syscalls occur many times more often than context switches.

Yes, you're right.  I think I'll make a callout that will check once per 
second.
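
Roughly what I have in mind (an untested sketch; p_limco would be a new
per-process callout, and the comparison assumes rux_runtime stays in cpu
ticks while p_cpulimit is in seconds):

static void
lim_cb(void *arg)
{
        struct proc *p = arg;

        /*
         * Assumes the callout was armed with callout_init_mtx() on the
         * proc mutex, so the proc lock is held on entry.
         */
        PROC_LOCK_ASSERT(p, MA_OWNED);
        if (p->p_cpulimit != RLIM_INFINITY &&
            p->p_stats->p_ru.rux_runtime >
            p->p_cpulimit * cpu_tickrate())
                killproc(p, "exceeded maximum CPU limit");
        /* Re-arm so the check runs about once a second. */
        callout_reset(&p->p_limco, hz, lim_cb, p);
}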

>
> Locking is needed for updating the runtime in mi_switch(), so it
> seems best to check for the runtime becoming too large there while
> holding the same lock.  This lock is currently sched_lock and it covers
> the following other fields:

I think you missed the point of my threadlock patch.  There will no longer 
be a global scheduler lock.  Because of this, multiple threads in a 
process may context switch at the same time.  The list below will be 
protected by a thread lock or a per-process spinlock depending on the use.
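
To give a feel for it, the idiom would be roughly the following (a sketch
against the patch as it stands; stats_example() is made up for
illustration and the locking names may still change):

static void
stats_example(struct thread *td, struct proc *p)
{
        /* Per-thread state is covered by the thread's own lock. */
        thread_lock(td);
        td->td_uticks++;
        thread_unlock(td);

        /* Aggregate per-process state takes the per-process spinlock. */
        PROC_SLOCK(p);
        p->p_stats->p_ru.ru_nvcsw++;
        PROC_SUNLOCK(p);
}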

>
> Fields in the proc struct (or attached to it, not counting the
> thread struct):
>
> 	p_stats->p_ru.ru_nvcsw
> 	p_stats->p_ru.ru_nivcsw
> 	p_stats->p_ru.rux_runtime (this)
> 	p_stats->p_ru.rux_uticks
> 	p_stats->p_ru.rux_iticks
> 	p_stats->p_ru.rux_sticks
> 	p_cpulimit                (less volatile than the others)
> 	p_sflag                   (needs sched_lock more than the others?)
> 	p_flag                    (KSE only, invalid? -- needs PROC lock)
>
> Related fields in the pcpu:
>
> 	switchtime
> 	switchticks
>
> Related fields in the thread struct:
>
> 	td_uticks
> 	td_sticks
> 	td_iticks
> 	td_flags
>
> Unrelated fields in the thread struct:
>
> 	td_critnest               (in a KASSERT() only)
> 	td_owepreempt             (in a KASSERT() only)
> 	td_generation
> 	td_sched                  (not yet properly locked)
> 	td_priority               (this and following only in CTR*())
> 	td_inhibitors
> 	td_lockname
> 	td_wmesg
>
> I don't see why you emphasized the cpulimit check, since it is the
> least delicate part of the rusage maintenance done by mi_switch().  I
> don't see how the locking requirements can be reduced significantly
> by keeping the rusage in the thread struct more of the time (than the
> tick counts already are) and only accumulating it into the proc struct
> when needed.  There must be a context switch of `switchtime' here, and
> this seems to require sched locking.  It doesn't help to accumulate
> the runtime into the thread struct instead of into the rusage -- sched
> locking still seems to be required to switch the runtime.  The core of
> this switch is:

If we move to a per-thread rusage, the only time we have to aggregate it 
and deal with it on a per-process basis is at thread or process exit, and 
when we check the process runtime against resource limits.  That's why 
I'm interested in this case.  The others are more straightforward.  This 
one must move out of mi_switch(), however, or be adjusted with atomics or 
something similar.
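
The fold itself would look something like this (a sketch; ruxagg() and
td_runtime are illustrative names, and p_rux stands for wherever the
rux_* fields end up living):

static void
ruxagg(struct rusage_ext *rux, struct thread *td)
{
        /* The thread lock covers the per-thread counters. */
        thread_lock(td);
        rux->rux_runtime += td->td_runtime;
        rux->rux_uticks += td->td_uticks;
        rux->rux_sticks += td->td_sticks;
        rux->rux_iticks += td->td_iticks;
        td->td_runtime = 0;
        td->td_uticks = 0;
        td->td_sticks = 0;
        td->td_iticks = 0;
        thread_unlock(td);
}

/* At thread/process exit or from the limit check: */
FOREACH_THREAD_IN_PROC(p, td)
        ruxagg(&p->p_rux, td);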

Jeff

>
> 	new_switchtime = cpu_ticks();
> 	delta_runtime = new_switchtime - PCPU_GET(switchtime);
> 	PCPU_SET(switchtime, new_switchtime);
>
> and this must be done without letting any other thread run on the CPU.
> This seems to require sched_lock.  After determining delta_runtime,
> we can add it either to per-thread data for accumulation later or to
> more global data (currently the rusage).  This part doesn't need such
> strong locking.  We currently add it to rusage since we already hold
> the lock for this, and then checking the limit here also has low cost.
>
> Note that in the above code, keeping `switchtime' in the pcpu is an
> optimization (for space and perhaps for clarity).  It could be kept
> in the thread that owns the CPU.  Then its locking requirements might
> be clearer -- it must be switched at the same time as td_oncpu.  The
> latter clearly requires sched_lock.  The code for the latter has moved
> from mi_switch() to sched_switch().  Maybe you emphasized the cpulimit
> check because this point wasn't clear -- things that strictly required
> sched_lock were supposed to have all been moved to sched_switch()?
>
> p_sflag and td_flags are only needed if we detect that the cpulimit has
> been exceeded (to schedule an AST).  Moving the limit check to an AST
> would still require sched locking for setting td_flags.
>
> The cpulimit has a low resolution, so it doesn't need to be checked
> very often.  The 4BSD scheduler could easily check it every second or
> so in the main loop of the scheduler.
>
> Accesses in the CTR*()s and KASSERT()s complicate the analysis and the
> removal of locks -- you don't want to have to acquire a lock just to
> debug, and the debugging code also gets in the way of understanding the
> locking, not to mention the function itself.
>
> Bruce
>

