One-shot-oriented event timers management

Sun Aug 29 13:10:09 UTC 2010

Hi.

I would like to present my new work on timers management code.

In my previous work I mostly focused on reimplementing existing
functionality in a better way. The result seemed good, but after
looking at the prospects of using event timers in one-shot (aperiodic)
mode I realized that the complexity of the implemented code made it
hardly possible. So I had to cut it down significantly and rewrite it
with a new approach, primarily oriented on using timers in one-shot
mode. Since some systems have only periodic timers, I have kept that
functionality, though it is slightly limited.

The new management code implements two modes of operation: one-shot and
periodic. The mode to be used depends on hardware capabilities and can
be controlled.

In one-shot mode the hardware timer is programmed to generate a single
interrupt precisely at the time of the next wanted event. This is done
by comparing the current binuptime with the next scheduled times of
system events (hard-/stat-/profclock). This approach has several
benefits: event timer precision is now irrelevant for system
timekeeping; hard- and statclocks are not aliased, even though only one
timer is used for them; and, most importantly, it allows us to define
exactly which events we really want to handle and when, without strict
dependence on fixed hz, stathz and profhz periods. Our callout system
still depends heavily on the hz value, but now at least we can skip
interrupts when there are no callouts to handle at the time. Later we
can go further.
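A minimal sketch of the one-shot scheduling idea, in C. It assumes plain
64-bit nanosecond uptimes instead of struct bintime and binuptime(), and
struct clock_state and handle_events() are hypothetical names, not the
patch's actual code. It shows how a single timer can serve all three
clocks: every due event is run, its deadline is advanced, and the timer
is re-armed for the earliest remaining deadline.

```c
#include <stdint.h>

/* Hypothetical per-CPU clock state. Times are nanoseconds since boot;
 * real code would compare struct bintime values from binuptime(). */
struct clock_state {
	uint64_t next_hard, next_stat, next_prof;       /* next deadlines */
	uint64_t hard_period, stat_period, prof_period; /* 1e9 / hz, etc. */
	int profiling;          /* profclock events only while profiling */
	unsigned hard_calls, stat_calls, prof_calls;    /* stand-ins for work */
};

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/* Runs every event whose deadline is not later than `now`, advances
 * those deadlines by one period each, and returns the absolute time at
 * which the one-shot timer should be programmed to fire next. */
static uint64_t
handle_events(struct clock_state *cs, uint64_t now)
{
	uint64_t next;

	while (cs->next_hard <= now) {
		cs->hard_calls++;               /* hardclock() would run here */
		cs->next_hard += cs->hard_period;
	}
	while (cs->next_stat <= now) {
		cs->stat_calls++;               /* statclock() would run here */
		cs->next_stat += cs->stat_period;
	}
	if (cs->profiling) {
		while (cs->next_prof <= now) {
			cs->prof_calls++;       /* profclock() would run here */
			cs->next_prof += cs->prof_period;
		}
	}

	next = min_u64(cs->next_hard, cs->next_stat);
	if (cs->profiling)
		next = min_u64(next, cs->next_prof);
	return (next);
}
```

Because each clock keeps its own deadline, hardclock and statclock fire
at their own instants even though one timer drives both, which is why
they are not aliased in this mode.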

Periodic mode now uses similar principles of event scheduling. But a
timer running in periodic mode is simply unable to handle arbitrary
events, and since event timers may not be synchronized to the system
timecounter, they may drift from it, causing jitter. So I have used the
timer events themselves as the time source for scheduling. As a result,
the periodic timer runs at a fixed frequency that is a multiple of the
hz rate, while statclock and profclock are generated by dividing it
accordingly. (If somebody tells me that hardclock jitter is not really
a big problem, I will happily rip that artificial timekeeping out of
there to simplify the code.) Unfortunately, this approach makes it
impossible to use two event timers to completely separate hard- and
statclocks any more, but as I have said, this mode is required only for
a limited set of systems without one-shot capable timers. Judging by my
recent experience with different platforms, it is not a big fraction.
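A minimal sketch of deriving all three clocks from one periodic timer by
division, in C. struct periodic_state, periodic_init() and
periodic_tick() are hypothetical names and the frequencies used below
are illustrative; the patch may count differently. It shows why the
timer itself becomes the time source: nothing here reads the
timecounter.

```c
#include <stdint.h>

/* Hypothetical divider state for periodic mode. The timer interrupts at
 * timer_freq Hz, assumed to be an integer multiple of hz and at least as
 * high as stathz and profhz. Each clock runs once every *_div ticks. */
struct periodic_state {
	unsigned hard_div, stat_div, prof_div;  /* timer ticks per event */
	unsigned hard_cnt, stat_cnt, prof_cnt;  /* ticks left until event */
	unsigned hard_calls, stat_calls, prof_calls;
};

static void
periodic_init(struct periodic_state *ps, unsigned timer_freq, unsigned hz,
    unsigned stathz, unsigned profhz)
{
	ps->hard_div = ps->hard_cnt = timer_freq / hz;
	ps->stat_div = ps->stat_cnt = timer_freq / stathz;
	ps->prof_div = ps->prof_cnt = timer_freq / profhz;
	ps->hard_calls = ps->stat_calls = ps->prof_calls = 0;
}

/* Called on every periodic timer interrupt. Time is counted in timer
 * ticks instead of being read from the timecounter, so a timer that
 * drifts from the timecounter still yields evenly spaced hardclocks. */
static void
periodic_tick(struct periodic_state *ps)
{
	if (--ps->hard_cnt == 0) {
		ps->hard_calls++;               /* hardclock() */
		ps->hard_cnt = ps->hard_div;
	}
	if (--ps->stat_cnt == 0) {
		ps->stat_calls++;               /* statclock() */
		ps->stat_cnt = ps->stat_div;
	}
	if (--ps->prof_cnt == 0) {
		ps->prof_calls++;               /* profclock() */
		ps->prof_cnt = ps->prof_div;
	}
}
```

With timer_freq = 2000, hz = 1000, stathz = 125 and profhz = 2000, 2000
timer ticks produce 1000 hardclocks, 125 statclocks and 2000 profclocks.
Since all three share one interrupt source, this scheme cannot put
hardclock and statclock on separate timers.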

The management code still handles both per-CPU and global timers.
Per-CPU timer usage is obvious. A global timer is programmed to handle
the needs of all CPUs. In periodic mode the global timer generates
periodic interrupts on one CPU, and the management code then
redistributes them via IPI to the CPUs that really need them. In
one-shot mode the timer is always programmed to handle the first event
scheduled anywhere in the system. When that interrupt arrives, it is
likewise redistributed via IPI to the CPUs that need it.
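A minimal sketch of the global-timer redistribution, in C. The fixed
NCPU, the cpu_next_event array, send_timer_ipi() (which only records a
bitmask here) and the other function names are all hypothetical
stand-ins for the kernel's per-CPU state and IPI machinery. It shows the
two steps: forward the interrupt to every CPU whose event is due, then
re-arm a one-shot global timer for the earliest deadline system-wide.

```c
#include <stdint.h>

#define NCPU 4                  /* illustrative CPU count */

/* Hypothetical next-event deadline of each CPU (ns since boot). */
static uint64_t cpu_next_event[NCPU];

/* Stand-in for sending a timer IPI: just records the target CPUs. */
static uint32_t ipi_mask;

static void
send_timer_ipi(int cpu)
{
	ipi_mask |= 1u << cpu;
}

/* Runs on the CPU that received the global timer interrupt. Other CPUs
 * whose event is due get an IPI; the receiving CPU handles its own event
 * in place. Returns nonzero if the receiving CPU itself has work to do. */
static int
global_timer_dispatch(int self, uint64_t now)
{
	int self_due = 0;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (cpu_next_event[cpu] > now)
			continue;
		if (cpu == self)
			self_due = 1;
		else
			send_timer_ipi(cpu);
	}
	return (self_due);
}

/* In one-shot mode the global timer is re-armed for the earliest
 * deadline of any CPU, once every CPU has updated its own deadline. */
static uint64_t
global_next_deadline(void)
{
	uint64_t next = UINT64_MAX;

	for (int cpu = 0; cpu < NCPU; cpu++)
		if (cpu_next_event[cpu] < next)
			next = cpu_next_event[cpu];
	return (next);
}
```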

To demonstrate the features this flexibility makes possible, I have
incorporated the idea and some parts of Tsuyoshi Ozawa's dynamic ticks
patches. Now, when a CPU goes down into the C2/C3 ACPI sleep state, it
stops scheduling hard-/stat-/profclock events until the next registered
callout event. If the CPU is woken up earlier by some unrelated
interrupt, the missed ticks are called artificially (this is currently
needed to keep system statistics realistic). Once the system is up to
date, the interrupt is handled. For now this is implemented only for
ACPI systems with C2/C3 state support, because ACPI resumes the CPU
with interrupts disabled, which allows catching up on missed time before
the interrupt handler or some other process (in case of an unexpected
task switch) may need it. As far as I can see, Linux does similar things
at the beginning of every interrupt handler.
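A minimal sketch of the catch-up step, in C. struct idle_state and
idle_catch_up() are hypothetical names, only hardclock is modeled, and
times are plain nanoseconds; real code would also have to handle
statclock and profclock. It shows the replay of skipped ticks that runs
before the wakeup interrupt is handled.

```c
#include <stdint.h>

/* Hypothetical per-CPU state for skipping ticks while idle. */
struct idle_state {
	uint64_t tick_period;   /* ns per hardclock tick, i.e. 1e9 / hz */
	uint64_t next_tick;     /* deadline of the first tick not yet run */
	unsigned ticks_run;     /* stand-in for hardclock() side effects */
};

/* Called right after wakeup from C2/C3, while interrupts are still
 * disabled and before the wakeup interrupt is handled: runs every tick
 * that was skipped during sleep and returns how many there were. */
static unsigned
idle_catch_up(struct idle_state *is, uint64_t now)
{
	unsigned missed = 0;

	while (is->next_tick <= now) {
		is->ticks_run++;        /* hardclock() would be called here */
		is->next_tick += is->tick_period;
		missed++;
	}
	return (missed);
}
```

For example, with hz = 1000 and next_tick at 1 ms, a CPU woken by an
unrelated interrupt at 5.5 ms replays five ticks (1 through 5 ms) and
resumes with next_tick at 6 ms.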

I have actively tested this code for a few days on my amd64 Core2Duo
laptop and an i386 Core-i5 desktop system. With C2/C3 states enabled,
the systems experience only about 100-150 interrupts per second with HZ
set to 1000. These events are mostly caused by several event-greedy
processes in our tree. I have traced and hacked several of the most
aggressive ones in this patch: http://people.freebsd.org/~mav/tm6292_idle.patch .
It allowed me to get down to as low as 50 interrupts per second for the
whole system, including IPIs! Here is the output of `systat -vm 1` from
my test system: http://people.freebsd.org/~mav/systat_w_oneshot.txt .
Obviously, with additional tuning the results can be improved even more.

My latest patch against 9-CURRENT can be found here:
http://people.freebsd.org/~mav/timers_oneshot4.patch

Comments, ideas and suggestions are welcome!

Thanks to all who read this. ;)

-- 
Alexander Motin