RFC: New event timers infrastructure

Brandon Gooch jamesbrandongooch at gmail.com
Sat Jun 19 23:47:47 UTC 2010


2010/6/6 Alexander Motin <mav at freebsd.org>:
> Hi.
>
> Most x86 systems now have at least 4 types of event timers: i8254,
> RTC, LAPIC and HPET. The respective kernel code is very tangled,
> heavily hardcoded and not scalable at all. I have reimplemented it,
> trying to solve these issues.
>
> I did the following:
>  - created a unified timer driver API (sys/timeet.h, kernel/kern_et.c).
> It supports global and per-CPU timers, periodic and one-shot, and
> provides driver and consumer interfaces for choosing timers and
> operating them;
>  - cleaned up the existing x86 event timer driver code and adapted it
> to the new API (x86/isa/atrtc.c, x86/isa/clock.c, x86/x86/local_apic.c).
> The LAPIC timer is now per-CPU and supports both periodic and one-shot
> modes;
>  - extended the HPET driver to support its event timers in periodic and
> one-shot modes (dev/acpica/acpi_hpet.c). Support for per-CPU operation
> and FSB interrupts is planned for later;
>  - wrote a mostly machine-independent mid-layer that manages any
> present timers to provide the clocks the kernel needs
> (x86/x86/timeevents.c). It supports both global and per-CPU timers.
> For now it supports only periodic mode; one-shot mode support is
> planned for later.
>
> All of this is deeply configurable via both loader tunables at boot
> and sysctls at run time:
>
> %sysctl kern.eventtimer
> kern.eventtimer.choice: LAPIC(500) HPET(400) HPET1(390) HPET2(390) i8254(100) RTC(0)
> kern.eventtimer.et.LAPIC.flags: 7
> kern.eventtimer.et.LAPIC.frequency: 99752386
> kern.eventtimer.et.LAPIC.quality: 500
> kern.eventtimer.et.HPET.flags: 3
> kern.eventtimer.et.HPET.frequency: 14318180
> kern.eventtimer.et.HPET.quality: 400
> kern.eventtimer.et.HPET1.flags: 3
> kern.eventtimer.et.HPET1.frequency: 14318180
> kern.eventtimer.et.HPET1.quality: 390
> kern.eventtimer.et.HPET2.flags: 3
> kern.eventtimer.et.HPET2.frequency: 14318180
> kern.eventtimer.et.HPET2.quality: 390
> kern.eventtimer.et.RTC.flags: 1
> kern.eventtimer.et.RTC.frequency: 32768
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.i8254.flags: 1
> kern.eventtimer.et.i8254.frequency: 1193182
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.timer2: NONE
> kern.eventtimer.timer1: i8254
> kern.eventtimer.singlemul: 2
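
The flags values in the output above read like a capability bitmask.
Here is a minimal sketch of decoding them, assuming bit 0 means
periodic, bit 1 one-shot and bit 2 per-CPU. That bit assignment is only
inferred from the driver descriptions earlier in the quoted message
(LAPIC: per-CPU, both modes; HPET: both modes) and is not confirmed by
the patch; FLAG_BITS and decode_flags are illustrative names.

```python
# Hypothetical bit assignment, inferred from the driver descriptions;
# the real definitions live in sys/timeet.h and may differ.
FLAG_BITS = (
    (0x1, "periodic"),
    (0x2, "one-shot"),
    (0x4, "per-CPU"),
)


def decode_flags(value):
    """Return the names of the capability bits set in a flags value."""
    return [name for bit, name in FLAG_BITS if value & bit]


# Flags values as reported by the sysctl output quoted above.
for timer, flags in (("LAPIC", 7), ("HPET", 3), ("RTC", 1), ("i8254", 1)):
    print(f"{timer}: {', '.join(decode_flags(flags))}")
```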
>
> By default the system chooses the two timers with the highest
> "quality" for hardclock and statclock/profclock. The user may
> influence that choice by disabling unwanted drivers and/or by
> specifying the wanted ones directly. Timers can also be changed on
> the fly via sysctls:
>
> %sysctl kern.eventtimer.timer1=hpet
> kern.eventtimer.timer1: i8254 -> HPET
> %sysctl kern.eventtimer.timer2=hpet1
> kern.eventtimer.timer2: NONE -> HPET1
>
> After every timer change, if two timers are available, the mid-layer
> cross-checks them and replaces either one that is not functional.
>
> If no second timer is available, or the user has chosen not to use
> one, the mid-layer automatically increases the rate of the first timer
> and divides its frequency to satisfy system needs as well as possible.
> The user may specify how fast to run the first timer relative to hz by
> setting the kern.eventtimer.singlemul tunable/sysctl.
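
To illustrate the single-timer case just described, here is a minimal
sketch. It assumes the timer is programmed at hz * singlemul and that
each clock is derived from it by a simple accumulator. The accumulator
scheme, the stathz value of 128 and the function name are assumptions
for illustration only, not taken from the patch.

```python
def single_timer_schedule(hz, stathz, singlemul, ticks):
    """Yield (tick, run_hardclock, run_statclock) for one shared timer.

    Assumption: the timer interrupts at hz * singlemul, and each clock
    fires whenever its accumulator reaches the timer rate. This only
    works if stathz <= hz * singlemul, which is one reason to run the
    single timer faster than hz.
    """
    rate = hz * singlemul
    hard_acc = stat_acc = 0
    for tick in range(ticks):
        hard_acc += hz
        stat_acc += stathz
        run_hard = hard_acc >= rate
        run_stat = stat_acc >= rate
        if run_hard:
            hard_acc -= rate
        if run_stat:
            stat_acc -= rate
        yield tick, run_hard, run_stat


# One second of interrupts at hz=100, singlemul=2 (timer at 200 Hz).
events = list(single_timer_schedule(hz=100, stathz=128, singlemul=2, ticks=200))
print(sum(h for _, h, _ in events), "hardclock calls")
print(sum(s for _, _, s in events), "statclock calls")
```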
>
> When profiling is active, the mid-layer automatically raises the
> respective timer frequency to about 8 kHz (previously 1 kHz) and
> lowers it again when profiling ends.
>
> All of the above was tested on i386 and amd64. XEN was not affected
> and builds fine. pc98 was slightly touched; it wasn't tested, but it
> builds fine. Its pc98/cbus/clock.c needs a corresponding rewrite to
> use the new features. Other architectures are untouched, but if any of
> them may benefit from this functionality, it should be possible to
> share most of the code.
>
> Latest patches can be found here:
> http://people.freebsd.org/~mav/et.20100606.patch
>
> Known issues:
>  - the i8254 timer generates interrupts at an 18 Hz rate when not used
> and not disabled. I haven't found a way to disable its interrupt
> source while holding a spinlock.
>  - the timer driver code will need some more cleanup once an interrupt
> handler is able to return both argument and frame at the same time.
>
> Feedback is much appreciated.

I've been testing these patches since the first iteration
(et.20100606), and I haven't discovered any related issues.

I'm not able to perform a suspend/resume cycle, but in all fairness,
this machine has never been fully functional in this regard. I will
look into the issue further if time allows...

It seems to me that a more widespread review and test of the code by
knowledgeable, skilled developers is warranted at this stage; as I
understand it, these patches comprise the initial stages necessary to
provide the so-called "tickless" kernel functionality which seems to
be all the rage in today's virtualized environments (at least
according to VMware, VirtualBox, etc...).

So, big thanks to Alexander for bravely undertaking this task and
seemingly making great progress!

Also, Alexander, I've attached two dmesg outputs: one with your patch
(r209354, built today with
http://people.freebsd.org/~mav/et.20100618.patch applied) and one
without (r209256, from Thursday, June 17).

I am unclear about the number of interrupts I should expect from the
hpet0 device (compared to the 99 from the rtc at 100Hz), so here is
the output of vmstat -i with and without the "et" patches:

With "et" patches:

interrupt                          total       rate
irq1: atkbd0                         369          3
irq9: acpi0                          961          8
irq12: psm0                         1002          9
irq18: uhci5                         140          1
irq19: uhci2 ehci0*                 4823         45
irq20: hpet0                       23893        223
irq23: uhci3 ehci1                    11          0
irq256: vgapci0                     1031          9
irq257: hdac0                         14          0
irq258: iwn0                        4258         39
irq259: bge0                           1          0
Total                              36503        341

Without "et" patches:

interrupt                          total       rate
irq1: atkbd0                         449          2
irq0: clk                          17334         99
irq9: acpi0                         1701          9
irq12: psm0                         8784         50
irq18: uhci5                         188          1
irq19: uhci2 ehci0*                 5828         33
irq23: uhci3 ehci1                    11          0
irq256: vgapci0                     1896         10
irq257: hdac0                         14          0
irq258: iwn0                       29571        169
irq259: bge0                           1          0
Total                              65777        378
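
A note on comparing the two tables: the rate column appears to be the
total divided by the uptime in seconds, truncated to an integer, so the
uptime of each run can be estimated from the "Total" rows. The sketch
below relies on that assumption about vmstat -i; implied_uptime is a
made-up helper name.

```python
def implied_uptime(total, rate):
    """Range of uptimes (seconds) consistent with an integer-truncated rate."""
    # rate == total // uptime  =>  total/(rate+1) < uptime <= total/rate
    return total / (rate + 1), total / rate


with_et = implied_uptime(36503, 341)     # "Total" row, with the patches
without_et = implied_uptime(65777, 378)  # "Total" row, without the patches
print("with et:    %.1f to %.1f s" % with_et)
print("without et: %.1f to %.1f s" % without_et)
```

By this estimate the patched run covers roughly 107 seconds and the
unpatched one roughly 174 seconds, so only the rate columns are directly
comparable, not the totals.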

And lastly, the values of the kern.eventtimer sysctls:

$ sysctl kern.eventtimer
kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100)
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.HPET3.flags: 3
kern.eventtimer.et.HPET3.frequency: 14318180
kern.eventtimer.et.HPET3.quality: 440
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.timer2: HPET1
kern.eventtimer.timer1: HPET
kern.eventtimer.singlemul: 4
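
As a sanity check of the default selection described in the quoted
message (the two timers with the highest quality), here is a small
sketch that parses the kern.eventtimer.choice line above. Breaking ties
by listing order is an assumption, and the function names are
illustrative rather than anything from the patch.

```python
import re


def parse_choice(line):
    """Parse 'NAME(quality) ...' into a list of (name, quality) pairs."""
    return [(name, int(q)) for name, q in re.findall(r"(\w+)\((\d+)\)", line)]


def pick_two(choice):
    """Pick the two highest-quality timers; ties keep listing order (assumed)."""
    ranked = sorted(choice, key=lambda item: -item[1])  # sorted() is stable
    names = [name for name, _ in ranked]
    return names[0], (names[1] if len(names) > 1 else "NONE")


choice = parse_choice("HPET(450) HPET1(440) HPET2(440) HPET3(440) i8254(100)")
print(pick_two(choice))
```

For the values above this picks HPET and HPET1, which matches the
timer1 and timer2 settings shown.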

Is there anything else one should provide (I know I've asked this
before, but I can't recall if you stated anything "officially")?

Thanks again,

-Brandon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.boot.et.20100618
Type: application/octet-stream
Size: 62083 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-current/attachments/20100619/9c90a8ce/dmesg.boot.et-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.boot.no_et
Type: application/octet-stream
Size: 62515 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-current/attachments/20100619/9c90a8ce/dmesg.boot-0001.obj

