[RFC/RFT] calloutng
Alexander Motin
mav at FreeBSD.org
Mon Dec 31 15:41:55 UTC 2012
On 31.12.2012 17:02, Ian Lepore wrote:
> On Mon, 2012-12-31 at 12:17 +0200, Alexander Motin wrote:
>> On 31.12.2012 08:17, Luigi Rizzo wrote:
>>> On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
>>> ...
>>>> I grabbed testsleep.c to test an arm event timer implementation, and had
>>>> to fix a couple nits... kqueueto was missing from the names[] array, and
>>>> I had to add a "* 1000" to a couple places where usec was stuffed into a
>>>> timespec's tv_nsec.
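For reference, the "* 1000" fix mentioned above comes down to the fact that tv_nsec holds nanoseconds, not microseconds. A minimal sketch of the conversion follows; the helper name usec_to_timespec is made up for illustration and is not taken from testsleep.c:

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not from testsleep.c): convert an interval given
 * in microseconds into a struct timespec.  tv_nsec is in nanoseconds,
 * so the sub-second remainder has to be multiplied by 1000.
 */
static struct timespec
usec_to_timespec(uint64_t usec)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(usec / 1000000);
	ts.tv_nsec = (long)(usec % 1000000) * 1000;
	return (ts);
}
```

Splitting off whole seconds first also keeps tv_nsec below 1000000000, which nanosleep() and similar calls require.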
>>>>
>>>> I also tested the calloutng_12_17 patches and the kqueue stuff behaved
>>>> very strangely.
>>
>> I've rewritten kqueue timeouts in calloutng_12_26.patch.
>>
>>>> Then I noticed you had a 12_26 patchset so I tested
>>>> that (after crudely fixing a couple uninitialized var warnings), and it
>>>> all looks good on this arm (Raspberry Pi). I'll attach the results.
>>>>
>>>> It's so sweet to be able to do precision sleeps.
>>
>> Thank you for testing, Ian.
>>
>>> interesting numbers, but there seems to be some problem in computing
>>> the exact interval; delays are much larger than expected.
>>>
>>> In this test, the original timer code used to round to the next multiple
>>> of 1 tick and then add another tick (except for the kqueue case),
>>> which is exactly what you see in the second set of measurements.
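The legacy rounding described above can be sketched roughly as follows. This is only a model of "round up to the next tick, then add one more tick"; the function name is hypothetical and the hz value passed in is an assumption, since the test machine's hz is not stated in this thread:

```c
#include <stdint.h>

/*
 * Hypothetical model of the legacy behaviour described above, not the
 * actual kernel code: the requested interval is rounded up to a whole
 * number of ticks and one extra tick is added.  Returns the resulting
 * sleep length in microseconds.  hz must divide 1000000 evenly here.
 */
static uint64_t
legacy_sleep_us(uint64_t req_us, unsigned hz)
{
	uint64_t tick_us = 1000000 / hz;
	uint64_t ticks = (req_us + tick_us - 1) / tick_us;	/* round up */

	return ((ticks + 1) * tick_us);				/* plus one tick */
}
```

Under this model, and assuming hz = 100, a 300us request would sleep for 20000us and a 30000us request for 40000us.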
>>>
>>> The calloutng code however seems to do something odd:
>>> in addition to fixed overhead (some 50us, which you can see in
>>> the tests for 1us and 300us), all delays seem to be ~10% larger
>>> than what is requested, upper bounded to 10ms (note, the
>>> numbers are averages so i cannot tell whether all samples are
>>> the same or there is some distribution of values).
>>>
>>> I am not sure if this error is peculiar to the ARM version or also
>>> appears on x86/amd64 but I believe it should be fixed.
>>>
>>> If you look at the results below:
>>>
>>> 1us possibly ok:
>>> for very short intervals i would expect some kind
>>> of 'reschedule' without actually firing a timer; maybe
>>> 50us are what it takes to do a round through the scheduler ?
>>>
>>> 300us probably ok
>>> i guess the extra 50-90us are what it takes to do a round
>>> through the scheduler
>>>
>>> 1000us borderline (this is the case for poll and kqueue, which are
>>> rounded to 1ms)
>>> here intervals seem to be increased by 10%, and i cannot see
>>> a good reason for this (more below).
>>>
>>> 3000us and above: wrong
>>> here again, the intervals seem to be 10% larger than what is
>>> requested, perhaps limiting the error to 10-20ms.
>>>
>>>
>>> Maybe the 10% extension results from creating a default 'precision'
>>> for legacy calls, but i do not think this is done correctly.
>>>
>>> First of all, if users do not specify a precision themselves, the
>>> automatically generated value should never exceed one tick.
>>>
>>> Second, the only point of a 'precision' parameter is to merge
>>> requests that may be close in time, so if there is already a
>>> timer scheduled within [Treq, Treq+precision] i will get it;
>>> but if there is no pending timer, then one should schedule it
>>> for the requested interval.
>>>
>>> Davide/Alexander, any ideas ?
>>
>> All of the mentioned effects can be explained by the implemented logic. The
>> 50us at 1us is probably the sum of the minimal latency of the hardware
>> eventtimer on the specific platform and some software processing overhead
>> (syscall, callout, timecounters, scheduler, etc). At later points the system
>> starts to noticeably use the precision specified by the
>> kern.timecounter.alloweddeviation sysctl. It affects results in two ways:
>> 1) extending intervals by the specified percentage of time to allow event
>> aggregation, and 2) choosing the time base between the fast getbinuptime()
>> and the precise binuptime(). Extending the interval is needed to aggregate
>> callouts not only with each other, but also with other system events, which
>> are impossible to schedule in advance. It gives the specified relative
>> error, but no more than one CPU wakeup period in absolute terms: for a busy
>> CPU (not skipping hardclock() ticks) it is 1/hz, for a completely idle one
>> it can be up to 0.5s. The second point reduces processing overhead at the
>> cost of an error of up to 1/hz for long periods (>(100/allowed)*(1/hz)),
>> when it is used.
>>
>> To get the best possible precision, the kern.timecounter.alloweddeviation
>> sysctl can be set to a smaller value. Setting it to 0 will effectively
>> disable all optimizations, but should give 50us precision in all cases.
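The two effects of alloweddeviation described above can be modelled roughly as follows. This is a sketch of the description only, not the calloutng code; the function names and the microsecond units are assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code.  All times are in microseconds.
 *
 * Slack allowed for aggregation: pct percent of the requested interval,
 * but never more than one CPU wakeup period (1/hz for a busy CPU, up to
 * 0.5s for a completely idle one).
 */
static uint64_t
model_precision_us(uint64_t interval_us, unsigned pct, uint64_t wakeup_us)
{
	uint64_t slack = interval_us * pct / 100;

	return (slack < wakeup_us ? slack : wakeup_us);
}

/*
 * Time base choice: the fast getbinuptime() can be off by up to 1/hz,
 * so it is only acceptable when interval > (100 / pct) * (1 / hz).
 * With pct == 0 the precise binuptime() is always needed.
 */
static bool
model_use_fast_timebase(uint64_t interval_us, unsigned pct, unsigned hz)
{
	if (pct == 0)
		return (false);
	return (interval_us * pct * hz > 100ULL * 1000000ULL);
}
```

For example, with alloweddeviation at 5 and an assumed hz of 100, this model allows 150us of slack on a 3000us request, caps the slack at 10000us for a 300000us request on a busy CPU, and switches to the fast time base only for intervals longer than 200000us.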
>>
>>>> for t in 1 300 3000 30000 300000 ; do
>>>> for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
>>>> ./testsleep $t $m
>>>> done
>>>> done
>>>>
>>>> [test results snipped]
>>
>
> I should have posted some information about the test platform... It's a
> single-processor 700MHz arm chip. There was essentially nothing else
> running during the tests other than mostly-idle daemons (sshd, ntpd, the
> usual suspects). Kernel debugging options off (INVARIANTS[_SUPPORT],
> DIAGNOSTIC, and WITNESS).
>
> Some sysctl values of interest...
>
> rpi# sysctl kern.timecounter
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: BCM2835 Timecounter(1000) dummy(-1000000)
> kern.timecounter.hardware: BCM2835 Timecounter
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.stepwarnings: 1
> kern.timecounter.tc.BCM2835 Timecounter.mask: 4294967295
> kern.timecounter.tc.BCM2835 Timecounter.counter: 734706756
> kern.timecounter.tc.BCM2835 Timecounter.frequency: 1000000
> kern.timecounter.tc.BCM2835 Timecounter.quality: 1000
> rpi# sysctl kern.eventtimer
> kern.eventtimer.choice: BCM2835 Event Timer 3(1000)
> kern.eventtimer.et.BCM2835 Event Timer 3.flags: 2
> kern.eventtimer.et.BCM2835 Event Timer 3.frequency: 1000000
> kern.eventtimer.et.BCM2835 Event Timer 3.quality: 1000
> kern.eventtimer.periodic: 0
> kern.eventtimer.timer: BCM2835 Event Timer 3
> kern.eventtimer.activetick: 1
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 4
>
> BTW, is there any advantage to implementing periodic mode for an
> eventtimer? It would be easy enough to do for this hardware, but it
> looks like all this new event timer code is pretty much a stake through
> the heart of periodic timer ticks.
Periodic-mode-only hardware is still supported, but the present code takes
almost no advantage of periodic mode if one-shot mode is supported. It
can't use interrupts as a time source to run events (as the legacy code did)
because of possible drift from the system timecounter, which makes it
impossible to specify an absolute event time. The only benefit is that the
timer hardware is not reprogrammed each time, and I don't think that this
saving is worth it. But for all hardware supporting periodic mode I've
implemented the respective support, at least for completeness and testing
purposes.
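
To illustrate the drift problem, here is a rough model of how far time counted from periodic interrupts can wander from the timecounter. The function name and the ppm rate error are hypothetical; no drift figure for any real hardware is implied:

```c
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: accumulated difference, in
 * nanoseconds, between time inferred by counting periodic interrupts
 * and time read from the timecounter, after `ticks` interrupts with a
 * nominal period of period_ns and a relative rate error of `ppm`
 * parts per million between the two clocks.
 */
static int64_t
model_tick_drift_ns(uint64_t ticks, uint64_t period_ns, int32_t ppm)
{
	return ((int64_t)(ticks * period_ns / 1000000) * ppm);
}
```

With an assumed 50 ppm error and a 10ms period, one hour of ticks (360000) would accumulate 180ms of drift in this model, which is why an absolute event time cannot be derived from the interrupt count alone.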
--
Alexander Motin