svn commit: r237433 - in head/sys: amd64/include arm/include
conf i386/include ia64/include kern mips/include pc98/include
powerpc/include sparc64/include sys x86/include x86/x86
Alexander Motin
mav at FreeBSD.org
Fri Jun 22 10:56:03 UTC 2012
On 06/22/12 13:46, Konstantin Belousov wrote:
> On Fri, Jun 22, 2012 at 01:23:42PM +0300, Konstantin Belousov wrote:
>> On Fri, Jun 22, 2012 at 11:54:28AM +0300, Alexander Motin wrote:
>>> On 22.06.2012 11:25, Konstantin Belousov wrote:
>>>> On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote:
>>>>> On 06/22/12 10:06, Konstantin Belousov wrote:
>>>>>> Author: kib
>>>>>> Date: Fri Jun 22 07:06:40 2012
>>>>>> New Revision: 237433
>>>>>> URL: http://svn.freebsd.org/changeset/base/237433
>>>>>>
>>>>>> Log:
>>>>>> Implement mechanism to export some kernel timekeeping data to
>>>>>> usermode, using shared page. The structures and functions have vdso
>>>>>> prefix, to indicate the intended location of the code in some future.
>>>>>>
>>>>>> The versioned per-algorithm data is exported in the format of struct
>>>>>> vdso_timehands, which mostly repeats the content of in-kernel struct
>>>>>> timehands. Usermode reading of the structure can be lockless.
>>>>>> Compatibility export for 32bit processes on 64bit host is also
>>>>>> provided. Kernel also provides usermode with indication about
>>>>>> currently used timecounter, so that libc can fall back to syscall if
>>>>>> configured timecounter is unknown to usermode code.
>>>>>>
>>>>>> The shared data updates are initiated both from the tc_windup(), where
>>>>>> a fast task is queued to do the update, and from sysctl handlers which
>>>>>> change timecounter. A manual override switch
>>>>>> kern.timecounter.fast_gettime allows to turn off the mechanism.
>>>>>>
>>>>>> Only x86 architectures export the real algorithm data, and there, only
>>>>>> for tsc timecounter. HPET counters page could be exported as well, but
>>>>>> I prefer to not further glue the kernel and libc ABI there until
>>>>>> proper vdso-based solution is developed.
>>>>>>
>>>>>> Minimal stubs neccessary for non-x86 architectures to still compile
>>>>>> are provided.
>>>>>>
>>>>>> Discussed with: bde
>>>>>> Reviewed by: jhb
>>>>>> Tested by: flo
>>>>>> MFC after: 1 month
>>>>>
>>>>>
>>>>>> @@ -1360,6 +1367,7 @@ tc_windup(void)
>>>>>> #endif
>>>>>>
>>>>>> timehands = th;
>>>>>> + taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task);
>>>>>> }
>>>>>>
>>>>>> /* Report or change the active timecounter hardware. */
>>>>>
>>>>> This taskqueue_enqueue_fast() will schedule extra thread to run each
>>>>> time hardclock() fires. One thread may be not a big problem, but
>>>>> together with callout swi and possible other threads woken up there it
>>>>> will wake up several other CPU cores from sleep just to put them back in
>>>>> few microseconds. Now davide@ and me are trying to fix that by avoiding
>>>>> callout SWI use for simple tasks. Please, let's not create another
>>>>> problem same time.
>>>>
>>>> The patch was public for quite a time. If you have some comments about
>>>> it, it would be much more productive to let me know about them before
>>>> the commit, not after.
>>>
>>> I'm sorry, I haven't seen it. My mad.
>>>
>>>> Anyway, what is your proposal for 'let's not create another problem
>>>> same time' part of the message ? It was discussed, as a possibility,
>>>> to have permanent mapping for the shared page in the KVA and to perform
>>>> lock-less update of the struct vdso_timehands directly from hardclock
>>>> handler. My opinion was that amount of work added by tc_windup
>>>> eventhandler is not trivial, so it is better to be postponed to
>>>> less critical context. It also slightly more safe to not perform
>>>> lockless update for vdso_timehands, since otherwise module load which
>>>> register exec handler could cause transient gettimeofday() failure
>>>> in usermode.
>>>>
>>>> This might boil down to the fact that tc_windup function is called
>>>> too often, in fact. Also, packing execution of tc_windup eventhandler
>>>> together with the clock swi is fine from my POV.
>>>
>>> I have nothing against using shared pages. On UP system I would probably
>>> have not so much against several threads. But on SMP system it will
>>> cause at least one, but in many cases two extra CPUs to be woken up.
>>> There are two or more threads to run on hardclock(): this taskqueue,
>>> callout swi and some thread(s) woken from callouts. Scheduler has no
>>> idea how heavy they are. So it will try to move each of them to separate
>>> idle CPU. Does the amount of work done in event handlers worth extra
>>> Watts consumed by rapidly waking CPUs? As quite rare person running
>>> FreeBSD on laptop, I am not sure. I am not sure even that on
>>> desktop/server this won't kill all benefit of fast clocks by limiting
>>> TurboBoost.
>>
>> Patch below would probably work, but I cannot test it right now on real
>> hardware due to ACPI issue. It worked for me in qemu.
>>
>> commit 4f2ffd93b36d20eae61495776fc6b0855745fd7f
>> Author: Konstantin Belousov <kib at freebsd.org>
>> Date: Fri Jun 22 13:19:22 2012 +0300
>>
>> Use persistent kernel mapping of the shared page, and update the
>> vdso_timehands from hardclock, instead of scheduling task.
>
> Slightly improved version. Since tc_fill_vdso_timehands is now
> called from hardclock context, thee is no need to spin waiting for
> valid current generation of timehands.
I can't evaluate how much "hackish" this is from memory management side,
but I'm glad there is some viable solution.
Thank you!
--
Alexander Motin
More information about the svn-src-head
mailing list