Adding members to struct cpu_functions
Nathan Whitehorn
nwhitehorn at freebsd.org
Sun Oct 18 15:49:19 UTC 2009
Nathan Whitehorn wrote:
> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>>> be the best approach. After all v6-7 should be considered an
>>>>>> entirely
>>>>>> new architecture variation, and we would avoid the very likely
>>>>>> #ifdefs
>>>>>> hell in case of a single pmap.c.
>>>>>>
>>>>>>
>>>>> Yeah, I think that would be the best solution. We could
>>>>> conditionally
>>>>> select the right pmap.c file based on the target CPU selected (just
>>>>> like we do for board variations for at91/marvell).
>>>>>
>>>>>
>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain. Granted,
>>>> I haven't read the whole file line after line. Yet it seems to me its
>>>> content can be abstracted to rely on arch-specific functions that
>>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc in order
>>>> to let the portmeisters decide what the MMU and caching bits should
>>>> look like? This is a blocking issue for me, since it looks like the
>>>> omap has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this
>>> is how this is done on PowerPC. We have run-time selectable PMAP
>>> modules using KOBJ to handle CPUs with different MMU designs, as
>>> well as a platform module scheme, again using KOBJ, to pick the
>>> appropriate PMAP for the board as well as determine the physical
>>> memory layout and such things. One of the nice things about the
>>> approach is that it is easy to subclass if you have a new,
>>> marginally different, design, and it avoids #ifdef hell as well as
>>> letting you build a GENERIC kernel with support for multiple MMU
>>> designs and board types (the last less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and
>> it would be a really useful exercise to measure what is the actual
>> impact of KOBJ-tized pmap we have in PowerPC; with an often-called
>> interface like pmap it might occur the penalty is not that little..
> Using the KOBJ cache means that it is only marginally more expensive
> than a standard function pointer call. There's a 9-year-old note in
> the commit log for sys/sys/kobj.h that it takes about 30% longer to
> call a function that does nothing via KOBJ versus a direct call on a
> 300 MHz P2 (a 10 ns time difference). Given that and that pmap methods
> do, in fact, do things besides get called and immediately return, I
> suspect non-KOBJ related execution time will dwarf any time loss from
> the indirection. I'll try to repeat the measurement in the next few
> days, however, since this is important to know.
> -Nathan
I just did the measurements on a 1.8 GHz PowerPC G5. There were four
tests, each repeated 1 million times. "Load and store" involves
incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
involves a branch to a function that returns 0 and does nothing else.
"Function ptr" calls the same function via a pointer stored in a
register, and "KOBJ calls" calls it via KOBJ. Here are the results
(errors are +/- 0.5 ns for the function call measurements due to
compiler optimization jitter, and 0 for load and store, since that takes
a deterministic number of clock cycles):
32-bit kernel:
    Load and store: 26.1 ns
    Direct calls:    7.2 ns
    Function ptr:    8.4 ns
    KOBJ calls:     17.8 ns

64-bit kernel:
    Load and store:  9.2 ns
    Direct calls:    6.1 ns
    Function ptr:    8.3 ns
    KOBJ calls:     40.5 ns
ABI changes make a large difference, as you can see. The cost of calling
via KOBJ is non-negligible, but small, especially compared to the cost
of doing anything involving memory. I don't know how this changes with
ARM calling conventions.
-Nathan
More information about the freebsd-arm mailing list