Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
Bruce Evans
brde at optusnet.com.au
Fri Jun 8 09:16:24 UTC 2012
On Thu, 7 Jun 2012, Konstantin Belousov wrote:
> On Thu, Jun 07, 2012 at 10:26:10AM +0200, Dag-Erling Smørgrav wrote:
>> Bruce Evans <brde at optusnet.com.au> writes:
>>> Now 2.44 nsec/call makes sense, but you really should add some volatiles
>>> here to ensure that getpid() is not optimized away.
>>
>> As you can see from the disassembly I provided, it isn't.
>>
>>> So it loops OK, but we can't see what getpid() does. It must not be
>>> doing much.
>>
>> Umm, yes, that's the whole point of this conversation. Linux's getpid()
>> is not a syscall, but a library function that returns a constant from a
>> page shared by the kernel.
Of course, but we're down to nearly single-cycle times, so the difference
between the library function using 1 or 2 instructions to load the value
may be significant.
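A minimal sketch of the kind of timing loop being discussed, assuming
POSIX clock_gettime(). The names nsec_per_call and sink are made up here
and are not from des's test program. The volatile store is what keeps
the compiler from dropping the call at this scale:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Volatile sink: every iteration must store its result (hypothetical name). */
static volatile pid_t sink;

/* Average wall-clock nanoseconds per getpid() call over iters iterations. */
static double
nsec_per_call(long iters)
{
	struct timespec t0, t1;
	double elapsed;
	long i;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		sink = getpid();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Elapsed time in nanoseconds, then divide by the call count. */
	elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec);
	return (elapsed / (double)iters);
}
```

Calling nsec_per_call(10000000) uses the same iteration count as the
runs quoted below, where 30,377 us for 10,000,000 iterations works out
to about 3.04 nsec/call. The result includes loop overhead and the store
to sink, which is not negligible when the call itself is 1 or 2
instructions.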
>>> 5.4104 nsec/call for gettimeofday() is impossible if there is any
>>> rdtsc() hardware call or much layering.
>>
>> It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
>> If I pass a struct timeval as the first argument - so it *does* need to
>> read the clock - it's a little bit slower but still faster than an
>> actual system call. Here's another run that demonstrates this - a
>> little bit slower than previous runs because I have other processes
>> running:
>>
>> getpid(): 10,000,000 iterations in 30,377 us
>> gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
>> gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
> So this timing seems to be approximately the same, to within an order of
> magnitude, as the times I get for the patch, around 25 vs. 30 ns per
> gettimeofday() call.
Not great. I get 6.97 nsec for a slightly reduced version of FreeBSD's
1998 version of microtime(), which was written in i386 asm. (This depends
on rdtsc taking only 6.5 cycles = 3.25 nsec on the test CPU (Athlon64)).
From rev.1.40 of microtime.s:
% #include <machine/asm.h>
%
% ENTRY(microtime)
% movl tsc_freq, %ecx
% testl %ecx, %ecx
% je i8254_microtime
This branch is predicted perfectly but costs 0.24 nsec (0.5 cycles).
% rdtsc
% subl tsc_bias, %eax
% mull tsc_multiplier
% movl %edx, %eax
% addl timeoff+4, %eax /* usec += time.tv_usec */
% movl timeoff, %edx /* sec = time.tv_sec */
Similar to binuptime(). To convert from the old microtime.s, I just
converted the variable names from aout to elf (and supplied dummy
variables), and removed the locking instructions (which were
pushfl/cli/popfl).
%
% cmpl $1000000, %eax /* usec valid? */
% jb 1f
% subl $1000000, %eax /* adjust usec */
% incl %edx /* bump sec */
Probably faster with bintimes (it can be branch-free then (?)), but by
converting directly to the final format we avoid a scaling step. The
branch here is predicted too perfectly because of my dummy variables.
% 1:
% movl 4(%esp), %ecx /* load timeval pointer arg */
% movl %edx, (%ecx) /* tvp->tv_sec = sec */
% movl %eax, 4(%ecx) /* tvp->tv_usec = usec */
%
% ret
%
% i8254_microtime:
% ret /* XXX garbage */
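
For readers who don't read i386 asm, a rough C rendering of the rdtsc
path above. This is a sketch, not code from rev.1.40: struct tv32 and
tsc_to_tv32() are hypothetical names, the globals are passed as
arguments, and the TSC read itself is left to the caller. It assumes
the multiplier is scaled so that the high 32 bits of the 64-bit product
are microseconds, which is what taking %edx after mull implies:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit timeval, matching the two movl stores in the asm. */
struct tv32 {
	uint32_t sec;
	uint32_t usec;
};

static struct tv32
tsc_to_tv32(uint32_t tsc_lo, uint32_t bias, uint32_t multiplier,
    uint32_t off_sec, uint32_t off_usec)
{
	struct tv32 tv;
	uint32_t delta, usec;

	/* A single carry only suffices if the offset is already normalized. */
	assert(off_usec < 1000000);

	delta = tsc_lo - bias;			/* subl tsc_bias, %eax */
	/* mull gives a 64-bit product; movl %edx, %eax keeps the high word. */
	usec = (uint32_t)(((uint64_t)delta * multiplier) >> 32);
	usec += off_usec;			/* addl timeoff+4, %eax */
	tv.sec = off_sec;			/* movl timeoff, %edx */
	if (usec >= 1000000) {			/* cmpl/jb: usec valid? */
		usec -= 1000000;		/* adjust usec */
		tv.sec++;			/* bump sec */
	}
	tv.usec = usec;
	return (tv);
}
```

Like the asm, this handles at most one carry into seconds, so it relies
on the scaled delta staying under a second between updates of the
offset.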
>
> Linux seems slower, probably due to a slower CPU? Mine is 3.4GHz, while
> des used 3.1GHz for the Linux box.
If it is a different CPU model, then the speed of rdtsc can vary a lot.
Bruce