Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)

Bruce Evans brde at optusnet.com.au
Fri Jun 8 09:16:24 UTC 2012


On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 10:26:10AM +0200, Dag-Erling Smørgrav wrote:
>> Bruce Evans <brde at optusnet.com.au> writes:
>>> Now 2.44 nsec/call makes sense, but you really should add some volatiles
>>> here to ensure that getpid() is not optimized away.
>>
>> As you can see from the disassembly I provided, it isn't.
>>
>>> So it loops OK, but we can't see what getpid() does.  It must not be
>>> doing much.
>>
>> Umm, yes, that's the whole point of this conversation.  Linux's getpid()
>> is not a syscall, but a library function that returns a constant from a
>> page shared by the kernel.

Of course, but we're down to nearly single-cycle times, so the difference
between the library function using 1 or 2 instructions to load the value
may be significant.

>>> 5.4104 nsec/call for gettimeofday() is impossible if there is any
>>> rdtsc() hardware call or much layering.
>>
>> It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
>> If I pass a struct timeval as the first argument - so it *does* need to
>> read the clock - it's a little bit slower but still faster than an
>> actual system call.  Here's another run that demonstrates this - a
>> little bit slower than previous runs because I have other processes
>> running:
>>
>> getpid(): 10,000,000 iterations in 30,377 us
>> gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
>> gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
> So this timing is of the same order of magnitude as the times I get
> for the patch: around 25 vs. 30 ns per gettimeofday() call.

Not great.  I get 6.97 nsec for a slightly reduced version of FreeBSD's
1998 version of microtime(), which was written in i386 asm.  (This depends
on rdtsc taking only 6.5 cycles = 3.25 nsec on the test CPU (Athlon64)).
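
The arithmetic behind these figures can be sketched as below.  The helper
names are made up for illustration, and the 2 GHz clock is only inferred
from 6.5 cycles = 3.25 nsec; it is not stated anywhere in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Average cost of one call, given a loop total in microseconds. */
static double
nsec_per_call(double total_us, double iters)
{
	assert(iters > 0);
	return (total_us * 1000.0 / iters);
}

/* Convert a cycle count to nsec at a clock rate given in GHz. */
static double
cycles_to_nsec(double cycles, double ghz)
{
	assert(ghz > 0);
	return (cycles / ghz);
}

/* Print the quoted loop totals and cycle counts as nsec figures. */
static void
print_timings(void)
{
	printf("getpid():             %.2f nsec/call\n",
	    nsec_per_call(30377, 1e7));
	printf("gettimeofday(0, 0):   %.2f nsec/call\n",
	    nsec_per_call(55571, 1e7));
	printf("gettimeofday(&tv, 0): %.2f nsec/call\n",
	    nsec_per_call(302634, 1e7));
	/* 2.0 GHz is inferred from the figures above, not measured. */
	printf("rdtsc, 6.5 cycles:    %.2f nsec\n",
	    cycles_to_nsec(6.5, 2.0));
}
```

The quoted totals come to about 3.04, 5.56 and 30.26 nsec per call.
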
From rev.1.40 of microtime.s:

% #include <machine/asm.h>
% 
% ENTRY(microtime)
% 	movl	tsc_freq, %ecx
% 	testl	%ecx, %ecx
% 	je	i8254_microtime

This branch is predicted perfectly but costs 0.24 nsec (0.5 cycles).

% 	rdtsc
% 	subl	tsc_bias, %eax
% 	mull	tsc_multiplier
% 	movl	%edx, %eax
% 	addl	timeoff+4, %eax	/* usec += time.tv_usec */
% 	movl	timeoff, %edx	/* sec = time.tv_sec */

Similar to binuptime().  To convert from the old microtime.s, I just
converted the variable names from aout to elf (and supplied dummy
variables), and removed the locking instructions (which were
pushfl/cli/popfl).

% 
% 	cmpl	$1000000, %eax	/* usec valid? */
% 	jb	1f
% 	subl	$1000000, %eax	/* adjust usec */
% 	incl	%edx		/* bump sec */

Probably faster with bintimes (can be branch-free then (?)), but by
converting directly to the final format we avoid a scaling step.  The
branch in it is predicted too perfectly by my dummy variables.

% 1:
% 	movl	4(%esp), %ecx	/* load timeval pointer arg */
% 	movl	%edx, (%ecx)	/* tvp->tv_sec = sec */
% 	movl	%eax, 4(%ecx)	/* tvp->tv_usec = usec */
% 
% 	ret
% 
% i8254_microtime:
% 	ret			/* XXX garbage */
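
For readers who don't read i386 asm, the same steps can be sketched in C.
This is an illustration, not FreeBSD code: struct tv32 and the function
names are made up, the TSC value is passed in rather than read with rdtsc,
and the multiplier is assumed to be 2^32 * 1000000 / tsc_freq, which is
what keeping only the high half of the mull result implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit timeval, matching the two movl stores. */
struct tv32 {
	uint32_t tv_sec;
	uint32_t tv_usec;
};

/* Assumed scale factor: 2^32 * 10^6 / tsc_freq, truncated. */
static uint32_t
make_multiplier(uint32_t tsc_freq)
{
	assert(tsc_freq > 1000000);	/* result must fit in 32 bits */
	return ((uint32_t)(((uint64_t)1000000 << 32) / tsc_freq));
}

/* subl + mull + "movl %edx, %eax": scale a TSC delta to usec. */
static uint32_t
tsc_to_usec(uint32_t tsc_lo, uint32_t bias, uint32_t multiplier)
{
	uint32_t delta = tsc_lo - bias;		/* wraps like subl */

	return ((uint32_t)(((uint64_t)delta * multiplier) >> 32));
}

/* The whole routine, with the cmpl/jb carry into the seconds. */
static void
microtime_sketch(struct tv32 *tvp, uint32_t tsc_lo, uint32_t bias,
    uint32_t multiplier, const struct tv32 *timeoff)
{
	uint32_t usec, sec;

	usec = tsc_to_usec(tsc_lo, bias, multiplier) + timeoff->tv_usec;
	sec = timeoff->tv_sec;
	if (usec >= 1000000) {		/* usec valid? */
		usec -= 1000000;	/* adjust usec */
		sec++;			/* bump sec */
	}
	tvp->tv_sec = sec;
	tvp->tv_usec = usec;
}

/* Same carry written without an explicit branch. */
static void
normalize_branchless(uint32_t *sec, uint32_t *usec)
{
	uint32_t carry = (*usec >= 1000000);

	*usec -= carry * 1000000u;
	*sec += carry;
}
```

Like the asm, this handles at most one carry into the seconds.  Whether
the compiler really emits the second form without a branch is not
guaranteed.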

>
> Linux seems slower, probably due to a slower CPU?  Mine is 3.4GHz, while
> des used 3.1GHz for the Linux box.

If it is a different CPU model, then the speed of rdtsc can vary a lot.

Bruce

