bhyve: Slow Linux syscalls on AMD

Willem Jan Withagen <wjw@digiware.nl>
Fri May 30 16:44:42 UTC 2014


Hi,

In my quest to find why Linux is so slow on AMD I executed

strace -T -c ls -aslR / > /dev/null

On both a Linux running on a real CPU (just a mere Celeron):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  25.26    0.235827           1    180706           lstat64
  23.06    0.215235           1    255949    255942 getxattr
  17.65    0.164726           6     29720           getdents64
  13.41    0.125168           1    180698           stat64
  12.75    0.119004           1    130339    130339 lgetxattr
   4.28    0.039929           2     22656        86 readlink
   2.08    0.019399           1     14859           openat
   0.70    0.006569           0     14910           close
   0.54    0.005085           0     14909           fstat64
   0.15    0.001417           0      3135           write
   0.10    0.000896           0      2831           clock_gettime
   0.03    0.000263           3        83        35 open
   0.00    0.000000           0        18           read
   0.00    0.000000           0         1           execve
   0.00    0.000000           0        12        12 access
   0.00    0.000000           0         8           brk
   0.00    0.000000           0         3         3 ioctl
   0.00    0.000000           0        35           munmap
   0.00    0.000000           0        12           mprotect
   0.00    0.000000           0        57           _llseek
   0.00    0.000000           0        67           mmap2
   0.00    0.000000           0         1           set_thread_area
   0.00    0.000000           0         2         2 statfs64
   0.00    0.000000           0         4           socket
   0.00    0.000000           0         4         4 connect
------ ----------- ----------- --------- --------- ----------------
100.00    0.933518                851019    386423 total



And on a virtualized Linux:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  55.83   30.198355         666     45331           getdents
  17.99    9.733976         429     22665         5 openat
  16.62    8.988507         396     22674           close
   9.10    4.920301         217     22673           fstat
   0.37    0.201331         223       901           write
   0.03    0.015656         447        35        23 open
   0.02    0.008853         328        27           mmap
   0.02    0.008239         358        23           brk
   0.01    0.007008         501        14           mprotect
   0.01    0.003540         354        10           read
   0.01    0.003146         393         8         8 access
   0.00    0.001808         452         4           munmap
   0.00    0.000660         660         1           mremap
   0.00    0.000615         308         2         2 statfs
   0.00    0.000582         194         3         3 ioctl
   0.00    0.000254         254         1           execve
   0.00    0.000236         236         1           stat
   0.00    0.000193         193         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00   54.093260                114374        41 total

One cannot really compare the two runs call for call, but what is
significant is the order-of-magnitude difference in time per syscall.
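For example, getdents64 averages 6 usecs/call on the real CPU, while
getdents in the VM averages 666 usecs/call, roughly a factor of 100;
openat similarly goes from 1 to 429 usecs/call.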

So I ripped this test from the net.
It turns out to be a BAD test, since glibc caches the getpid() value
and does not enter the kernel on every call.

====
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>
#include <assert.h>

int foo(){
   return(10);
}

/* Nanoseconds in a timeval; use long long so the arithmetic does not
   overflow a 32-bit long */
long long nanosec(struct timeval t){
   return((t.tv_sec*1000000LL+t.tv_usec)*1000);
}

int main(void){
   int i,res;
   volatile int j;              /* volatile, so the loops are not optimized away */
   long N_iterations=1000000;   /* A million iterations */
   float avgTimeSysCall, avgTimeFuncCall;
   struct timeval t1, t2;

   /* Find average time for a system call */
   res=gettimeofday(&t1,NULL); assert(res==0);
   for (i=0;i<N_iterations; i++){
     j=getpid();
   }
   res=gettimeofday(&t2,NULL); assert(res==0);
   avgTimeSysCall = (nanosec(t2) - nanosec(t1))/(N_iterations*1.0);

   /* Find average time for a plain function call */
   res=gettimeofday(&t1,NULL); assert(res==0);
   for (i=0;i<N_iterations; i++){
     j=foo();
   }
   res=gettimeofday(&t2,NULL); assert(res==0);
   avgTimeFuncCall = (nanosec(t2) - nanosec(t1))/(N_iterations*1.0);

   printf("Average time for System call getpid : %f\n",avgTimeSysCall);
   printf("Average time for Function call : %f\n",avgTimeFuncCall);
   return 0;
}
====

Which gives on the real processor:
root@ubuntu-i386-14:/home/wjw/src# strace -c ./tests/getpid2
Average time for System call getpid : 17.775999
Average time for Function call : 13.661000
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   0.00    0.000000           0         1           read
   0.00    0.000000           0         2           write
   0.00    0.000000           0         2           open
   0.00    0.000000           0         2           close
   0.00    0.000000           0         1           execve
   0.00    0.000000           0         1           getpid
   0.00    0.000000           0         3         3 access
   0.00    0.000000           0         1           brk
   0.00    0.000000           0         4           gettimeofday
   0.00    0.000000           0         1           munmap
   0.00    0.000000           0         3           mprotect
   0.00    0.000000           0         7           mmap2
   0.00    0.000000           0         3           fstat64
   0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    32         3 total


And on the virtualized one:
root@ubuntu-14:~/src# tests/getpid2
Average time for System call getpid : 1883.415039
Average time for Function call : 1032.593018

And the VM version visibly takes a while to return to the prompt, so
the slowdown is real and not just an artifact of the wall-clock
measurement.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  22.80    0.001528         191         8           mmap
  17.77    0.001191         298         4           mprotect
  16.68    0.001118         559         2           write
  14.34    0.000961         320         3         3 access
   9.01    0.000604         604         1           munmap
   7.46    0.000500         250         2           open
   4.19    0.000281          94         3           fstat
   1.66    0.000111         111         1           execve
   1.52    0.000102         102         1           read
   1.33    0.000089          45         2           close
   1.27    0.000085          85         1           brk
   1.27    0.000085          85         1           getpid
   0.70    0.000047          47         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.006702                    30         3 total

But even with this bad test, it is clear that the overhead and slowness
of syscalls are killing the performance....

So:
1) I'm looking for a better basic syscall test on Linux, one that is not
cached, faked or otherwise tweaked into not measuring what I want.
It would really be nice if there were a NOP syscall that just goes in
and out of kernel space; see the sketch after this list.....

2) I'm looking for an explanation for (or better yet, a fix for) the
slow syscalls....

3) Can somebody do the same test on an Intel platform and see what the
results are?
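
Lacking a true NOP syscall, something like this minimal sketch is what
I have in mind (assuming glibc's syscall(2) wrapper is available):
calling syscall(SYS_getpid) directly bypasses glibc's getpid() caching,
so every iteration really enters and leaves the kernel. The file name
raw_getpid.c below is just for illustration.

====
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Nanoseconds in a timespec; long long avoids 32-bit overflow */
long long nsec(struct timespec t){
   return (long long)t.tv_sec*1000000000LL + t.tv_nsec;
}

int main(void){
   long i, N = 1000000;
   struct timespec t1, t2;

   clock_gettime(CLOCK_MONOTONIC, &t1);
   for (i = 0; i < N; i++){
      /* The raw syscall bypasses the glibc getpid() cache, so every
         iteration really enters and leaves the kernel */
      syscall(SYS_getpid);
   }
   clock_gettime(CLOCK_MONOTONIC, &t2);

   printf("Average time for raw getpid syscall : %f ns\n",
          (nsec(t2) - nsec(t1))/(N*1.0));
   return 0;
}
====

Built with something like cc -O2 -o raw_getpid raw_getpid.c (older
glibc needs -lrt for clock_gettime), this should give a clean
per-syscall number to compare between the real CPU and the VM.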

Thanx,
--WjW
