[GSOC] bhyve instruction caching
Mihai Carabas
mihai.carabas at gmail.com
Fri Aug 1 15:42:38 UTC 2014
Hi,
I have now finished all the coding related to instruction caching. As
you saw in my previous e-mails, we obtained a speed-up of 35%-40% in
the microbenchmarks (accessing the LAPIC many times from a kernel
module). Next, we wanted to see how this carries over to real-world
workloads.
I ran two kinds of benchmarks: a CPU-intensive process and a
make buildworld -j2 command. For each of them I measured the
execution time.
1) The CPU intensive app is a bash script:
#!/usr/local/bin/bash
a=0
MAX=10000000
for i in $(seq 1 $MAX);
do
    a=$((a+1))
done
For a VM with 2 vCPUs:
* Cache_instr=1
real 3m45.067s 3m42.628s 3m38.371s 3m36.301s 3m39.929s
user 3m10.454s 3m8.785s 3m7.516s 3m8.204s 3m8.822s
sys 0m19.085s 0m16.135s 0m13.696s 0m13.016s 0m16.105s
* Cache_instr=0
real 3m50.550s 3m41.517s 3m34.783s
user 3m5.350s 3m7.571s 3m1.415s
sys 0m25.268s 0m19.200s 0m16.200s
Each setting was measured several times. As you can see, the results
are not stable from run to run, and both settings fall in the same
range. To reduce the run-to-run variation, I repeated the tests with
1 vCPU (to eliminate the context switches):
With 1 vCPU:
* Cache_instr=1
real 2m58.968s 2m57.009s 3m0.451s 2m55.902s 2m56.422s
user 2m46.909s 2m45.241s 2m45.670s 2m45.788s 2m45.503s
sys 0m4.890s 0m4.134s 0m3.942s 0m3.764s 0m3.984s
* Cache_instr=0
real 2m56.845s 2m57.051s 3m1.794s 2m57.340s
user 2m45.232s 2m44.873s 2m45.482s 2m46.538s
sys 0m4.644s 0m4.141s 0m3.906s 0m3.875s
As you can see, the results are now much closer to each other in
terms of variation, and almost the same for both settings.
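To make the comparison above easier to check, here is a minimal sketch that summarizes the "real" times of each configuration as a mean and a spread (max minus min). The values are the ones listed above, converted to seconds; the names `summarize`, `report` and `report_all` are made up for this illustration and are not part of bhyve.

```c
#include <stddef.h>
#include <stdio.h>

/* Summary of one series of wall-clock ("real") times, in seconds. */
struct summary {
    double mean;
    double min;
    double max;
};

/* "real" times from the runs above, converted from XmY.YYYs to seconds. */
static const double vcpu2_cache1[] = { 225.067, 222.628, 218.371, 216.301, 219.929 };
static const double vcpu2_cache0[] = { 230.550, 221.517, 214.783 };
static const double vcpu1_cache1[] = { 178.968, 177.009, 180.451, 175.902, 176.422 };
static const double vcpu1_cache0[] = { 176.845, 177.051, 181.794, 177.340 };

/* Compute mean, minimum and maximum of n samples (n must be > 0). */
struct summary summarize(const double *t, size_t n)
{
    struct summary s = { 0.0, t[0], t[0] };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
    }
    s.mean = sum / (double)n;
    return s;
}

/* Print one line per series: label, mean and spread (max - min). */
void report(const char *label, const double *t, size_t n)
{
    struct summary s = summarize(t, n);

    printf("%-22s mean %8.3f s  spread %6.3f s\n",
           label, s.mean, s.max - s.min);
}

/* Print the summary for all four configurations. */
void report_all(void)
{
    report("2 vCPU, Cache_instr=1", vcpu2_cache1,
           sizeof vcpu2_cache1 / sizeof vcpu2_cache1[0]);
    report("2 vCPU, Cache_instr=0", vcpu2_cache0,
           sizeof vcpu2_cache0 / sizeof vcpu2_cache0[0]);
    report("1 vCPU, Cache_instr=1", vcpu1_cache1,
           sizeof vcpu1_cache1 / sizeof vcpu1_cache1[0]);
    report("1 vCPU, Cache_instr=0", vcpu1_cache0,
           sizeof vcpu1_cache0 / sizeof vcpu1_cache0[0]);
}
```

Calling `report_all()` from a `main` of your own prints one line per configuration. With these numbers the gap between the two means is smaller than the spread within either setting, for both 2 vCPUs and 1 vCPU, which matches the reading above. With only 3 to 5 runs per setting this is a rough check, not a statistical test.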
2) For a make buildworld -j2 with 1 vCPU:
Cache_instr=1
13900.60 real 12051.54 user 1800.42 sys
Cache_instr=0
13938.07 real 12122.14 user 1743.61 sys
As you can see, the difference between them is not significant: the
real times are 37.47 seconds apart, which is about 0.3% of a run that
takes almost four hours.
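As a quick check of the buildworld numbers, the following sketch computes the signed difference of the Cache_instr=1 run relative to the Cache_instr=0 run for each of the three columns. The helper names `rel_diff_percent` and `report_buildworld` are hypothetical; the figures are the ones reported above.

```c
#include <stdio.h>

/* Signed difference of `with` relative to `without`, in percent.
 * Negative means the run with instruction caching took less time. */
double rel_diff_percent(double with, double without)
{
    return 100.0 * (with - without) / without;
}

/* Print the relative difference for the real, user and sys columns. */
void report_buildworld(void)
{
    printf("real %+.2f%%\n", rel_diff_percent(13900.60, 13938.07));
    printf("user %+.2f%%\n", rel_diff_percent(12051.54, 12122.14));
    printf("sys  %+.2f%%\n", rel_diff_percent(1800.42, 1743.61));
}
```

Each setting was run only once here, so differences of this size cannot be separated from run-to-run noise.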
As you can see, for these two different kinds of workloads there is
unfortunately no speed-up.
I've also tried other, more specific workloads:
a) dd if=/dev/zero of=/dev/zero bs=256 count=10000K (from memory to
memory - to not be influenced by the storage system)
b) A simple getuid program that executes getuid syscall in a loop:
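#include <stdlib.h>  /* atoi */
#include <unistd.h>  /* getuid */

int main(int argc, char *argv[])
{
    int i;

    /* Iteration count comes from the first argument; default is 100. */
    if (argc == 2) {
        i = atoi(argv[1]);
    } else {
        i = 100;
    }
    /* Issue one getuid syscall per iteration. */
    while (i > 0) {
        getuid();
        i--;
    }
    return 0;
}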
But the results were the same.
I spoke with Neel, and it seems that we can't get a real-world benefit
from this instruction caching.
Thanks,
Mihai