[Bug 216759] [kern] Memory speed with small blocks (1K) up to 35 times slower than host system (under QEMU emulation, but not only)

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Tue Feb 28 19:46:21 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216759

--- Comment #11 from andrew at azar-a.net ---
Here are the latest results supporting my previous post:

root@debian8-test:~# sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (3877252.90 ops/sec)

512.00 MB transferred (3786.38 MB/sec)


Test execution summary:
    total time:                          0.1352s
    total number of events:              524288
    total time taken by event execution: 0.1093
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.14ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   0.1093/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0001s  events: 524288
DEBUG:                  total time taken by even execution: 0.1093s

root@debian8-test:~# cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
tsc
root@debian8-test:~# cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
tsc hpet acpi_pm




root@dev:~ # dmesg | grep -i "TSC"
Calibrating TSC clock ... TSC clock: 3400129027 Hz
  Features=0xf83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS>
  Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
TSC timecounter discards lower 1 bit(s)
Timecounter "TSC-low" frequency 1700064513 Hz quality -100
root@dev:~ # sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (73267.52 ops/sec)

512.00 MB transferred (71.55 MB/sec)


Test execution summary:
    total time:                          7.1558s
    total number of events:              524288
    total time taken by event execution: 5.2780
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.01ms
         max:                                  0.72ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   5.2780/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0007s  events: 524288
DEBUG:                  total time taken by even execution: 5.2780s

root@dev:~ # sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: HPET -> TSC-low
root@dev:~ # sysbench --num-threads=1 --test=memory --memory-total-size=512M --memory-block-size=1K --debug run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Debug mode enabled.


Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 512M

Memory operations type: write
Memory scope type: global
Threads started!
DEBUG: Runner thread started (0)!
Done.

Operations performed: 524288 (3408336.84 ops/sec)

512.00 MB transferred (3328.45 MB/sec)


Test execution summary:
    total time:                          0.1538s
    total number of events:              524288
    total time taken by event execution: 0.1102
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.24ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           524288.0000/0.00
    execution time (avg/stddev):   0.1102/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0000s  avg: 0.0000s  max: 0.0002s  events: 524288
DEBUG:                  total time taken by even execution: 0.1102s
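
To put the three runs side by side, here is a small sketch that derives the slowdown factors and the average time per event from the figures printed above. The numbers are copied from the sysbench output; the dictionary keys and helper names are only illustrative, and the "HPET" label for the second run is taken from the sysctl output ("HPET -> TSC-low").

```python
# Figures copied verbatim from the three sysbench runs in this comment.
EVENTS = 524288  # "total number of events", identical in all runs

runs = {
    "Linux guest (tsc)": {"ops_per_sec": 3877252.90, "event_time_s": 0.1093},
    "FreeBSD guest (HPET)": {"ops_per_sec": 73267.52, "event_time_s": 5.2780},
    "FreeBSD guest (TSC-low)": {"ops_per_sec": 3408336.84, "event_time_s": 0.1102},
}


def slowdown(fast, slow):
    """How many times more ops/sec the `fast` run achieved than the `slow` run."""
    return runs[fast]["ops_per_sec"] / runs[slow]["ops_per_sec"]


def per_event_us(name):
    """Average time per event in microseconds for the named run."""
    return runs[name]["event_time_s"] / EVENTS * 1e6


if __name__ == "__main__":
    for name in runs:
        print(f"{name}: {per_event_us(name):.2f} us per event")
    print(f"TSC-low vs HPET: {slowdown('FreeBSD guest (TSC-low)', 'FreeBSD guest (HPET)'):.1f}x")
    print(f"Linux vs HPET:   {slowdown('Linux guest (tsc)', 'FreeBSD guest (HPET)'):.1f}x")
```

With these inputs the HPET run comes out at roughly 10 us per event against roughly 0.2 us for the other two, so nearly all of the per-event time in the slow run goes away once the timecounter is switched.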




So it is even worse than a memory slowdown: the lag is in gettimeofday()
itself while HPET is the active timecounter, and basically all software relies
on that call one way or another.
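
For anyone who wants to check the clock-read cost without sysbench, a minimal sketch follows. It is not part of the original test; it assumes Python 3.7 or later, and it times time.time(), which CPython implements on top of the system wall-clock call (clock_gettime() or gettimeofday(), depending on the build). The function name and sample count are arbitrary, and the result includes interpreter loop overhead, so only the difference between two timecounter settings is meaningful.

```python
import time


def clock_read_cost_ns(samples=200_000):
    """Return the average cost, in nanoseconds, of one time.time() call.

    The figure includes Python loop overhead, so compare runs against
    each other rather than reading it as an absolute syscall cost.
    """
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.time()  # wall-clock read; this is the call being measured
    elapsed = time.perf_counter_ns() - start
    return elapsed / samples


if __name__ == "__main__":
    print(f"{clock_read_cost_ns():.0f} ns per time.time() call")
```

Running it once with kern.timecounter.hardware set to HPET and once with TSC-low should show whether the per-call cost differs on a given guest.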
