freebsd 6x slower than Raspbian 10 buster on RPi 4b

Robert Crowston crowston at protonmail.com
Wed Feb 10 14:05:08 UTC 2021


I am surprised gcc does not eliminate this side-effect-free computation; it seems almost a missed-optimization bug to me. But the Clang people have put a lot of work into finding ways to elide unnecessary malloc() calls, so perhaps that is the fruit of their labour.
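
One way to stop a compiler from discarding the benchmark's work is to make the result observable. The following is only a minimal sketch under stated assumptions: the attached pthread_bench.cpp is not reproduced in this thread, so fill_and_sum, run_once and g_sink are hypothetical names, and the loop is a guess at the kind of work described (filling an array with sqrtf of the index).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread work described in the thread
// (fill an array with the square root of the index). Every name here is
// an assumption; the real pthread_bench.cpp is not shown.
float fill_and_sum(std::size_t n)
{
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; i++)
        a[i] = std::sqrt(static_cast<float>(i));

    // Reduce the array to a single value so the stores above feed a result.
    float sum = 0.0f;
    for (float v : a)
        sum += v;
    return sum;
}

// A write to a volatile object is observable behaviour, so the optimizer
// has to keep the write and compute the value that feeds it.
volatile float g_sink;

void run_once(std::size_t n)
{
    g_sink = fill_and_sum(n);
}
```

If n comes from the command line, the arithmetic should survive -O2. A compiler is still free to restructure how the sum is produced, so this guards against wholesale removal rather than guaranteeing which instructions end up being timed.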

Given the use of variable-length arrays and the overall style, this code does not appear to be C++; I assume it is C. It is unwise to confuse these two different languages.
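
If the program is meant to be C++, the variable-length array flagged in the quoted warnings (pthread_t tid[n_th]) could be replaced with a std::vector. This is a hedged sketch, not the original code: worker and run_threads are hypothetical names, and the worker body is a placeholder for whatever run() really does.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical worker; the real run() in pthread_bench.cpp is not shown.
// Using the argument also avoids the -Wunused-parameter warning.
void *worker(void *arg)
{
    int *done = static_cast<int *>(arg);
    *done = 1;
    return nullptr;
}

// Start n_th threads, join them, and return how many ran to completion.
int run_threads(std::size_t n_th)
{
    std::vector<pthread_t> tid(n_th);   // instead of: pthread_t tid[n_th];
    std::vector<int> done(n_th, 0);

    std::size_t started = 0;
    for (; started < n_th; started++)
        if (pthread_create(&tid[started], nullptr, worker, &done[started]) != 0)
            break;                      // stop at the first failure

    // Join only the threads that were actually created.
    for (std::size_t i = 0; i < started; i++)
        pthread_join(tid[i], nullptr);

    int count = 0;
    for (int d : done)
        count += d;
    return count;
}
```

The vector is sized at run time just as the VLA was, but it is standard C++ and lives on the heap, so a large thread count cannot overflow the stack.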

What is the actual use case here? This kind of contrived microbenchmark never tells you very much about real-world performance.

The Linux people have full-time staff working on Raspbian. They have access to proprietary hardware information that we do not. I would expect them to always edge ahead on this platform.

But there are too many variables in play to conclude that FreeBSD is 6 times slower.

— RHC.

On Mon, Feb 8, 2021 at 22:05, Mark Millard via freebsd-arm <freebsd-arm at freebsd.org> wrote:

> On 2021-Feb-8, at 13:51, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2021-Feb-8, at 13:42, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2021-Feb-8, at 12:12, Elwood Downey <elwood.downey at gmail.com> wrote:
>>>
>>>> Hello all!
>>>>
>>>> Just wanted to share a comparison I did between freebsd and raspbian on the
>>>> same RPi 4b with 1 GB RAM. I wrote a tiny C++ program that creates
>>>> pthreads, each of which mallocs an array and spins filling it with sqrtf of
>>>> the array index. Setting it to 3 threads (the hw has 4 cores), I found
>>>> freebsd takes consistently 6.5x wall-clock time longer than with raspbian.
>>>> Below are the sessions for each showing pertinent details. Attached is the
>>>> program itself (if it doesn't make it through the newsgroup, mail me direct
>>>> for a copy). One piece of good news is that the thread overhead for freebsd is
>>>> about 100x smaller, so kudos to the scheduler.
>>>>
>>>> This is surprising and disappointing. Any comments welcome, especially what
>>>> I'm doing wrong here. Thank you for your time.
>>>>
>>>> Elwood Downey
>>>> Tucson AZ
>>>>
>>>>
>>>>
>>>> *Raspbian:*
>>>>
>>>> pi at hamclock:~$ uname -a
>>>> Linux hamclock 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l
>>>> GNU/Linux
>>>> pi at hamclock:~$ g++ --version
>>>> g++ (Raspbian 8.3.0-6+rpi1) 8.3.0
>>>> Copyright (C) 2018 Free Software Foundation, Inc.
>>>> This is free software; see the source for copying conditions. There is NO
>>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>> pi at hamclock:~$ g++ -Wall -o pthread_bench{,.cpp} -lpthread -lm
>>>> pi at hamclock:~$ ./pthread_bench 10000 3
>>>> tot thr : 4.917360
>>>> mean thr: 1.639120
>>>> tot wall: 1.726206
>>>> thr gain: 2.84865
>>>> overhead: 5.04494 %
>>>>
>>>>
>>>> *Freebsd:*
>>>>
>>>> [ecdowney at freebsdpi ~]$ uname -a
>>>> FreeBSD freebsdpi 13.0-CURRENT FreeBSD 13.0-CURRENT #0
>>>> main-c255641-gf2b794e1e90: Thu Jan 7 08:00:13 UTC 2021
>>>> root at releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC
>>>> arm64
>>>> [ecdowney at freebsdpi ~]$ g++ --version
>>>> g++ (FreeBSD Ports Collection) 10.2.0
>>>> Copyright (C) 2020 Free Software Foundation, Inc.
>>>> This is free software; see the source for copying conditions. There is NO
>>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>> [ecdowney at freebsdpi ~]$ g++ -Wall -o pthread_bench{,.cpp} -lpthread -lm
>>>> [ecdowney at freebsdpi ~]$ sysctl dev.cpu.0.freq
>>>> dev.cpu.0.freq: 1500
>>>> [ecdowney at freebsdpi ~]$ ./pthread_bench 10000 3
>>>> tot thr : 33.810808
>>>> mean thr: 11.270269
>>>> tot wall: 11.277030
>>>> thr gain: 2.9982
>>>> overhead: 0.0599537 %
>>>> <pthread_bench.cpp>
>>>> _______________________________________________
>>>> freebsd-arm at freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-arm
>>>> To unsubscribe, send any mail to "freebsd-arm-unsubscribe at freebsd.org"
>>>>
>>>
>>> One issue is default optimization level vs. using a
>>> specific controlled level:
>>>
>>> # g++10 -Wall -o pthread_bench pthread_bench.cpp -lpthread -lm
>>> # ./pthread_bench 10000 3
>>> tot thr : 25.900658
>>> mean thr: 8.633552
>>> tot wall: 8.633356
>>> thr gain: 3.00007
>>> overhead: -0.00227026 %
>>>
>>> # g++10 -Wall -O2 -o pthread_bench pthread_bench.cpp -lpthread -lm
>>> # ./pthread_bench 10000 3
>>> tot thr : 1.133682
>>> mean thr: 0.377894
>>> tot wall: 0.376152
>>> thr gain: 3.01389
>>> overhead: -0.463111 %
>>>
>>> (I'm not certain that the FreeBSD gcc port and the Linux g++
>>> were built with the same configuration or use the same
>>> default optimizations.)
>>>
>>> # g++10 -v
>>> Using built-in specs.
>>> COLLECT_GCC=g++10
>>> COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc10/gcc/aarch64-portbld-freebsd14.0/10.2.0/lto-wrapper
>>> Target: aarch64-portbld-freebsd14.0
>>> Configured with: /wrkdirs/usr/ports/lang/gcc10/work/gcc-10.2.0/configure --disable-multilib --disable-bootstrap --disable-nls --enable-gnu-indirect-function --enable-plugin --libdir=/usr/local/lib/gcc10 --libexecdir=/usr/local/libexec/gcc10 --program-suffix=10 --with-as=/usr/local/bin/as --with-gmp=/usr/local --with-gxx-include-dir=/usr/local/lib/gcc10/include/c++/ --with-ld=/usr/local/bin/ld --with-pkgversion='FreeBSD Ports Collection' --with-system-zlib --enable-languages=c,c++,objc,fortran --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man --infodir=/usr/local/share/info/gcc10 --build=aarch64-portbld-freebsd14.0
>>> Thread model: posix
>>> Supported LTO compression algorithms: zlib
>>> gcc version 10.2.0 (FreeBSD Ports Collection)
>>>
>>>
>>> Another issue is g++ and libstdc++ vs. clang++ (system c++)
>>> and (system) libc++. So trying system clang and libc++:
>>>
>>> # c++ -Wall -o pthread_bench pthread_bench.cpp -lpthread -lm
>>> # ./pthread_bench 10000 3
>>> tot thr : 2.525239
>>> mean thr: 0.841746
>>> tot wall: 0.849135
>>> thr gain: 2.9739
>>> overhead: 0.87018 %
>>>
>>> # c++ -Wall -O2 -o pthread_bench pthread_bench.cpp -lpthread -lm
>>> # ./pthread_bench 10000 3
>>> tot thr : 0.000000
>>> mean thr: 0.000000
>>> tot wall: 0.000369
>>> thr gain: 0
>>> overhead: 100 %
>>>
>>> That last is because the compiler optimized run(. . .) down
>>> to just:
>>>
>>> 0000000000400a24 <_Z3runPv> mov x0, xzr
>>> 0000000000400a28 <_Z3runPv+0x4> ret
>>>
>>> The source code needs to do something to prevent
>>> the compiler from optimizing out currently unused
>>> computations.
>>>
>>> Having the compilers check more material also
>>> produces notices like:
>>>
>>> g++:
>>> pthread_bench.cpp: In function 'void* run(void*)':
>>> pthread_bench.cpp:18:18: warning: unused parameter 'dummy' [-Wunused-parameter]
>>> 18 | void *run (void *dummy)
>>> | ~~~~~~^~~~~
>>> pthread_bench.cpp: In function 'int main(int, char**)':
>>> pthread_bench.cpp:41:19: warning: ISO C++ forbids variable length array 'tid' [-Wvla]
>>> 41 | pthread_t tid[n_th];
>>> | ^~~
>>>
>>> clang++:
>>> pthread_bench.cpp:18:18: warning: unused parameter 'dummy' [-Wunused-parameter]
>>> void *run (void *dummy)
>>> ^
>>> pthread_bench.cpp:41:22: warning: variable length arrays are a C99 feature [-Wvla-extension]
>>> pthread_t tid[n_th];
>>> ^
>>>
>>>
>>> FYI: Here is a mix of using g++10 but with the FreeBSD
>>> system libc++ instead of gcc's libstdc++ :
>>>
>>> . . .
>
> I messed that up: it was still using libstdc++ when I
> checked with ldd. Trying again, with the required linker
> related command line options used as well:
>
> # g++10 -Wno-psabi -nostdinc -nostdinc++ -I/usr/include/c++/v1 -I/usr/include -mno-outline-atomics -nodefaultlibs -lc++ -lcxxrt -lthr -lm -lc -lgcc_s -Wl,-rpath=/usr/local/lib/gcc10 -flto -Wall -O2 -o pthread_bench pthread_bench.cpp -lpthread -lm
> # ldd pthread_bench
> pthread_bench:
> libc++.so.1 => /usr/lib/libc++.so.1 (0x4047f000)
> libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x40579000)
> libthr.so.3 => /lib/libthr.so.3 (0x405c8000)
> libm.so.5 => /lib/libm.so.5 (0x40624000)
> libc.so.7 => /lib/libc.so.7 (0x40690000)
> libgcc_s.so.1 => /usr/local/lib/gcc10/libgcc_s.so.1 (0x40aae000)
> # ./pthread_bench 10000 3
> tot thr : 1.131916
> mean thr: 0.377305
> tot wall: 0.376065
> thr gain: 3.00989
> overhead: -0.32973 %
>
>>> My FreeBSD context on the RPi4B is based on non-debug
>>> builds of main (14-CURRENT at this point):
>>>
>>> # ~/fbsd-based-on-what-freebsd-main.sh
>>> merge-base: 847dfd2803f6c8b077e3ebc68e35adff2c79a65f
>>> merge-base: CommitDate: 2021-02-03 21:24:22 +0000
>>> 325d7069b027 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
>>> 847dfd2803f6 (freebsd/main, freebsd/HEAD, pure-src, main) readelf: do not trucate section name with -W
>>> FreeBSD RPi4B 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n244624-325d7069b027 GENERIC-NODBG arm64 aarch64 1400003 1400003
>>>
>>> It is a tailored build for cortex-a72 via -mcpu=
>>> use. The RPi4B's config.txt has:
>>>
>>> over_voltage=6
>>> arm_freq=2000
>>> arm_freq_min=2000
>>> sdram_freq_min=3200
>>>
>>> FYI:
>>>
>>> # sysctl hw.physmem
>>> hw.physmem: 8465969152
>>>
>>
>> I forgot to note that the RPi4B has heatsinks and
>> a fan and has a good 5.1A 3.5A power supply.
>
> Trying again: 5.1V 3.5A
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>

