Fwd: freebsd 6x slower than Raspbian 10 buster on RPi 4b

Elwood Downey elwood.downey at gmail.com
Sat Feb 13 01:39:48 UTC 2021


[Resending -- just realized this did not go to the list]

---------- Forwarded message ---------
From: Elwood Downey <elwood.downey at gmail.com>
Date: Mon, Feb 8, 2021 at 4:19 PM
Subject: Re: freebsd 6x slower than Raspbian 10 buster on RPi 4b
To: Mark Millard <marklmi at yahoo.com>


Hi Mark, many thanks for the added info. There are so many ways speed tests
can be misleading. After adding volatile and making sure the loops survived
-O2, I now get nearly the same results everywhere. Also interesting about
c++; I was not aware it was even on here.

Cheers!

On Mon, Feb 8, 2021 at 2:42 PM Mark Millard <marklmi at yahoo.com> wrote:

>
>
> On 2021-Feb-8, at 12:12, Elwood Downey <elwood.downey at gmail.com> wrote:
>
> > Hello all!
> >
> > Just wanted to share a comparison I did between FreeBSD and Raspbian on
> > the same RPi 4b with 1 GB RAM. I wrote a tiny C++ program that creates
> > pthreads, each of which mallocs an array and spins filling it with sqrtf
> > of the array index. Setting it to 3 threads (the hw has 4 cores), I found
> > that FreeBSD consistently takes about 6.5x longer in wall-clock time than
> > Raspbian. Below are the sessions for each, showing pertinent details.
> > Attached is the program itself (if it doesn't make it through the
> > newsgroup, mail me directly for a copy). One piece of good news is that
> > the thread overhead for FreeBSD is more than 80x smaller (0.06 % vs.
> > 5.04 %), so kudos to the scheduler.
> >
> > This is surprising and disappointing. Any comments are welcome, especially
> > about what I'm doing wrong here. Thank you for your time.
> >
> > Elwood Downey
> > Tucson AZ
> >
> >
> >
> > *Raspbian:*
> >
> > pi at hamclock:~$ uname -a
> > Linux hamclock 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l
> > GNU/Linux
> > pi at hamclock:~$ g++ --version
> > g++ (Raspbian 8.3.0-6+rpi1) 8.3.0
> > Copyright (C) 2018 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > pi at hamclock:~$ g++ -Wall -o pthread_bench{,.cpp} -lpthread -lm
> > pi at hamclock:~$ ./pthread_bench 10000 3
> > tot thr :   4.917360
> > mean thr:   1.639120
> > tot wall:   1.726206
> > thr gain:   2.84865
> > overhead:   5.04494 %
> >
> >
> > *Freebsd:*
> >
> > [ecdowney at freebsdpi ~]$ uname -a
> > FreeBSD freebsdpi 13.0-CURRENT FreeBSD 13.0-CURRENT #0
> > main-c255641-gf2b794e1e90: Thu Jan  7 08:00:13 UTC 2021
> > root at releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC
> > arm64
> > [ecdowney at freebsdpi ~]$ g++ --version
> > g++ (FreeBSD Ports Collection) 10.2.0
> > Copyright (C) 2020 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > [ecdowney at freebsdpi ~]$ g++ -Wall -o pthread_bench{,.cpp} -lpthread -lm
> > [ecdowney at freebsdpi ~]$ sysctl dev.cpu.0.freq
> > dev.cpu.0.freq: 1500
> > [ecdowney at freebsdpi ~]$ ./pthread_bench 10000 3
> > tot thr :  33.810808
> > mean thr:  11.270269
> > tot wall:  11.277030
> > thr gain:    2.9982
> > overhead: 0.0599537 %
> > <pthread_bench.cpp>
> > _______________________________________________
> > freebsd-arm at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-arm
> > To unsubscribe, send any mail to "freebsd-arm-unsubscribe at freebsd.org"
> >
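The five summary lines in both runs appear to follow simple formulas: the printed values are consistent with thr gain = tot thr / tot wall and overhead = 100 * (tot wall - mean thr) / tot wall (for Raspbian, 4.917360 / 1.726206 ≈ 2.84865 and 100 * (1.726206 - 1.639120) / 1.726206 ≈ 5.045 %). The attached program is not reproduced here, so the sketch below is an inference from the posted numbers, not its actual code; the names BenchSummary and summarize are hypothetical.

```cpp
#include <cassert>

// Hypothetical container for the five printed lines (names are illustrative).
struct BenchSummary {
    double tot_thr;   // sum of the per-thread times, seconds
    double mean_thr;  // tot_thr / number of threads
    double tot_wall;  // wall-clock time of the whole run, seconds
    double thr_gain;  // tot_thr / tot_wall: effective parallel speedup
    double overhead;  // percent of the wall time not covered by mean_thr
};

// Formulas inferred from the posted output, not taken from pthread_bench.cpp.
BenchSummary summarize(double tot_thr, int n_threads, double tot_wall) {
    assert(n_threads > 0 && tot_wall > 0.0);
    BenchSummary s{};
    s.tot_thr = tot_thr;
    s.mean_thr = tot_thr / n_threads;
    s.tot_wall = tot_wall;
    s.thr_gain = tot_thr / tot_wall;
    s.overhead = 100.0 * (tot_wall - s.mean_thr) / tot_wall;
    return s;
}
```

With the FreeBSD figures (33.810808, 3, 11.277030) the same formulas give a gain of about 2.9982 and an overhead of about 0.06 %, matching the printout. A small overhead by this definition only says that the mean thread time is close to the wall time; it says nothing about how fast each thread ran. The same definition also accounts for the 100 % overhead in the clang -O2 run further down, where the threads did no measurable work.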
>
> One issue is the default optimization level (with no -O option,
> gcc and clang both use -O0) vs. using a specific, controlled level:
>
> # g++10 -Wall -o pthread_bench pthread_bench.cpp -lpthread -lm
> # ./pthread_bench 10000 3
> tot thr :  25.900658
> mean thr:   8.633552
> tot wall:   8.633356
> thr gain:   3.00007
> overhead: -0.00227026 %
>
> # g++10 -Wall -O2 -o pthread_bench pthread_bench.cpp -lpthread -lm
> # ./pthread_bench 10000 3
> tot thr :   1.133682
> mean thr:   0.377894
> tot wall:   0.376152
> thr gain:   3.01389
> overhead: -0.463111 %
>
> (I'm not certain that the gcc port and the Linux g++ were built
> with the same configuration or use the same default
> optimizations.)
>
> # g++10 -v
> Using built-in specs.
> COLLECT_GCC=g++10
> COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc10/gcc/aarch64-portbld-freebsd14.0/10.2.0/lto-wrapper
> Target: aarch64-portbld-freebsd14.0
> Configured with: /wrkdirs/usr/ports/lang/gcc10/work/gcc-10.2.0/configure
> --disable-multilib --disable-bootstrap --disable-nls
> --enable-gnu-indirect-function --enable-plugin
> --libdir=/usr/local/lib/gcc10 --libexecdir=/usr/local/libexec/gcc10
> --program-suffix=10 --with-as=/usr/local/bin/as --with-gmp=/usr/local
> --with-gxx-include-dir=/usr/local/lib/gcc10/include/c++/
> --with-ld=/usr/local/bin/ld --with-pkgversion='FreeBSD Ports Collection'
> --with-system-zlib --enable-languages=c,c++,objc,fortran
> --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man
> --infodir=/usr/local/share/info/gcc10 --build=aarch64-portbld-freebsd14.0
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 10.2.0 (FreeBSD Ports Collection)
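Because both compilers silently fall back to -O0 when no -O option is given, one way to keep comparisons honest is to have the benchmark report how it was built. A minimal sketch using the __OPTIMIZE__ macro, which GCC and Clang predefine in optimizing compilations and leave undefined at -O0; the functions built_with_optimization and print_build_note are hypothetical additions, not part of pthread_bench.cpp.

```cpp
#include <cstdio>

// Hypothetical helper: true when this translation unit was compiled with
// optimization enabled. GCC and Clang define __OPTIMIZE__ for -O1, -O2,
// -O3, -Os and so on, and leave it undefined at -O0 (their default).
bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Print a one-line build note next to the timing results.
void print_build_note() {
    std::printf("build   : %s\n",
                built_with_optimization() ? "optimized" : "unoptimized (-O0)");
}
```

Printing this line alongside "tot wall" would have made the -O0 vs. -O2 difference visible in the first pair of sessions.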
>
>
> Another issue is g++ and libstdc++ vs. clang++ (system c++)
> and (system) libc++. So trying system clang and libc++:
>
> # c++ -Wall -o pthread_bench pthread_bench.cpp -lpthread -lm
> # ./pthread_bench 10000 3
> tot thr :   2.525239
> mean thr:   0.841746
> tot wall:   0.849135
> thr gain:    2.9739
> overhead:   0.87018 %
>
> # c++ -Wall -O2 -o pthread_bench pthread_bench.cpp -lpthread -lm
> # ./pthread_bench 10000 3
> tot thr :   0.000000
> mean thr:   0.000000
> tot wall:   0.000369
> thr gain:         0
> overhead:       100 %
>
> That last result is because the compiler optimized run(. . .) down
> to just returning 0 (a null pointer in x0):
>
> 0000000000400a24 <_Z3runPv> mov x0, xzr
> 0000000000400a28 <_Z3runPv+0x4> ret
>
> The source code needs to do something to prevent
> the compiler from optimizing away computations whose
> results are never used.
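One way to do that, sketched under assumptions: the attachment is not reproduced, so the Work struct, its fields, and this version of run() are hypothetical, modeled only on the description (each thread mallocs an array and fills it with sqrtf of the index). The key change is that the computed values feed a checksum stored through the thread argument, so the compiler can no longer reduce run() to a bare return. Adding volatile, as mentioned in the reply at the top, is another option.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <pthread.h>

// Hypothetical per-thread job; field names are illustrative only.
struct Work {
    long n;      // number of array elements to fill
    double sum;  // checksum written back so the results are observable
};

// Thread body: fill an array with sqrt(index), then fold it into a checksum.
// Because the checksum is stored through the argument pointer, which the
// caller can read, the square roots cannot be dropped, although the
// compiler is still free to fuse or vectorize the loops.
void *run(void *arg) {
    Work *w = static_cast<Work *>(arg);
    assert(w != nullptr && w->n >= 0);

    float *a = static_cast<float *>(
        std::malloc(static_cast<std::size_t>(w->n) * sizeof(float)));
    if (a == nullptr && w->n > 0)
        return nullptr;  // allocation failed

    for (long i = 0; i < w->n; i++)
        a[i] = std::sqrt(static_cast<float>(i));  // float overload, like sqrtf

    double s = 0.0;
    for (long i = 0; i < w->n; i++)
        s += a[i];
    w->sum = s;

    std::free(a);
    return w;
}
```

Built with -O2, the stored checksum should keep run() from collapsing to the two-instruction body shown in the disassembly above.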
>
> Enabling more compiler warnings also produces
> notices like:
>
> g++:
> pthread_bench.cpp: In function 'void* run(void*)':
> pthread_bench.cpp:18:18: warning: unused parameter 'dummy' [-Wunused-parameter]
>    18 | void *run (void *dummy)
>       |            ~~~~~~^~~~~
> pthread_bench.cpp: In function 'int main(int, char**)':
> pthread_bench.cpp:41:19: warning: ISO C++ forbids variable length array 'tid' [-Wvla]
>    41 |         pthread_t tid[n_th];
>       |                   ^~~
>
> clang++:
> pthread_bench.cpp:18:18: warning: unused parameter 'dummy' [-Wunused-parameter]
> void *run (void *dummy)
>                  ^
> pthread_bench.cpp:41:22: warning: variable length arrays are a C99 feature [-Wvla-extension]
>         pthread_t tid[n_th];
>                      ^
>
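Both warnings are easy to address. A sketch, assuming the original main() sizes its thread array from a command-line count: leaving the parameter unnamed quiets -Wunused-parameter, and a std::vector<pthread_t> replaces the variable-length array, which is not standard C++. The names worker and run_threads are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// An unnamed parameter cannot trigger -Wunused-parameter.
void *worker(void * /* unused */) {
    return nullptr;
}

// Start n_th threads and join them; returns how many were started.
// std::vector is sized at run time, so no variable-length array is needed.
int run_threads(int n_th) {
    assert(n_th >= 0);
    std::vector<pthread_t> tid(static_cast<std::size_t>(n_th));
    int started = 0;
    for (int i = 0; i < n_th; i++) {
        if (pthread_create(&tid[static_cast<std::size_t>(i)], nullptr, worker, nullptr) != 0)
            break;  // stop on the first failure; join what did start
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[static_cast<std::size_t>(i)], nullptr);
    return started;
}
```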
>
> FYI: Here is a mix of using g++10 but with the FreeBSD
> system libc++ instead of gcc's libstdc++:
>
> # g++10 -Wno-psabi -nostdinc -nostdinc++ -I/usr/include/c++/v1
> -I/usr/include -mno-outline-atomics -flto -Wall -O2 -o pthread_bench
> pthread_bench.cpp -lpthread -lm
> # ./pthread_bench 10000 3
> tot thr :   1.126198
> mean thr:   0.375399
> tot wall:   0.376237
> thr gain:   2.99332
> overhead:  0.222732 %
>
>
>
> My FreeBSD context on the RPi4B is based on non-debug
> builds of main (14-CURRENT at this point):
>
> # ~/fbsd-based-on-what-freebsd-main.sh
> merge-base: 847dfd2803f6c8b077e3ebc68e35adff2c79a65f
> merge-base: CommitDate: 2021-02-03 21:24:22 +0000
> 325d7069b027 (HEAD -> mm-src) mm-src snapshot for mm's patched build in
> git context.
> 847dfd2803f6 (freebsd/main, freebsd/HEAD, pure-src, main) readelf: do not
> trucate section name with -W
> FreeBSD RPi4B 14.0-CURRENT FreeBSD 14.0-CURRENT
> mm-src-n244624-325d7069b027 GENERIC-NODBG  arm64 aarch64 1400003 1400003
>
> It is a build tailored for the cortex-a72 via use of -mcpu=.
> The RPi4B's config.txt has:
>
> over_voltage=6
> arm_freq=2000
> arm_freq_min=2000
> sdram_freq_min=3200
>
> FYI:
>
> # sysctl hw.physmem
> hw.physmem: 8465969152
>
>
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>
>


More information about the freebsd-arm mailing list