RPi2 i/o blocking and SD card performance

Warner Losh wlosh at bsdimp.com
Thu Jun 9 13:55:31 UTC 2016


> On Jun 9, 2016, at 5:37 AM, Gergely Imreh <imrehg at gmail.com> wrote:
> 
> Hi,
> 
> I've been testing FreeBSD 11.0-CURRENT on a RaspberryPi2. I'm relatively
> new to FreeBSD, and wondering if there's any advice for improving the
> performance a bit.
> 
> First, it looks like there's a lot of i/o blocking behaviour going on. For
> example, when running MediaWiki on the board, if I compile any ports, the
> site itself is pretty much unusable (the PHP scripts time out even with 180s
> timeouts). The strangest thing is that CPU usage is not pegged at 100%: all
> 4 cores can be ~99% idle, and still everything is very slow. Once the ports
> compilation or any other i/o-related task is finished, it's snappy again.
> 
> Any idea what could cause such high latency/lag even though the CPU is
> idle? Is there anything I could test?

What’s the HZ for the system? The sd/mmc stack does a lot of context switches,
which may be one reason for this.
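
A quick way to answer the HZ question is to read the kern.clockrate sysctl.
The sketch below assumes the usual "{ hz = ..., tick = ..., ... }" output
format; the sample string is illustrative, not taken from the board, and the
helper names are made up for this example.

```python
import re
import subprocess


def parse_clockrate(text):
    """Pull the integer fields out of `sysctl kern.clockrate`-style output."""
    return {key: int(val) for key, val in re.findall(r"(\w+)\s*=\s*(\d+)", text)}


def read_clockrate():
    """Query the running kernel; assumes a FreeBSD host with sysctl in PATH."""
    out = subprocess.run(
        ["sysctl", "-n", "kern.clockrate"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_clockrate(out)


# Illustrative sample only; the real values depend on the kernel config.
sample = "{ hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }"
fields = parse_clockrate(sample)
print("hz =", fields["hz"], "-> one tick every", fields["tick"], "us")
```

On the board itself, read_clockrate()["hz"] would give the value to report.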

> Second, I've also tried profiling the SD card a bit. I used the very same
> card (SanDisk 32GB) and the same RPi board, once with a fresh install of
> FreeBSD and once with a fresh install of ArchLinuxARM, running
> bonnie++ -s 2000 each time (the results below)

Is the hardware configured the same on both, e.g. clock rate and bus width
(number of bits)?

> The block write performance on ArchLinux is ~55% higher (14M/s vs 9M/s),

ArchLinux has gotten better. When I last profiled Linux on RPi2, I was only
able to get ~10MB/s. At the time, FreeBSD was getting 9.2 or 9.3MB/s for read.
I chalked up the difference to the context switches. I didn’t measure the
latency though.
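
Since the latency was never measured, here is a minimal sketch of a
per-write latency probe. It is a rough illustration, not a replacement for
bonnie++ or dtrace: the function names, the 128k block size and the count
are all assumptions chosen for the example, and it forces each block out
with fsync so that the time includes the trip to the card.

```python
import os
import time


def write_latencies(path, block_size=128 * 1024, count=64):
    """Write `count` blocks, fsync after each, return latencies in seconds."""
    buf = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # include the time to push the block to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Median, 95th percentile and worst case, in milliseconds."""
    ordered = sorted(latencies)
    n = len(ordered)
    return {
        "median_ms": ordered[n // 2] * 1000,
        "p95_ms": ordered[min(n - 1, int(n * 0.95))] * 1000,
        "max_ms": ordered[-1] * 1000,
    }
```

Running summarize(write_latencies("/path/on/the/sdcard/probe.bin")) once on
an idle system and once during a ports build would show whether the tail
latency, rather than the throughput, is what makes the site unusable.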

> while rewrite and per char output are 4-5x higher. Block read is also ~70%
> higher (25M/s vs 15M/s). This is without any tuning. Any idea why the
> FreeBSD performance on the exact same hardware is so different, and whether
> it can be improved? I guess these two questions are related.

Linux pre-erases blocks that will be written with multi-write, which may help
(though my experiments with it were a net negative). That may be part
of the issue as well. We may have some MAXPHYS issues that could affect
things. In general, the multi read / multi write stuff could likely use a lot of
TLC. We optimized it for an Atmel controller that isn’t a great match for modern
SD host controllers.

As for making it better with just tuning, maybe 256k or 1M MAXPHYS will
help. But we’ll likely need to spend some quality time with dtrace and the MMC/SD
stack to find where the ‘stalls’ are. We’re likely not streaming things as
efficiently as Linux does...
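
To give a feel for why MAXPHYS might matter, here is a back-of-the-envelope
sketch of how many I/O requests a bonnie++ -s 2000 sized file needs if each
request carries at most MAXPHYS bytes. The 128k baseline is an assumption
(the historical default), and the real MMC/SD stack may split requests
further, so treat the numbers as a best case.

```python
KIB = 1024
MIB = 1024 * KIB


def requests_needed(total_bytes, maxphys):
    """Requests required when each one carries at most `maxphys` bytes."""
    return -(-total_bytes // maxphys)  # ceiling division


total = 2000 * MIB  # file size used by bonnie++ -s 2000

for label, maxphys in [("128k", 128 * KIB), ("256k", 256 * KIB), ("1M", MIB)]:
    print(f"MAXPHYS={label}: {requests_needed(total, maxphys)} requests")
```

Fewer, larger requests mean fewer trips through the stack for the same amount
of data, which is where the context switches come in.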

Warner


