arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern
large SD cards
Ian Lepore
freebsd at damnhippie.dyndns.org
Fri Mar 4 20:20:12 UTC 2011
The following reply was made to PR arm/155214; it has been noted by GNATS.
From: Ian Lepore <freebsd at damnhippie.dyndns.org>
To: ticso at cicely.de
Cc: FreeBSD-gnats-submit at freebsd.org
Subject: Re: arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern
large SD cards
Date: Fri, 04 Mar 2011 13:10:12 -0700
On Thu, 2011-03-03 at 00:52 +0100, Bernd Walter wrote:
> On Wed, Mar 02, 2011 at 02:53:18PM -0700, Ian Lepore wrote:
> >
> > >Number: 155214
> > >Category: arm
> > >Synopsis: [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
> > >Confidential: no
> > >Severity: serious
> > >Priority: medium
> > >Responsible: freebsd-arm
> > >State: open
> > >Quarter:
> > >Keywords:
> > >Date-Required:
> > >Class: sw-bug
> > >Submitter-Id: current-users
> > >Arrival-Date: Wed Mar 02 22:10:10 UTC 2011
> > >Closed-Date:
> > >Last-Modified:
> > >Originator: Ian Lepore <freebsd at damnhippie.dyndns.org>
> > >Release: FreeBSD 8.2-RC3 arm
> > >Organization:
> > none
> > >Environment:
> > FreeBSD dvb 8.2-RC3 FreeBSD 8.2-RC3 #49: Tue Feb 15 22:52:14 UTC 2011 root at revolution.hippie.lan:/usr/obj/arm/usr/src/sys/DVB arm
> >
> > Included patch is against -current even though the problem was first seen on
> > 8.2-RC3
> >
> > The problem was seen on AT91RM9200 hardware, but presumably also affects the
> > SAM9 series which uses the same driver code.
> >
> > >Description:
> > With the latest generation of large-capacity SD cards, write speeds as low as
> > 20 kbytes/sec are seen. These modern cards have erase-block sizes as large as
> > 8192K (compared to 32K typical on previous generations). The at91_mci driver
> > does only single-sector IO; apparently this requires the SD card to internally
> > perform an expensive read-erase-modify-write cycle for each 512 byte block
> > written to the card.
>
> The details of this problem are completely known.
> However the RM9200 has many hardware problems to be worked around and
> so far no one actually did.
> Your patch is quite large, so I would like to ask you explicitly:
> Did you test your patch with an AT91RM9200 system?
> You did enable multisector support for reading and (more important) for
> writing?
> But you didn't activate 4bit mode?
> With 4bit mode there is no hardware bug, but when the driver was written
> it was just done in a lazy way because activating 4bit on SD cards requires
> special handling - in the meantime the SD layer itself was extracted and
> has 4bit support, but the at91_mci driver was never updated to use that.
>
> PS: I'm very pleased to see your work since SD write speed was a
> major show stopper for some applications
>
I made some time today to try 4-bit mode in the mci driver, using
8.2-RELEASE as a testbed. I quickly determined that just enabling
4-bit mode results in corrupted read data severe enough to virtually
always cause "root mount error" at boot. Occasionally it'll manage to
mount root but then lock up or panic during rc-file processing. It
does this both with the original driver and with my patched driver
configured for single-block or multi-block operation.

After some experimenting to find the cause of the corrupted data, I
realized we're violating the SD spec by running the bus at 30 MHz --
the spec says 25 MHz max until you use CMD6 to switch to high-speed
mode if the card supports it. Our next lower available speed is
15 MHz, and when I set that as the max speed, 4-bit works perfectly,
both in the original driver and with my patches in single or
multi-block operation. (In my patched driver I had to add a
controller reset following a multi-block read stop, similar to after a
multi-write, to avoid occasional spurious data CRC errors in 4-bit
mode. The data we want is read correctly; the CRC error happens on
the block that's still coming in as the stop command is being issued.
I'm not sure why this only happens in 4-bit mode.)

Since we've been getting away with 30 MHz/1-bit for years, I surmise
that any card that is capable of delivering 25 MHz/4-bit is also
capable of doing 30 MHz/1-bit even though that's a slight violation of
the spec. But 30 MHz/4-bit appears to be enough of a violation that
even modern cards don't keep up. (When looking at dumps of the
corrupted read data, an old card had a lot of corruption, like 20% of
the data was read wrong. A modern card had just a few bits wrong out
of every few kbytes read.)

Since 15 MHz/4-bit is still twice the raw bus throughput of 30 MHz/1-bit
I decided to do some crude benchmarking to see if it's worth the trouble
of making 4-bit work correctly. The results appear below. In
summary, there is definitely a benefit to using 4-bit transfers, but
the improvement isn't nearly as dramatic as the change from single- to
multi-block IO.

Supporting 4-bit transfers properly will require some changes in
dev/mmc. It doesn't currently use CMD6 to switch to high-speed mode
at all. I'm assuming if we update it to do so, we'll have no problem
running at 30 MHz/4-bit. There'll also need to be some fixes in the
routine that calculates the speed to run at, because right now it
doesn't account for the 25 MHz speed limit set by the spec before
switching to high-speed (which is why we end up running at 30 MHz).
The mci driver will also need some updates to round the clock speed
requested by the upper layers down to the next lower supported speed,
but it would probably be good to have a bit of a hack in there as well
to allow 30 MHz operation in 1-bit mode since folks have come to expect
that and it seems to work ok.
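The clock selection described above could be sketched roughly as follows. This is only an illustration in Python, not driver code. It assumes the MCI bus clock is MCK / (2 * (CLKDIV + 1)), a 60 MHz master clock (which would give the 30 MHz and 15 MHz steps mentioned above), and an 8-bit divider field; all function and constant names are hypothetical.

```python
# Hypothetical sketch; names, the 60 MHz master clock and the divider
# range are assumptions, not taken from the at91_mci driver.
MCK_HZ = 60_000_000             # assumed master clock
SD_DEFAULT_MAX_HZ = 25_000_000  # SD spec limit before a CMD6 high-speed switch
OVERCLOCK_1BIT_HZ = 30_000_000  # out-of-spec speed that has worked in 1-bit mode
CLKDIV_MAX = 255                # assumed 8-bit divider field


def bus_hz(clkdiv, mck_hz=MCK_HZ):
    """Bus clock for a divider value: MCK / (2 * (CLKDIV + 1))."""
    return mck_hz // (2 * (clkdiv + 1))


def pick_clkdiv(requested_hz, mck_hz=MCK_HZ):
    """Round down: smallest divider whose clock does not exceed the request."""
    for clkdiv in range(CLKDIV_MAX + 1):
        if bus_hz(clkdiv, mck_hz) <= requested_hz:
            return clkdiv
    return CLKDIV_MAX


def pick_clkdiv_with_hack(requested_hz, bus_width, mck_hz=MCK_HZ):
    """Round down, except keep the long-standing 30 MHz in 1-bit mode."""
    clkdiv = pick_clkdiv(requested_hz, mck_hz)
    if bus_width == 1 and clkdiv > 0 and requested_hz >= SD_DEFAULT_MAX_HZ:
        # Step one divider faster only if that stays within the overclock cap.
        if bus_hz(clkdiv - 1, mck_hz) <= OVERCLOCK_1BIT_HZ:
            return clkdiv - 1
    return clkdiv
```

Under these assumptions a 25 MHz request rounds down to 15 MHz in 4-bit mode but is allowed to stay at 30 MHz in 1-bit mode, which is the behavior argued for above.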
About the benchmarks...

I tested with two different cards, noted below by their erase block
sizes. The card with the 32-block erase size is a SanDisk 512 MB card
from several years ago. The card with the 8192-block erase size is a
SanDisk 2 GB card purchased recently. The older card does not claim to
support high-speed mode, the newer card does (but of course we don't
switch the card to hs mode).

I tested each card with each combo of bus speed, bus width, and
single- versus multi-block IO. All of the results below are with my
patched driver. I also briefly tested the original unpatched 8.2
driver and found the results very much in line with the 1-block
results from my patched driver. (The patched driver performs a little
better even in single-block mode, probably because it gets the same
work done with fewer interrupts.)

Read and write speeds are as reported by these commands:

    dd if=/dev/mmcsd0s2a of=/dev/null bs=1m count=10
    dd if=/dev/zero of=/dev/mmcsd0s2a bs=1m count=10

Each test was run several times immediately after rebooting; median
values reported. There were no writable filesystems mounted and
relatively little going on in the system in general, but I didn't get
fanatical about leveling the test conditions.
Erase/clock/bus/xfer size    Read bytes/sec    Write bytes/sec
32/30MHz/1bit/1-block                864452             333324
32/15MHz/4bit/1-block                975780             346738
8192/30MHz/1bit/1-block              647241              24211
8192/15MHz/4bit/1-block              722659              24253
32/30MHz/1bit/64-block              2192806            1775660
32/15MHz/4bit/64-block              3075302            1775302
8192/30MHz/1bit/64-block            2133880            1503959
8192/15MHz/4bit/64-block            2947133            1753540
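To put the table in perspective, the ratios can be worked out with a few lines of Python. This is just a sketch over the figures above; the function names are made up, and the "raw bus rate" is a simple clock-times-width ceiling that ignores command, CRC and card busy time.

```python
# Figures copied from the table above, in bytes/sec.
# Key: (erase size in blocks, clock in Hz, bus width in bits, blocks per transfer)
RESULTS = {
    (32, 30_000_000, 1, 1): (864452, 333324),
    (32, 15_000_000, 4, 1): (975780, 346738),
    (8192, 30_000_000, 1, 1): (647241, 24211),
    (8192, 15_000_000, 4, 1): (722659, 24253),
    (32, 30_000_000, 1, 64): (2192806, 1775660),
    (32, 15_000_000, 4, 64): (3075302, 1775302),
    (8192, 30_000_000, 1, 64): (2133880, 1503959),
    (8192, 15_000_000, 4, 64): (2947133, 1753540),
}


def raw_bus_rate(clock_hz, bus_bits):
    """Payload ceiling in bytes/sec, ignoring command, CRC and busy time."""
    return clock_hz * bus_bits // 8


def summarize(erase):
    """Ratios for one card: multi-block gain, 4-bit gain, bus utilization."""
    r1, w1 = RESULTS[(erase, 30_000_000, 1, 1)]
    r64, w64 = RESULTS[(erase, 30_000_000, 1, 64)]
    r64w, w64w = RESULTS[(erase, 15_000_000, 4, 64)]
    return {
        "multiblock_read_gain": r64 / r1,
        "multiblock_write_gain": w64 / w1,
        "4bit_read_gain": r64w / r64,
        "4bit_write_gain": w64w / w64,
        "4bit_read_bus_use": r64w / raw_bus_rate(15_000_000, 4),
    }


if __name__ == "__main__":
    for erase in (32, 8192):
        print(erase, {k: round(v, 2) for k, v in summarize(erase).items()})
```

On these numbers the move to 64-block transfers is worth roughly 60x on writes for the newer card, while going from 30MHz/1bit to 15MHz/4bit adds at most about 1.4x, and the 4-bit reads use well under half of the raw bus rate.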
Another crude little benchmark... right after booting I logged on as
root immediately and did a vmstat -i, so this should roughly represent
how many interrupts it took to get booted and launch root's shell (all
read IO, there are no writeable filesystems mounted, both done at
30 MHz/1-bit):

vmstat -i                    interrupt           total    rate
original driver (1-block)    irq10: at91_mci0    42384    1284
patched driver (64-block)    irq10: at91_mci0     1365      52
Based on the benchmark results, and the fact that I don't really have
the time to take on the dev/mmc changes right now, I think we should
adopt the multi-block patches and stick with 30 MHz/1-bit for now.
Maybe I can find some time later this year to get dev/mmc working
better with high-speed mode (without accidentally breaking the sdhci
world, which I don't know enough about right now).