nand performance

Ian Lepore freebsd at
Fri Dec 21 00:50:55 UTC 2012

On Thu, 2012-12-20 at 12:07 -0800, John-Mark Gurney wrote:
> Ian Lepore wrote this message on Wed, Dec 19, 2012 at 17:41 -0700:
> > I've been working to get nandfs going on a low-end Atmel arm system.
> > Performance is horrible.  Last weekend I got my nand-based DreamPlug
> > unbricked and got nandfs working on it too.  Performance is horrible.
> > 
> > By that I'm referring not to the slow nature of the nand chips
> > themselves, but to the fact that accessing them locks out userland
> > processes, sometimes for many seconds at a time.  The problem is real
> > easy to see, just format and populate a nandfs filesystem, then do
> > something like this:
> > 
> >   mount -r -t nandfs /dev/gnand0s.root /mnt
> >   nice +20 find /mnt -type f | xargs -J% cat % > /dev/null
> > 
> > and then try to type in another terminal -- sometimes what you're typing
> > doesn't get echoed for 10+ seconds at a time.
> > 
> > The problem is that the "I/O" on a nand chip is really just the cpu
> > copying from one memory interface to another, a byte at a time, and it
> > must also use busy-wait loops to wait for chip-ready and status info.
> > This is being done by high-priority kernel threads, so everything else
> > is locked out.
> > 
> > It seems to me that this is about the same situation as classic ATA PIO
> > mode, but PIO doesn't make a system that unresponsive.  
> > 
> > I'm curious what techniques are used to mitigate performance problems
> > for ATA PIO modes, and whether we can do something similar for nand.  I
> > poked around a bit in dev/ata but the PIO code I saw (which surely
> > wasn't the whole picture) just used a bus_space_read_multi().  Can
> > someone clue me in as to how ATA manages to do PIO without usurping the
> > whole system?
> Looks like the problem is all the DELAY calls in dev/nand/nand_generic.c.
> DELAY is a busy wait not letting the cpu do anything else...  The bad one
> is probably generic_erase_block as it looks like the default is 3ms,
> plenty of time to let other code run...  If it could be interrupt driven,
> that'd be best...
> I can't find the interface that would allow sub-hz sleeping, but there is
> tsleep that could be used for some of the larger sleeps...  But switching
> to interrupts + wakeup would be best...
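The "poll briefly, then sleep for the longer waits" idea can be sketched in userland C as a two-phase (spin-then-sleep) wait. This is an illustrative assumption, not code from dev/nand: `wait_ready_hybrid()`, `ready_fn`, `struct sim_chip` and `sim_ready()` are hypothetical names, and `nanosleep()` stands in for the kernel's tsleep(9) or pause(9), whose timeouts are given in clock ticks. The sketch busy-polls only for a short budget, which handles fast operations cheaply, then gives the CPU away between polls for long ones such as a block erase.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns true once the chip reports ready. */
typedef bool (*ready_fn)(void *ctx);

/* Hypothetical simulated chip: reports busy for polls_left polls. */
struct sim_chip {
	unsigned long polls_left;
};

static bool
sim_ready(void *ctx)
{
	struct sim_chip *chip = ctx;

	if (chip->polls_left > 0) {
		chip->polls_left--;
		return false;
	}
	return true;
}

static int64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Busy-poll for at most spin_ns, then poll every sleep_ns, sleeping in
 * between, until timeout_ns has elapsed. Returns 0 when ready, -1 on
 * timeout. (Userland analogue only; a kernel version would sleep with
 * tsleep(9)/pause(9) or, better, wait for an interrupt and wakeup(9).)
 */
static int
wait_ready_hybrid(ready_fn ready, void *ctx, int64_t spin_ns,
    int64_t sleep_ns, int64_t timeout_ns)
{
	const int64_t start = now_ns();

	assert(spin_ns >= 0 && sleep_ns > 0 && timeout_ns >= spin_ns);

	/* Phase 1: spin. Lowest latency, but this thread keeps the CPU busy. */
	while (now_ns() - start < spin_ns) {
		if (ready(ctx))
			return 0;
	}

	/* Phase 2: sleep between polls so other threads can run. */
	const struct timespec nap = {
		.tv_sec = sleep_ns / 1000000000LL,
		.tv_nsec = sleep_ns % 1000000000LL,
	};
	while (now_ns() - start < timeout_ns) {
		if (ready(ctx))
			return 0;
		nanosleep(&nap, NULL);
	}
	return ready(ctx) ? 0 : -1;
}
```

For example, `wait_ready_hybrid(sim_ready, &chip, 50000, 100000, 10000000)` spins for up to 50 us, then checks every 100 us until 10 ms have passed. With tick-granular kernel sleeps the second phase would oversleep short operations, so the spin budget would need tuning to the chip's timings.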

Yeah, the DELAY() calls were actually not working for me (I think I'm
the first to test this stuff with an ONFI type chip), and I've replaced
them all with loops that poll for ready status, which at least minimizes
the wait time, but it's still a busy-loop.  Real-world times for the
chips I'm working with are 30 us to open a page for read, ~270 us to write
a page, and ~750 us to erase a block.
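A bounded status poll of the kind described above might look like the following minimal sketch. The bit values follow the ONFI status register (RDY in bit 6, FAIL in bit 0), but treat them as assumptions to check against the part's datasheet; `read_status_fn`, `nand_wait_ready()` and the `sim_*` helpers are hypothetical, not the nand driver's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Status-register bits, assumed to follow ONFI; verify for your part. */
#define NAND_STATUS_FAIL 0x01 /* last program/erase failed */
#define NAND_STATUS_RDY  0x40 /* ready for a new command */

/* Hypothetical callback that issues READ STATUS and returns the byte. */
typedef uint8_t (*read_status_fn)(void *ctx);

enum nand_wait_result { NAND_OK, NAND_FAILED, NAND_TIMEOUT };

/*
 * Poll until RDY is set, for at most max_polls reads, instead of a
 * fixed worst-case DELAY(). FAIL is only meaningful after a program or
 * erase. This is still a busy loop: it just ends as soon as the chip
 * is done.
 */
static enum nand_wait_result
nand_wait_ready(read_status_fn read_status, void *ctx, unsigned max_polls)
{
	for (unsigned i = 0; i < max_polls; i++) {
		uint8_t st = read_status(ctx);

		if (st & NAND_STATUS_RDY)
			return (st & NAND_STATUS_FAIL) ? NAND_FAILED : NAND_OK;
	}
	return NAND_TIMEOUT;
}

/* Hypothetical simulated chip: busy for busy_polls reads, then final. */
struct sim_status {
	unsigned busy_polls;
	uint8_t final_status;
};

static uint8_t
sim_read_status(void *ctx)
{
	struct sim_status *s = ctx;

	if (s->busy_polls > 0) {
		s->busy_polls--;
		return 0x00; /* RDY clear: still busy */
	}
	return s->final_status;
}
```

The loop exits as soon as RDY appears, so a 30 us page open costs roughly 30 us of polling rather than a worst-case delay, but the CPU does nothing else for that time.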

But whether busy-looping for status or busy-looping polling a clock for
DELAY, or transferring a byte at a time for the actual IO, it's all the
same... it's cpu and memory bus cycles that are happening in a
high-priority kernel thread.  
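For the data transfer itself, a minimal sketch of the byte-at-a-time copy shows why it is pure CPU work. `struct fake_nand`, `nand_read_data_reg()` and `nand_read_page_pio()` are hypothetical names; on real hardware the data register is a single memory-mapped address, and a FreeBSD driver could read it with bus_space_read_multi_1(9), which reads the same location repeatedly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a NAND data register: each read returns
 * the next byte of the page the chip has loaded.
 */
struct fake_nand {
	const uint8_t *page;   /* contents the chip "loaded" for the read */
	size_t         cursor; /* next byte the data register will return */
};

/* One access to the (simulated) data register. */
static uint8_t
nand_read_data_reg(struct fake_nand *chip)
{
	return chip->page[chip->cursor++];
}

/*
 * PIO-style page read: the CPU moves every byte from the data register
 * into RAM itself. In a high-priority kernel thread, lower-priority
 * userland threads get no time on this CPU until the loop finishes.
 */
static void
nand_read_page_pio(struct fake_nand *chip, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = nand_read_data_reg(chip);
}
```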

The interface between the low-level controller and the nand layer
doesn't allow for interrupt handling right now.  Not all hardware
designs would allow for using interrupts, but mine does, so reworking
things to allow its use would help some.  Well, it would help for writes
and erases.  The 180 MHz ARM I'm working with doesn't get much done in
30 us, so reads wouldn't get any better.  Reads are all I really care
about, since the product in the field will have a read-only filesystem,
and firmware updates are infrequent and it's okay if they're a bit slow.
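Rough arithmetic with the figures quoted in this thread shows why reads are the hard case. At 180 MHz each microsecond is 180 cycles, so the chip is busy for about 5,400 cycles during a page open, 48,600 during a page write and 135,000 during a block erase. Whether an interrupt-driven wait pays off depends on the cost of taking the interrupt, sleeping and waking up on this board, which the thread does not give and would have to be measured; the longer the operation, the more likely the switch is worth it. The hypothetical `busy_cycles()` below just encodes that arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Clock rate and operation times quoted in this thread. */
#define CPU_HZ       180000000ULL /* 180 MHz ARM */
#define T_READ_US    30ULL        /* open a page for read */
#define T_PROGRAM_US 270ULL       /* write a page */
#define T_ERASE_US   750ULL       /* erase a block */

/* CPU cycles that elapse while the chip is busy for `us` microseconds. */
static uint64_t
busy_cycles(uint64_t us)
{
	return CPU_HZ / 1000000ULL * us;
}
```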

-- Ian

More information about the freebsd-arm mailing list