Flash disks and FFS layout heuristics

Matthew Dillon dillon at apollo.backplane.com
Sun Mar 30 17:10:34 PDT 2008


:I believe phk means that googling for "Flash Adaptation Layer" turns up some 
:results.
:
:> And no, I really don't want to discuss it any further with you.
:> 
:But please continue the discussion for the sake of the silent majority, there 
:are loads of us out here who are interested in flash fs development.
:
:Also, I had the impression that newer flash-based hard drives had internal logic 
:to spread out writes evenly over the disk and to remap worn-out blocks. And that 
:the result of these algorithms increased MTBF to at least the MTBF for spinning 
:disks. Or have I misread something?
:
:
: 	/Chris

    I found some of it, though I dunno if it's what he was specifically
    referencing.  The slide show was interesting though there were a
    number of factual errors, but I didn't really see anything in-depth
    about 'Flash Adaptation Layer'.  It seems to be a fairly generically
    coined term for something that is far from generic in actual
    implementation.

    The idea of remapping flash sectors could be considered a poor-man's
    way of dealing with wear issues in that remapping tends to be fairly
    limited... for example, you might use a fixed-sized table and once the
    table fills up the device is toast.  Remapping doesn't actually prevent
    the uneven wear from occurring, it just gives you a fixed factor of
    additional runway.  If remapping gets complex enough to work with an
    arbitrary number of dead sectors then it is effectively a 'Flash
    Adaptation Layer'.  Limited remapping (e.g. using a fixed-sized table)
    is really easy to code up.
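
    To make the fixed-table case concrete, here is a minimal Python sketch
    of limited remapping.  The class and method names (RemappedFlash,
    retire) and the sector counts are invented for illustration and do not
    correspond to any real driver or device.

```python
class RemappedFlash:
    """Toy model of limited remapping with a fixed pool of spare sectors."""

    def __init__(self, num_sectors, num_spares):
        self.num_sectors = num_sectors
        # Spare physical sectors live just past the normal address range.
        self.free_spares = list(range(num_sectors, num_sectors + num_spares))
        self.remap = {}      # logical sector -> spare physical sector
        self.dead = False

    def physical(self, logical):
        """Return the physical sector currently backing a logical sector."""
        return self.remap.get(logical, logical)

    def retire(self, logical):
        """Called when the sector backing `logical` has worn out."""
        if not self.free_spares:
            self.dead = True
            raise RuntimeError("remap table full: device is toast")
        self.remap[logical] = self.free_spares.pop(0)
        return self.remap[logical]


dev = RemappedFlash(num_sectors=8, num_spares=2)
dev.retire(3)        # the hot sector wears out, first spare takes over
dev.retire(3)        # the spare wears out as well, second spare takes over
try:
    dev.retire(3)    # no spares left
except RuntimeError as err:
    print(err)
```

    Note that nothing here slows the wear on the hot sector.  Each spare
    just buys one more sector's worth of lifetime, which is the "fixed
    factor of additional runway" described above.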

    But there are some huge differences between the two.  Really huge
    differences.  Detecting a worn cell requires generating a CRC and
    correcting it requires generating an ECC.  Neither CRCs nor
    ECCs are perfect and actually depending on them to handle situations
    that happen *normally* during the device's life-span is bad business.
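
    As a small illustration of the detection half, the sketch below stores
    a CRC alongside a sector payload and checks it on read.  It uses
    Python's zlib.crc32; the helper names (store, is_intact) and the
    payload are assumptions made for the example, and a real device would
    compute this in hardware or firmware.

```python
import zlib


def store(payload: bytes):
    """Return the payload together with the CRC-32 computed at write time."""
    return payload, zlib.crc32(payload)


def is_intact(payload: bytes, crc: int) -> bool:
    """Recompute the CRC on read and compare it with the stored value."""
    return zlib.crc32(payload) == crc


payload, crc = store(b"sector contents")

# Simulate a single failed cell by flipping one bit of the stored data.
damaged = bytearray(payload)
damaged[0] ^= 0x01
damaged = bytes(damaged)

good_ok = is_intact(payload, crc)
bad_ok = is_intact(damaged, crc)
print(good_ok, bad_ok)
```

    The CRC only says that something is wrong.  Working out which bit to
    repair is the job of the ECC, which this sketch does not attempt.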

    A proper sector translation mechanism guarantees even wear of all
    the cells.  You don't *GET* CRC errors under normal operation of
    the device.  You still want to have a CRC to detect the situation, and
    perhaps even a small ECC to try to correct it, but these exist to
    handle manufacturing defects (which can limit the life of individual
    cells) rather than to handle wear issues unrelated to manufacturing
    defects, which is what a limited remapping mechanic does.  A wear issue
    can cause many cells to die (see later on w/ regards to data retention)
    whereas a manufacturing defect tends to result in single bit errors.
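
    The even-wear idea can be sketched as follows.  This is a toy model
    under stated assumptions, not any particular vendor's translation
    layer: the TranslationLayer class, its fields, and the policy of always
    writing to the least-erased free block are all illustrative choices.

```python
class TranslationLayer:
    """Toy sector translation layer that spreads erases across all blocks."""

    def __init__(self, num_physical):
        self.erase_counts = [0] * num_physical
        self.free = set(range(num_physical))
        self.map = {}     # logical sector -> physical block
        self.data = {}    # physical block -> payload

    def write(self, logical, payload):
        # Always place new data in the least-worn free block
        # (ties are broken by the lowest block number).
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.data[target] = payload
        old = self.map.get(logical)
        self.map[logical] = target
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            del self.data[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.data[self.map[logical]]


ftl = TranslationLayer(num_physical=4)
for i in range(400):
    ftl.write(0, i)          # hammer a single logical sector

print(ftl.erase_counts)
```

    Even though only one logical sector is ever written, the erase counts
    of the four physical blocks never differ by more than one.  A real
    layer also has to move rarely-written data around so that it does not
    pin blocks forever, which this sketch leaves out.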

    As for indestructibility, in the short term flash storage is
    more resilient than disk storage, especially considering that there are
    no moving parts, but flash cells will degrade over time whether you
    write to them or not, depending on temperature.

    Look at any flash part, bring up the technical specifications and 
    there will be an entry for 'data retention' time.  Usually it's around
    10 years at 20 C.  If it is hotter the data is retained for a shorter
    period of time, if it is colder the data is retained for a longer
    period of time.  Retention is different from cell wear.  What retention
    means is that if you have a flash device, you need to rewrite the
    cells at least once every 5 years to be safe, or you risk losing the
    data.  You can't just read the cell like a DRAM refresh, but you don't
    have to go through an erase cycle either.  You only have to rewrite
    the cell.  Rewriting the cell does add wear to it so you don't
    want to rewrite it too often.  I have personally seen flash devices
    lose data... I'm trying to remember how many years it was but I think
    it was on the order of 15 years in one unit out of 30 that was subject
    to fairly hot temperatures in the summer.

    A flash unit must therefore run a scrubber to really be reliable.  It is
    absolutely required if you use a remapping algorithm, and a bit less so
    if you use a proper storage layer which generates even wear.  The real
    difference between the two comes down to shelf life (when you aren't
    scrubbing anything), since worn cells will die a lot more quickly than
    unworn cells.

    A scrubber in this case must validate the CRC and there is usually a
    way to tell the device to operate at a different detection threshold in
    order to detect a failing cell *before* it actually fails (write-verify
    usually does this when writing but you also want to do this when
    scrubbing, if you want to do it right).  The idea is for the scrubber
    to detect bit errors *before* the data becomes unrecoverable and,
    in fact, before the data even needs to be ECC'd.  You should not have
    to actually use ECC correction under normal operation of the device over
    its entire life span.
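
    The scrubbing pass described above might look roughly like this.  It
    is a simplified model: cells are represented as charge levels between
    0.0 and 1.0, and the two threshold constants are hypothetical numbers
    picked for the example.  Real parts expose margin reads in
    device-specific ways, and the CRC validation step is omitted here.

```python
NORMAL_THRESHOLD = 0.5   # hypothetical level used for ordinary reads
VERIFY_THRESHOLD = 0.7   # hypothetical stricter level, as in write-verify


def read_bits(levels, threshold):
    """Interpret each cell's charge level as a bit at the given threshold."""
    return [1 if level >= threshold else 0 for level in levels]


def scrub(sectors):
    """Rewrite every sector that still reads correctly but has lost margin.

    Returns the indexes of the sectors that were refreshed.
    """
    refreshed = []
    for index, levels in enumerate(sectors):
        normal = read_bits(levels, NORMAL_THRESHOLD)
        strict = read_bits(levels, VERIFY_THRESHOLD)
        if normal != strict:
            # A cell sits between the two thresholds: the data is still
            # good, so rewriting it now restores full charge levels.
            sectors[index] = [1.0 if bit else 0.0 for bit in normal]
            refreshed.append(index)
    return refreshed


sectors = [
    [1.0, 0.0, 0.95, 0.0],   # healthy sector
    [0.9, 0.0, 0.62, 0.0],   # third cell has leaked but still reads as 1
]
refreshed = scrub(sectors)
print(refreshed)
```

    The weak sector is caught while a normal read still returns the right
    data, so no ECC correction is ever needed in this model.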

    If you have a wear situation where multiple cells are failing and you
    do not scan the data in the flash often enough (using write-verify
    thresholds, NOT normal operations thresholds) to detect the failing
    cells, and/or you do not have a verification voltage capability to
    detect failing cells before they fail (for example you take a worn
    device offline and store it on a shelf somewhere), then you risk
    detecting the failed cells too late at a point where there are too many
    failed cells to correct.  This is of particular concern for very large
    flash storage.

    One side-effect of having a proper storage layer is that the scrubber is
    typically built in to it.  Just the mechanic of write-appending and
    having to repack the storage usually cycles the storage in a time frame
    less than 10 years.  You can scrub either way, though.  It isn't hard
    to do and doesn't require remapping the cell unless it has failed;
    just re-writing the same data resets the energy levels.
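
    A back-of-the-envelope way to check whether write-appending alone
    cycles the storage quickly enough is sketched below.  The capacity and
    daily write volume are made-up figures, the function name is
    hypothetical, and the 10 year limit is the typical retention figure
    quoted earlier.

```python
RETENTION_YEARS = 10.0   # typical data-sheet retention figure cited above


def full_cycle_years(capacity_bytes, bytes_appended_per_day):
    """Years for an append-only log to wrap around the whole device once."""
    return capacity_bytes / bytes_appended_per_day / 365.0


# Hypothetical device: 32 GiB of flash receiving 100 MiB of appends a day.
capacity = 32 * 1024**3
daily_writes = 100 * 1024**2

years = full_cycle_years(capacity, daily_writes)
needs_explicit_scrub = years >= RETENTION_YEARS
print(round(years, 2), needs_explicit_scrub)
```

    With these numbers every cell gets rewritten in under a year, so the
    repacking does the scrubbing as a side effect.  A device that is
    written far less often would need a separate scrub pass.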

    A flash is still more reliable than a hard drive in the short-term.
    However, disk media tends to retain magnetic orientation longer than
    a flash cell (longer than 10 years)... well, I'm not sure about the
    absolute latest technology but that was certainly the case 10 years
    ago.  Disk media has similar thermal erasure issues so, really, both
    types of media have a limited data retention span.  Recovering data
    from an aging flash chip is a lot harder, though, because you have to
    remove the flash packaging and even shave the chip (yes, it can be done,
    there have been numerous cases where supposedly secure execute-only
    flash and E^2prom could be read out by shaving the chip, though I dunno
    if it has been done with recent super-high-density flashes).  With
    disk media you can generally recover thermally erased bits using very
    expensive equipment with very sensitive detectors.  If the data is
    important, and you are willing to pay for it, you can recover it off
    a HD.

    Typically the only difference between 'consumer' and 'industrial' flash
    is how they sort the chips coming out of the plant.  It is possible to 
    detect weak cells and sort the chips accordingly (thus consumer chips
    have fewer rewrite cycles), though frankly in most cases a consumer
    chip will be almost as good as an industrial one.  If you run a proper
    sector translation layer which generates even wear and you have the
    ability to use the write-verify mechanism in your scrubbing code, it
    doesn't really matter which grade you use.

						-Matt



More information about the freebsd-arch mailing list