Flash disks and FFS layout heuristics
Matthew Dillon
dillon at apollo.backplane.com
Sun Mar 30 17:10:34 PDT 2008
:I believe phk means that googling for "Flash Adaptation Layer" turns up some
:results.
:
:> And no, I really don't want to discuss it any further with you.
:>
:But please continue the discussion for the sake of the silent majority; there
:are loads of us out here who are interested in flash fs development.
:
:Also, I had the impression that newer flash-based hard drives had internal logic
:to spread out writes evenly over the disk and to remap worn-out blocks, and that
:the result of these algorithms increased MTBF to at least the MTBF of spinning
:disks. Or have I misread something?
:
:
: /Chris
I found some of it, though I dunno if it's what he was specifically
referencing. The slide show was interesting, though it contained a
number of factual errors, and I didn't really see anything in-depth
about 'Flash Adaptation Layer'. It seems to be a fairly generically
coined term for something that is far from generic in actual
implementation.
The idea of remapping flash sectors could be considered a poor-man's
way of dealing with wear issues in that remapping tends to be fairly
limited... for example, you might use a fixed-sized table and once the
table fills up the device is toast. Remapping doesn't actually prevent
the uneven wear from occurring; it just gives you a fixed factor of
additional runway. If remapping gets complex enough to work with an
arbitrary number of dead sectors then it is effectively a 'Flash
Adaptation Layer'. Limited remapping (e.g. using a fixed-sized table)
is really easy to code up.
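The fixed-table scheme really is just a few lines. Here is a minimal sketch in Python; `FixedRemapTable` and its method names are invented for illustration, not taken from any real firmware:

```python
class FixedRemapTable:
    """Poor-man's remap scheme: a fixed-size table maps worn-out
    physical sectors to spares.  Once the spares run out, the
    device is toast -- remapping gives you a fixed amount of
    extra runway, it does not prevent uneven wear."""

    def __init__(self, num_sectors, num_spares):
        # Spare sectors live past the advertised capacity.
        self.spares = list(range(num_sectors, num_sectors + num_spares))
        self.remap = {}                  # worn physical sector -> spare

    def translate(self, sector):
        """Redirect a worn sector to its spare, if it has one."""
        return self.remap.get(sector, sector)

    def retire(self, sector):
        """Retire a worn sector.  With the table full there is
        nowhere left to remap to."""
        if not self.spares:
            raise RuntimeError("remap table exhausted; device is toast")
        self.remap[sector] = self.spares.pop(0)
```

Note that nothing here spreads writes around; a hot sector burns through the spares one by one, which is exactly the limitation described above.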
But there are some huge differences between the two. Really huge
differences. Detecting a worn cell requires generating a CRC and
correcting it requires generating an ECC code. Neither CRCs nor
ECCs are perfect and actually depending on them to handle situations
that happen *normally* during the device's life-span is bad business.
A proper sector translation mechanism guarantees even wear of all
the cells. You don't *GET* CRC errors under normal operation of
the device. You still want to have a CRC to detect the situation, and
perhaps even a small ECC to try to correct it, but these exist to
handle manufacturing defects (which can limit the life of individual
cells) rather than to handle wear issues unrelated to manufacturing
defects, which is what a limited remapping mechanic does. A wear issue
can cause many cells to die (see later on w/ regards to data retention)
whereas a manufacturing defect tends to result in single bit errors.
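By contrast, the even-wear guarantee of a proper sector translation layer comes from always directing writes at the least-worn block. A toy sketch, assuming one logical sector per physical block and ignoring garbage collection and crash recovery (all names invented for the example):

```python
import heapq

class EvenWearTranslator:
    """Toy sector-translation layer: every write goes to the physical
    block with the lowest erase count, so wear stays even no matter
    how skewed the logical write pattern is."""

    def __init__(self, num_blocks):
        self.erase_count = [0] * num_blocks
        # Min-heap keyed on erase count picks the least-worn free block.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)
        self.map = {}                    # logical sector -> physical block

    def write(self, logical):
        count, block = heapq.heappop(self.free)
        self.erase_count[block] = count + 1   # one more program/erase cycle
        old = self.map.get(logical)
        if old is not None:
            # The superseded copy is erased and returned to the pool.
            heapq.heappush(self.free, (self.erase_count[old], old))
        self.map[logical] = block
        return block
```

Hammering a single logical sector still spreads program/erase cycles across every block in the pool, which is precisely the property a fixed remap table lacks.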
Insofar as indestructibility, in the short term flash storage is
more resilient than disk storage, especially considering that there are
no moving parts, but flash cells will degrade over time whether you
write to them or not, depending on temperature.
Look at any flash part, bring up the technical specifications and
there will be an entry for 'data retention' time. Usually it's around
10 years at 20 C. If it is hotter, the data is retained for a shorter
period of time; if it is colder, it is retained for a longer period.
Retention is different from cell wear. What retention means is that if
you have a flash device, you need to rewrite the cells (you can't
simply read the cell like a DRAM refresh, but you don't have to go
through an erase cycle; you only have to rewrite the cell)...
you need to do that at least once every 5 years to be safe, or you risk
losing the data. Rewriting the cell does add wear to it so you don't
want to rewrite it too often. I have personally seen flash devices
lose data... I'm trying to remember how many years it was, but I think
it was on the order of 15 years, in one unit out of 30 that was subject
to fairly hot temperatures in the summer.
A flash unit must therefore run a scrubber to really be reliable. It is
absolutely required if you use a remapping algorithm, and a bit less so
if you use a proper storage layer which generates even wear. The real
difference between the two comes down to shelf life (when you aren't
scrubbing anything), since worn cells will die a lot more quickly than
unworn cells.
A scrubber in this case must validate the CRC and there is usually a
way to tell the device to operate at a different detection threshold in
order to detect a failing cell *before* it actually fails (write-verify
usually does this when writing but you also want to do this when
scrubbing, if you want to do it right). The idea is for the scrubber
to detect bit errors *before* the data becomes unrecoverable and,
in fact, before the data even needs to be ECC'd. You should not have
to actually use ECC correction under normal operation of the device over
its entire life span.
If you have a wear situation where multiple cells are failing and you
do not scan the data in the flash often enough (using write-verify
thresholds, NOT normal operating thresholds) to detect the failing
cells, and/or you do not have a verification voltage capability to
detect failing cells before they fail (for example you take a worn
device offline and store it on a shelf somewhere), then you risk
detecting the failed cells too late at a point where there are too many
failed cells to correct. This is of particular concern for very large
flash storage.
One side-effect of having a proper storage layer is that the scrubber is
typically built into it. Just the mechanic of write-appending and
having to repack the storage usually cycles the storage in a time frame
of less than 10 years. You can scrub either way, though; it isn't hard
to do and doesn't require remapping the cell unless it has failed, since
just re-writing the same data resets the energy levels.
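The scrubbing loop itself is simple. A sketch under stated assumptions: `ToyFlash`, `read_margin()`, and the single-bit-flip model of a marginal cell are all invented stand-ins for real device operations. The point is the loop: read every sector at the tightened (write-verify) threshold, and re-program in place anything whose CRC no longer checks.

```python
import zlib

class ToyFlash:
    """In-memory stand-in for a flash device (invented for this
    sketch, not a real driver API)."""
    def __init__(self, sectors):
        self.data = [bytes(s) for s in sectors]
        self.crc = [zlib.crc32(d) for d in self.data]
        self.marginal = set()        # sectors whose cells read weak
        self.num_sectors = len(self.data)

    def read_margin(self, n):
        # At the tightened threshold a marginal sector misreads,
        # even though a normal read still succeeds.
        if n in self.marginal:
            d = self.data[n]
            return bytes([d[0] ^ 1]) + d[1:], self.crc[n]
        return self.data[n], self.crc[n]

    def read_normal(self, n):
        return self.data[n], self.crc[n]

    def rewrite_in_place(self, n, data):
        # Re-programming the same data restores the charge levels;
        # no remapping needed unless the cell has actually died.
        self.data[n] = data
        self.crc[n] = zlib.crc32(data)
        self.marginal.discard(n)

def scrub(device):
    """Walk every sector; anything failing its CRC at the margin
    threshold is still readable normally, so rewrite it in place
    before it degrades into a real, possibly uncorrectable, error."""
    repaired = []
    for sector in range(device.num_sectors):
        data, stored_crc = device.read_margin(sector)
        if zlib.crc32(data) != stored_crc:
            good, _ = device.read_normal(sector)
            device.rewrite_in_place(sector, good)
            repaired.append(sector)
    return repaired
```

Run periodically (well inside the retention window), this catches failing cells before the data ever needs ECC correction, which is the goal stated above.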
A flash is still more reliable than a hard drive in the short term.
However, disk media tends to retain magnetic orientation longer than
a flash cell (longer than 10 years)... well, I'm not sure about the
absolute latest technology but that was certainly the case 10 years
ago. Disk media has similar thermal erasure issues so, really, both
types of media have a limited data retention span. Recovering data
from an aging flash chip is a lot harder, though, because you have to
remove the flash packaging and even shave the chip (yes, it can be done,
there have been numerous cases where supposedly secure execute-only
flash and EEPROM could be read out by shaving the chip, though I dunno
if it has been done with recent super-high-density flashes). With
disk media you can generally recover thermally erased bits using very
expensive equipment with very sensitive detectors. If the data is
important, and you are willing to pay for it, you can recover it off
a HD.
Typically the only difference between 'consumer' and 'industrial' flash
is how they sort the chips coming out of the plant. It is possible to
detect weak cells and sort the chips accordingly (thus consumer chips
have fewer rewrite cycles), though frankly in most cases a consumer
chip will be almost as good as an industrial one. If you run a proper
sector translation layer which generates even wear and you have the
ability to use the write-verify mechanism in your scrubbing code, it
doesn't really matter which grade you use.
-Matt
More information about the freebsd-arch mailing list