Flash disks and FFS layout heuristics
Matthew Dillon
dillon at apollo.backplane.com
Sun Mar 30 18:36:04 PDT 2008
I just finished reading up on the latest NAND stuff, so I am going
to add an addendum.
There was one factual error in my last posting having to do with
byte rewrites. I'm not sure this applies to all manufacturers but
one spec sheet I looked at specifically limited non-erase rewriting
to two consecutive page-write sequences. After that you have to perform
an erase before you can write (and rewrite once) again.
**** I'd be interested in knowing if any chip vendors support multiple
**** consecutive page-write sequences without erase cycles in between
**** (i.e. allowing 1->0 transitions like you can do with NOR).
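To make the constraint concrete, here is a minimal sketch of how a driver might guard the partial-program budget. The names and structure are my own illustration, not from any real driver; NOP_MAX = 2 matches the spec sheet quoted above but varies by vendor.

```c
/*
 * Hypothetical guard for the partial-program limit: some datasheets
 * allow only a small number of program operations on a page between
 * erases.  NOP_MAX = 2 follows the spec sheet described above.
 */
#define NOP_MAX	2

struct page_state {
	int writes_since_erase;
};

/* Returns 0 on success, -1 if the containing block must be erased first. */
static int
page_program(struct page_state *p)
{
	if (p->writes_since_erase >= NOP_MAX)
		return (-1);
	p->writes_since_erase++;
	return (0);
}

/* A block erase resets the partial-program budget of its pages. */
static void
block_erase(struct page_state *p)
{
	p->writes_since_erase = 0;
}
```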
It looks like most vendors provide SECTOR_SIZE + 64 bytes of auxiliary
information. The auxiliary area is where you typically store
the CRC and ECC (they can be the same thing but it's a good idea to
implement them separately). I was surprised that the vendors
spec'd only a 2-bit-detect / 1-bit-correct code, which is actually
the simplest Hamming code you can have.
Describing this type of Hamming code in a paragraph is actually pretty
easy. You can think of it as a code which identifies which bit in a
block is in error and needs to be 'flipped' (aka the 1-bit correction).
For example, if you are ECC'ing 8192 bytes you have 65536 bits,
which means the Hamming code needs to be able to encode a 16-bit
correction address, hence it requires 16 bits of storage for the
correction, plus another (typically) log2(16) = 4 bits of storage for
the detection, plus 1 more bit (you have to include the storage
taken up by the ECC code itself). So ECC on 65536 bits requires 21 bits.
I'm doing that from memory so don't quote me; we used those sorts of
ECC in radio modem protocols 20 years ago.
The actual construction of the correction address is a bit more
complex, but that is the basics of how a 2-bit-detect / 1-bit-correct
Hamming code works.
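The flavor of the scheme can be sketched in a few lines of C. This is a toy illustration of the "correction address" idea, not the polynomial arrangement the chips actually use: the stored code is the XOR of the (1-based) indices of all set bits, plus an overall parity bit that distinguishes one flipped bit (correctable) from two (detect-only).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy SEC-DED sketch.  addr is the XOR of the 1-based indices of all
 * set bits; flipping one bit shifts that XOR by exactly its own index,
 * which is the "correction address".  parity is the XOR of all bits.
 */
struct ecc {
	uint32_t addr;
	int	 parity;
};

static struct ecc
ecc_compute(const uint8_t *buf, size_t nbytes)
{
	struct ecc e = { 0, 0 };
	size_t i;

	for (i = 0; i < nbytes * 8; ++i) {
		if ((buf[i / 8] >> (i % 8)) & 1) {
			e.addr ^= (uint32_t)(i + 1);
			e.parity ^= 1;
		}
	}
	return (e);
}

/*
 * Returns 0 if clean, 1 if a single-bit error was corrected in place,
 * -1 if a double-bit error was detected (uncorrectable).
 */
static int
ecc_check(uint8_t *buf, size_t nbytes, struct ecc stored)
{
	struct ecc now = ecc_compute(buf, nbytes);
	uint32_t syndrome = now.addr ^ stored.addr;
	int pflip = now.parity ^ stored.parity;
	size_t i;

	if (syndrome == 0 && pflip == 0)
		return (0);
	if (pflip) {
		/* odd number of flips: assume one and fix it in place */
		if (syndrome == 0 || syndrome > nbytes * 8)
			return (-1);	/* outside the single-error model */
		i = syndrome - 1;
		buf[i / 8] ^= 1 << (i % 8);
		return (1);
	}
	return (-1);	/* parity unchanged, syndrome nonzero: two errors */
}
```

Note the storage cost matches the argument above: for a block of 2^N bits the correction address needs N bits, and the extra parity bit supplies the second-error detection.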
The vendors' bit error handling recommendation is to relocate the page
and then erase the original rather than to rewrite the page, so the
scrubbing code can't just rewrite the same page when it finds an error.
You still have to scrub, though, or you risk accumulating too many
errors to correct. Write-verify is typically automatic in the chips,
but the two I checked do not seem to have a variable threshold for
read operations for early detection of leaking bits. Older chips had
separate power supplies for the programming power, but newer ones
incorporate internal charge pumps, so it may not be doable, which
would be too bad.
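The relocate-then-erase policy looks roughly like this sketch. The toy flash arrays and logical-to-physical map here are stand-ins I made up to show the shape of the operation, not a real driver:

```c
#include <stdint.h>
#include <string.h>

#define NPAGES	8
#define PAGESZ	4

/* toy flash: page data plus a per-page "ECC corrected a bit" flag */
static uint8_t flash[NPAGES][PAGESZ];
static int     flash_err[NPAGES];
static int     flash_free[NPAGES];
static int     map[NPAGES];		/* logical -> physical page */

/*
 * Scrub one logical page.  Per the vendor recommendation, a page that
 * needed correction is copied to a fresh page and the original is
 * erased, never rewritten in place.  Returns 0 (clean), 1 (relocated),
 * or -1 (no spare page available).
 */
static int
scrub(int lpage)
{
	int old = map[lpage], new;

	if (!flash_err[old])
		return (0);		/* clean: leave it alone */
	for (new = 0; new < NPAGES && !flash_free[new]; ++new)
		;
	if (new == NPAGES)
		return (-1);
	/* write the (already corrected) data to the fresh page */
	memcpy(flash[new], flash[old], PAGESZ);
	flash_free[new] = 0;
	map[lpage] = new;
	/* erase the original instead of rewriting it */
	memset(flash[old], 0xff, PAGESZ);
	flash_err[old] = 0;
	flash_free[old] = 1;
	return (1);
}
```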
Life span and shelf life information is correct. My assumption there
is that the manufacturers are spec'ing the shelf life for leakage in the
worst-case write-versus-verify cycle (the verify is internal to the chip;
the external entity just does a write and reads the verification status
after it finishes). If there is no way to do a read at a lower
sensitivity level there is really no way to locate failing bits before
they actually fail. That doesn't seem right, so I may be missing
something in the spec.
With regards to averaging out the wear by not erase-cycling the same
page over and over again, my read from the chip specs is that you
basically have no choice in the matter... you MUST average the wear out,
period, end of story. This also precludes using a simple sector
remapping algorithm, particularly if the re-writes between erase
cycles for a page are limited.
The reason you MUST average the wear out is that the vendors do not
appear to be guaranteeing even 100K erase cycles.
I've read flash chip specs a billion times... when you read between
the lines, what the vendor is saying, basically, is that the shelf life
of a stored bit is only guaranteed to be 10 years if you don't rewrite
the cell more than X number of times. So while it may be possible to
write more than X number of times, you risk serious data degradation
('shelf life') if you do, even if the write does not fail. This
is the only guarantee they make, and it is based on the damage the cell
takes when you erase/write it, which increases leakage, which reduces
shelf life.
They do NOT guarantee that you can actually do X erase cycles, they
simply say that the chip will tell you if an erase cycle fails, and that
it can fail ANY TIME... the very first erase cycle you do on a
particular page can fail.
The ONLY thing the vendors guarantee is that the FIRST page on the device
can go through a certain number of erase cycles, like 1000 or 10,000.
No other page on the device has any sort of guarantee.
This is very important. This means you MUST average the wear out,
period, whether it is consumer OR industrial grade.
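A minimal illustration of what "average the wear out" forces on the allocator: per-eraseblock erase counts, allocation of the least-worn free block first, and permanent retirement of any block whose erase fails (since, as above, an erase can fail at any time). The structure and names are hypothetical, not any particular flash translation layer:

```c
#include <stdint.h>

#define NBLOCKS	8

struct eb {
	uint32_t erase_count;
	int	 free;		/* 1 = available for allocation */
	int	 bad;		/* 1 = erase failed, retired forever */
};

/* Allocate the free, non-bad block with the lowest erase count, or -1. */
static int
wl_alloc(struct eb *tab, int n)
{
	int i, best = -1;

	for (i = 0; i < n; ++i) {
		if (tab[i].bad || !tab[i].free)
			continue;
		if (best < 0 || tab[i].erase_count < tab[best].erase_count)
			best = i;
	}
	if (best >= 0)
		tab[best].free = 0;
	return (best);
}

/*
 * Erase block i.  The 'failed' flag simulates the chip reporting an
 * erase failure, which by the argument above can happen on ANY cycle;
 * a failed block is retired rather than reused.
 */
static void
wl_erase(struct eb *tab, int i, int failed)
{
	tab[i].erase_count++;
	if (failed)
		tab[i].bad = 1;
	else
		tab[i].free = 1;
}
```

Always picking the least-worn block spreads erase cycles across the device instead of hammering one page, which is exactly what a static sector remap cannot do.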
-Matt
More information about the freebsd-arch
mailing list