ZFS: How to enable cache and logs.

Thu May 12 09:50:30 UTC 2011

Quoting Jeremy Chadwick <freebsd at jdc.parodius.com> (from Thu, 12 May  
2011 02:05:24 -0700):

>> > What guarantee is there that the intent log -- which is written to the
>> > disk -- actually got written to the disk in the middle of a power
>> > failure?  There's a lot of focus there on the idea that "the intent log
>> > will fix everything, but may lose writes", but what guarantee do I have
>> > that the intent log isn't corrupt or botched during a power failure?
>>
>> I expect that checksumming also works for ZIL (does anybody know?). If

It would be a damn big design flaw if ZFS didn't checksum the ZIL.

>> that is the case, corruption would be detected, but you will have lost
>> data unless you are using mirrored slog devices.
>
> I can't believe that statement either (the last line).
>
> I guess that's also what I'm asking here -- what guarantee do you have
> that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data*
> will be *lost* during a power outage?
>
> It seems to me the proper phrase would be "the likelihood of losing an
> entire pool during a power outage is lessened".  Alexander indirectly
> hinted at this in another post of his tonight, specifically regarding
> zpool v15 versus v28:
>
> "The difference between v15 and v28 is the amount of data you lose (the
> entire pool vs. only what is still on the log devices)".

To restore the context: that statement was about losing the SLOG completely.

> This makes much more sense to me.
>
> It seems that in a power outage, there will always be some form of data
> loss.  (I imagine this holds even for systems that have hardware
> RAM/cache with BBUs on everything; there's always some form of caching
> going on *somewhere* within a system, from CPU all the way up, that
> guarantees some degree of data loss.)  I guess I'm OCD'ing over the
> terminology here.  Sorry.

A simple power loss should not destroy the SLOG (or the pool). To keep
things simple, let us assume that the log can only be destroyed by a
hardware problem, i.e. a broken disk. That is the reason why it should
be mirrored; if all log devices are broken, you are in the same
situation as a pool without a SLOG that lost more drives than its
redundancy allows. As I wrote in my other mail (which I sent before I
saw this one, but which probably arrived after you wrote it), the SLOG
is not about an enhanced guarantee (you had the guarantee before), it
is about performance.

You need to handle the data-loss problem at several layers.

If you have a power loss during the write of the SLOG, you will lose
the last SLOG entry (but there is no corruption). At this point in
time the write has not returned to the application, so the application
should not have ACKed the reception of the data. If it did, you will
lose data. If it didn't, the application will just pick this
transaction again from the queue of outstanding transactions and redo
it.

Detecting a write that succeeded but was followed by a power loss
before the ACK reached the sender is also up to the application. For
example, it can calculate an ID from the incoming data and write that
ID together with the rest of the transaction (plus a corresponding
state flag, written in the corresponding transaction, if the
processing is split up into several DB transactions). If the ID and
the state flag are in the DB after the restart, you know that the
write before the power loss completed correctly, and the app just
needs to ACK to the sender.
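
To make the ID idea a bit more concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the table
layout, the function names (open_store, txn_id, handle) and the use of
SQLite as "the DB" are hypothetical, and it relies on the DB commit
only returning after the data has reached stable storage. The ID is a
plain hash of the payload, so two intentionally identical messages
would be treated as one; a real protocol would probably add a
sender-supplied sequence number.

```python
import hashlib
import sqlite3


def open_store(path):
    """Open (or create) the transaction store. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS txn ("
        " id TEXT PRIMARY KEY,"
        " payload BLOB NOT NULL,"
        " state TEXT NOT NULL)"
    )
    db.commit()
    return db


def txn_id(payload):
    """Derive the ID from the incoming data, so a resend yields the same ID."""
    return hashlib.sha256(payload).hexdigest()


def handle(db, payload):
    """Process one incoming message and return 'stored' or 'duplicate'.

    The caller sends the ACK only after this function has returned.
    """
    tid = txn_id(payload)
    row = db.execute("SELECT state FROM txn WHERE id = ?", (tid,)).fetchone()
    if row is not None and row[0] == "done":
        # The write before the crash succeeded; only the ACK went missing.
        return "duplicate"
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT OR REPLACE INTO txn (id, payload, state)"
            " VALUES (?, ?, 'done')",
            (tid, payload),
        )
    return "stored"
```

In both cases the app ACKs after handle() returns: a sender that
resends after a crash gets 'duplicate' instead of a second copy of the
transaction, and a write that never completed is simply redone.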

Was this clear enough, or shall I try to draw a better picture (in
that case please try to specify your concerns, maybe with an example)?

Bye,
Alexander.

-- 
Do YOU have redeeming social value?

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137