Constant rebooting after power loss

Sat Apr 2 08:23:18 UTC 2011

2011/4/2 Matthew Dillon <dillon at apollo.backplane.com>:
>    The core of the issue here comes down to two things:
>
>    First, a power loss to the drive will cause the drive's dirty write cache
>    to be lost; that data will never make it to disk.  Nor do you really want
>    to turn off write caching on the physical drive.  Well, you CAN turn it
>    off, but if you do, performance will become so bad that there's no point.
>    So turning off the write caching is really a non-starter.
>
>    The solution to this first item is for the OS/filesystem to issue a
>    disk flush command to the drive at appropriate times.  If I recall correctly,
>    the ZFS implementation in FreeBSD *DOES* do this for transaction groups,
>    which guarantees that a prior transaction group is fully synced before
>    a new one starts running (HAMMER in DragonFly also does this).
>    (Just getting an 'ack' from the write transaction over the SATA bus only
>    means the data made it to the drive's cache, not that it made it to
>    the platter).

Amen !
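
The same "flush before the next transaction starts" rule can be mirrored by an application writing its own log. A minimal sketch, assuming a POSIX system where `os.fsync()` makes the kernel send a cache-flush command to the drive; that holds only if the filesystem and driver pass the flush through (on macOS, for instance, `fsync()` does not empty the drive cache and `fcntl(F_FULLFSYNC)` is needed). `commit_transaction` and the `COMMIT` marker format are hypothetical, and this is not how ZFS or HAMMER implement it internally:

```python
import os
from typing import Sequence


def append_and_sync(fd: int, payload: bytes) -> None:
    """Write the whole payload, then ask the OS to push it to stable storage."""
    view = memoryview(payload)
    while view:
        written = os.write(fd, view)  # may be a short write; loop until done
        view = view[written:]
    # Assumption: fsync() causes a drive cache flush on this OS/filesystem.
    os.fsync(fd)


def commit_transaction(path: str, records: Sequence[bytes]) -> None:
    """Hypothetical two-step commit: data, flush barrier, then commit marker."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Step 1: the transaction's data, made durable before anything else.
        append_and_sync(fd, b"".join(r + b"\n" for r in records))
        # Step 2: the marker is only written once step 1 has been flushed,
        # so it cannot reach the platter ahead of the data it vouches for.
        append_and_sync(fd, b"COMMIT\n")
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    log_path = os.path.join(tempfile.mkdtemp(), "txlog")
    commit_transaction(log_path, [b"set a=1", b"set b=2"])
```

On recovery, any records after the last `COMMIT` line would be treated as an incomplete transaction and discarded.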

>    I'm not sure about UFS vis-a-vis the recent UFS logging features...
>    it might be an option, but I don't know whether it is the default.  Perhaps
>    someone can comment on that.
>
>    One last note here.  Many modern drives have very large RAM caches.
>    OCZ's SSDs have something like 256MB write caches and many modern HDs
>    now come with 32MB and 64MB caches.  Aged drives with lots of relocated
>    sectors and bit errors can also take a very long time to perform writes
>    on certain sectors.  So these large caches take time to drain and one
>    can't really assume that an acknowledged write to disk will actually
>    make it to the disk under adverse circumstances any more.  All sorts
>    of bad things can happen.
>
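
A back-of-the-envelope estimate of how long a full cache takes to drain: if the cache holds scattered small writes, drain time is roughly the number of pending writes divided by the drive's random-write rate. The inputs below (a 64 MiB cache, 4 KiB writes, about 100 random writes per second for a spinning disk) are assumed figures for illustration, not measurements from this thread, and the estimate is pessimistic because it ignores the drive's own seek reordering:

```python
def drain_time_seconds(cache_bytes: int, write_size_bytes: int,
                       writes_per_second: float) -> float:
    """Time to empty a full write cache made of equal-sized scattered writes."""
    pending_writes = cache_bytes / write_size_bytes
    return pending_writes / writes_per_second


if __name__ == "__main__":
    MIB = 2 ** 20
    # Assumed: 64 MiB cache, 4 KiB random writes, ~100 writes/s.
    print(f"{drain_time_seconds(64 * MIB, 4096, 100):.2f} s")  # 163.84 s
```

Under those assumptions the drive needs well over two minutes of continuous power after the last acknowledged write before its cache is empty.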
>    Finally, the drives don't order their writes to the platter (you can
>    set a bit to tell them to, but like many similar bits in the past there
>    is no real guarantee that the drives will honor it).  So if two
>    transactions do not have a disk flush command in between them, it is
>    possible for data from the second transaction to commit to the platter
>    before all the data from the first transaction commits to the platter.
>    Or worse, for the non-transactional data to update out of order relative
>    to the transactional data which was supposed to commit first.
>
>    Hence IMHO the OS/filesystem must use the disk flush command in such
>    situations for good reliability.
>
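
A toy simulation of the ordering problem. The hypothetical `ToyDrive` class models a write-back cache that, on power loss, persists an arbitrary subset of the pending writes; this is a simplifying assumption standing in for out-of-order write-back, not a model of any real drive firmware. Without a flush between the two transactions, part of transaction 2 can survive while transaction 1 is incomplete; with the flush, transaction 1 is already on the platter, so that outcome cannot occur:

```python
import random
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (transaction id, block index)


class ToyDrive:
    """Hypothetical drive whose cached writes reach the platter in any order."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.cache: List[Block] = []
        self.platter: Set[Block] = set()

    def write(self, block: Block) -> None:
        self.cache.append(block)  # "acked" as soon as it sits in the cache

    def flush(self) -> None:
        self.platter.update(self.cache)  # everything cached is now durable
        self.cache.clear()

    def power_loss(self) -> None:
        # An arbitrary subset of the cache made it out; the rest is lost.
        self.platter.update(b for b in self.cache if self.rng.random() < 0.5)
        self.cache.clear()


def run(use_barrier: bool, seed: int) -> Set[Block]:
    drive = ToyDrive(random.Random(seed))
    for i in range(4):
        drive.write((1, i))  # transaction 1
    if use_barrier:
        drive.flush()        # disk flush command between the transactions
    for i in range(4):
        drive.write((2, i))  # transaction 2
    drive.power_loss()
    return drive.platter


def violates_ordering(platter: Set[Block]) -> bool:
    """True if some of transaction 2 survived while transaction 1 did not."""
    tx1_complete = all((1, i) in platter for i in range(4))
    tx2_present = any(txid == 2 for txid, _ in platter)
    return tx2_present and not tx1_complete


if __name__ == "__main__":
    for barrier in (False, True):
        bad = sum(violates_ordering(run(barrier, s)) for s in range(1000))
        print(f"barrier={barrier}: {bad}/1000 runs committed out of order")
```

The flush does not make transaction 2 survive; it only guarantees that whatever survives of it is never ahead of transaction 1.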
>    --
>
>    The second problem is that a physical loss of power to the drive can
>    cause the drive to physically lose one or more sectors, and can even
>    effectively destroy the drive (even with the fancy auto-park).  If the
>    drive happens to be in the middle of a track write-back when power is
>    lost, it is possible to lose far more than a single sector, including
>    sectors unrelated to recent filesystem operations.
>
>    The only solution to #2 is to make sure your machines (or at least the
>    drives if they happen to be in external enclosures) are connected to
>    a UPS and that the machines are communicating with the UPS via
>    something like the "apcupsd" port.  AND also that you test to make
>    sure the machines properly shut themselves down when AC is lost before
>    the UPS itself runs out of battery time.  After all, a UPS won't help
>    if the machines don't at least idle their drives before power is lost!!!
>
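
The shutdown decision a UPS daemon makes can be sketched as a simple policy. This is a hypothetical model loosely inspired by the BATTERYLEVEL, MINUTES and TIMEOUT settings in apcupsd's configuration file; it is not apcupsd's code, and the threshold values are example numbers, so check the apcupsd manual for the daemon's exact semantics. The second function captures the "test it" advice: the runtime threshold is only safe if it exceeds the shutdown time you actually measured by pulling AC:

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    on_battery: bool
    charge_percent: float
    runtime_left_s: float
    seconds_on_battery: float


@dataclass
class ShutdownPolicy:
    # Example thresholds (assumed values, loosely modeled on apcupsd's
    # BATTERYLEVEL, MINUTES and TIMEOUT directives).
    min_charge_percent: float = 5.0
    min_runtime_s: float = 3 * 60
    max_on_battery_s: float = 0.0  # 0 disables this timer in this sketch

    def should_shut_down(self, s: UpsStatus) -> bool:
        if not s.on_battery:
            return False
        if s.charge_percent <= self.min_charge_percent:
            return True
        if s.runtime_left_s <= self.min_runtime_s:
            return True
        return 0 < self.max_on_battery_s <= s.seconds_on_battery


def policy_is_safe(policy: ShutdownPolicy, measured_shutdown_s: float) -> bool:
    """Shutdown must finish (drives idled) before the battery runs out."""
    return policy.min_runtime_s > measured_shutdown_s
```

The UPS's own runtime estimate can be optimistic on an aged battery, which is one more reason to measure a real shutdown rather than trust the numbers.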
>    I learned this lesson the hard way about 3 years ago.  I had something
>    like a dozen drives in two raid arrays doing heavy write activity and
>    lost physical power and several of the drives were totally destroyed,
>    with thousands of sector errors.  Not just one or two... thousands.
>
>    (It is unclear how SSDs react to physical loss of power during heavy
>    writing activity.  Theoretically, while they will certainly lose their
>    write cache, they shouldn't wind up with any read errors).
>
>                                                -Matt
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>


-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier at gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."