Constant rebooting after power loss
Olivier Smedts
olivier at gid0.org
Sat Apr 2 08:23:18 UTC 2011
2011/4/2 Matthew Dillon <dillon at apollo.backplane.com>:
> The core of the issue here comes down to two things:
>
> First, a power loss to the drive will cause the drive's dirty write cache
> to be lost; that data will not make it to disk. Nor do you really want
> to turn off write caching on the physical drive. Well, you CAN turn it
> off, but if you do, performance will become so bad that there's no point.
> So turning off the write caching is really a non-starter.
>
> The solution to this first item is for the OS/filesystem to issue a
> disk flush command to the drive at appropriate times. If I recall
> correctly, the ZFS implementation in FreeBSD *DOES* do this for
> transaction groups, which guarantees that a prior transaction group is
> fully synced before a new one starts running (HAMMER in DragonFly also
> does this).
> (Just getting an 'ack' from the write transaction over the SATA bus only
> means the data made it to the drive's cache, not that it made it to
> the platter).
Amen!
> I'm not sure about UFS vis-a-vis the recent UFS logging features...
> it might be an option but I don't know if it is a default. Perhaps
> someone can comment on that.
>
> One last note here. Many modern drives have very large RAM caches.
> OCZ's SSDs have something like 256MB write caches and many modern HDs
> now come with 32MB and 64MB caches. Aged drives with lots of relocated
> sectors and bit errors can also take a very long time to perform writes
> on certain sectors. So these large caches take time to drain and one
> can't really assume that an acknowledged write to disk will actually
> make it to the disk under adverse circumstances any more. All sorts
> of bad things can happen.
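
To put rough numbers on the drain time mentioned above, here is a minimal
back-of-the-envelope sketch. The cache sizes come from the paragraph above;
the sustained write rates are purely hypothetical figures I picked for
illustration (a healthy drive versus one struggling with relocated sectors),
not measurements of any real device.

```python
MIB = 1024 * 1024


def drain_seconds(cache_bytes, sustained_write_bytes_per_s):
    """Time to empty a completely full write cache at a given sustained rate."""
    if sustained_write_bytes_per_s <= 0:
        raise ValueError("write rate must be positive")
    return cache_bytes / sustained_write_bytes_per_s


# (label, cache size, ASSUMED effective write rate to the medium)
scenarios = [
    ("64MB HD cache, healthy drive (assumed 32 MiB/s)", 64 * MIB, 32 * MIB),
    ("64MB HD cache, aged drive (assumed 1 MiB/s)", 64 * MIB, 1 * MIB),
    ("256MB SSD cache (assumed 2 MiB/s worst case)", 256 * MIB, 2 * MIB),
]

for label, cache, rate in scenarios:
    print(f"{label}: {drain_seconds(cache, rate):.0f} s to drain")
```

The point is only the order of magnitude: with a large cache and a slow
effective write rate, acknowledged data can sit in volatile RAM for a minute
or more.
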
>
> Finally, the drives don't order their writes to the platter (you can
> set a bit to tell them to, but like many similar bits in the past there
> is no real guarantee that the drives will honor it). So if two
> transactions do not have a disk flush command in between them, it is
> possible for data from the second transaction to commit to the platter
> before all the data from the first transaction commits to the platter.
> Or worse, for the non-transactional data to update out of order relative
> to the transactional data which was supposed to commit first.
>
> Hence IMHO the OS/filesystem must use the disk flush command in such
> situations for good reliability.
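
From the application side, the same ordering idea can be sketched as
follows. This is only an illustration, not how ZFS or HAMMER implement it:
the function names and the "COMMIT" marker are hypothetical, and os.fsync()
merely asks the OS to push the file's data to stable storage. Whether that
turns into a real cache flush on the drive depends on the filesystem and on
the drive honoring it.

```python
import os
import tempfile


def write_all(fd, data):
    """os.write() may write fewer bytes than asked; loop until all are written."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit_transaction(path, payload, marker=b"COMMIT\n"):
    """Append payload, flush, then append a commit marker and flush again."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        write_all(fd, payload)
        os.fsync(fd)  # barrier: payload must be stable before the marker
        write_all(fd, marker)
        os.fsync(fd)  # barrier: marker must be stable before the next txn
    finally:
        os.close(fd)


def demo():
    """Commit two transactions to a scratch file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "journal.log")
        commit_transaction(log, b"txn1\n")
        commit_transaction(log, b"txn2\n")
        with open(log, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

Without the first fsync() the marker could, in principle, reach the platter
before the payload it is supposed to vouch for, which is exactly the
reordering problem described above.
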
>
> --
>
> The second problem is that a physical loss of power to the drive can
> cause the drive to physically lose one or more sectors, and can even
> effectively destroy the drive (even with the fancy auto-park)... if the
> drive happens to be in the middle of a track write-back when power is
> lost it is possible to lose far more than a single sector, including
> sectors unrelated to recent filesystem operations.
>
> The only solution to #2 is to make sure your machines (or at least the
> drives if they happen to be in external enclosures) are connected to
> a UPS and that the machines are communicating with the UPS via
> something like the "apcupsd" port. AND also that you test to make
> sure the machines properly shut themselves down when AC is lost before
> the UPS itself runs out of battery time. After all, a UPS won't help
> if the machines don't at least idle their drives before power is lost!!!
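
For testing the "shut down before the battery runs out" logic, the decision
itself can be sketched like this. apcupsd makes this decision on its own from
its configuration; the code below is just a hypothetical stand-alone
illustration. The field names (STATUS, BCHARGE, TIMELEFT) and the sample text
are modeled on what I remember of "apcaccess" status output, so treat them,
and the thresholds, as assumptions.

```python
def parse_status(text):
    """Parse 'KEY : value' lines into a dict (format is an assumption)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the first whitespace-separated token as a float, or 0.0."""
    parts = value.split()
    try:
        return float(parts[0]) if parts else 0.0
    except ValueError:
        return 0.0


def should_shut_down(fields, min_minutes=5.0, min_charge_pct=10.0):
    """Shut down only when on battery AND runtime or charge is at/below the limit."""
    if "ONBATT" not in fields.get("STATUS", ""):
        return False
    minutes = leading_number(fields.get("TIMELEFT", ""))
    charge = leading_number(fields.get("BCHARGE", ""))
    # Missing values count as 0.0, so we err on the side of shutting down.
    return minutes <= min_minutes or charge <= min_charge_pct


# Hypothetical sample in the assumed format.
SAMPLE = """\
STATUS   : ONBATT
BCHARGE  : 42.0 Percent
TIMELEFT : 3.5 Minutes
"""

if __name__ == "__main__":
    print(should_shut_down(parse_status(SAMPLE)))
```

Whatever does the deciding, the real test is still pulling the plug on the
UPS and watching that the machines actually go down in time.
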
>
> I learned this lesson the hard way about 3 years ago. I had something
> like a dozen drives in two raid arrays doing heavy write activity and
> lost physical power and several of the drives were totally destroyed,
> with thousands of sector errors. Not just one or two... thousands.
>
> (It is unclear how SSDs react to physical loss of power during heavy
> writing activity. Theoretically, while they will certainly lose their
> write cache, they shouldn't wind up with any read errors.)
>
> -Matt
--
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier at gid0.org      - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \
"There are only 10 kinds of people in the world:
those who understand binary,
and those who don't."