Constant rebooting after power loss
Matthew Dillon
dillon at apollo.backplane.com
Sat Apr 2 18:57:21 UTC 2011
:It should also be noted that some drives ignore or lie about these flush commands: i.e., they say they flushed the buffers but did not in fact do so. This is sometimes done on cheap SATA drives, but also on expensive SANs. In the former's case it's often to help with benchmark numbers. In the latter's case, it's usually okay because the buffers are actually NVRAM, and so are safe across power cycles. There are also some USB-to-SATA chipsets that don't handle flush commands and simply ACK them without passing them to the drive, so yanking a drive can cause problems.
:
:There has been quite a bit of discussion on the zfs-discuss list on this topic over the years, especially when it comes to (consumer) SSDs.
Remember also that numerous ZFS studies have been debunked in recent
years, though I agree with the idea that going that extra mile requires
not trusting anything. In many respects ZFS's biggest enemy now is
bugs in ZFS itself (or the OS it runs under), and not so much glitches
in the underlying storage framework.
I am unaware of *ANY* mainstream hard drive or SSD made in the last
10 years which ignores the disk flush command. In previous decades HD
vendors played games with caching all the time but there are fewer
HD vendors now and they all compete heavily with each other... they
don't play those games any more for fear of losing their reputation.
There is very little vendor loyalty in the hard drive business.
When it comes to SSDs there are all sorts of fringe vendors, and I
certainly would not trust any of those, but if you stick to
well known vendors like Intel or OCZ it will work. Look for whose
chipsets are under the hood more than for whose name is slapped onto
the SSD, and get as close to the source as you can.
Most current-day disk flush command issues are at a higher level. For
example, numerous VMs ignore the command (don't even bother to fsync()
the underlying block devices or files!). There isn't anything you can
do about a VM other than complain about it to the vendor. I've been hit
by precisely this issue running HAMMER inside a VM on a Windows box.
If the VM blue-screens the Windows box (which happens quite often)
the data on-disk can wind up corrupted beyond all measure.
People who use VMs with direct-attached filesystems basically rely on
the host computer never crashing and should really have no expectation
of storage reliability short of running the VM inside an IBM mainframe.
That is the unfortunate truth.
With USB the primary culprit is virtually *all* USB/Firewire/SATA
bridges, as you noted, because I think there are only like 2 or 3
actual manufacturers and they are all broken. The USB standard itself
shares the blame for this. It is a really horrible standard.
USB-sticks are the ones that typically either lock up or return
success but don't actually flush their (fortunately limited) caches.
Nobody in their right mind uses USB to attach a disk when reliability
is important. It's fine to have it... I have lots of USB sticks and
a few USB-attached HDs lying around, but I have *ZERO* expectation of
reliability from them and neither should anyone else.
SD cards are in the same category as USB. Useful but untrustworthy.
Other fringe consumer crap, like fake-raid (BIOS-based RAID), is equally
unreliable when it comes to dealing with outright crashes. Always fun
to have drives which can't be moved to other machines if a mobo dies!
Not!
With network attached drives the standard itself is broken. It tries to
define command completion as the data being on-media which is stupid
when no other direct-attached standard requires that. Stupidity in
standards is a primary factor in vendors ignoring portions of standards.
In the case of network-attached drives implemented with direct-attached
drives on machines with software drivers to bridge to the network,
it comes down to whether the software deals with the flush command
properly, because it sure as hell isn't going to sync each write
command all the way to the media!
But frankly, none of these issues should stop anyone from using the
command, nor excuse rationalizing it away. Not that I am blaming anyone
for trying to rationalize it away; I am simply pointing out that in a
market as large as the generic 'storage' market is, there are always
going to be tons of broken stuff out there to avoid. It's buyer beware.
What we care about here, in this discussion, is direct-attached
SATA/eSATA/SAS, port multipliers and other external enclosure bridges,
high-end SCSI phys and, NVRAM aside (which is arguable), real RAID
hardware. And well-known vendors (fringe SSDs do not count). That
covers 90% of the market and 99% of the cases where protocol reliability
is required.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-stable
mailing list