UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
Jeremy Chadwick
koitsu at FreeBSD.org
Sat Sep 27 11:03:31 UTC 2008
On Sat, Sep 27, 2008 at 12:37:50AM -0700, Derek Kuliński wrote:
> Friday, September 26, 2008, 11:44:17 PM, you wrote:
>
> >> As far as I know (at least ideally, when write caching is disabled)
>
> > Re: write caching: wheelies and burn-outs in empty parking lots
> > detected.
>
> > Let's be realistic. We're talking about ATA and SATA hard disks, hooked
> > up to on-board controllers -- these are the majority of users. Those
> > with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> > those do not let you disable drive write caching) *might* have a RAID
> > BIOS menu item for disabling said feature.
>
> > FreeBSD atacontrol does not let you toggle such features (although "cap"
> > will show you whether the feature is available and whether it's enabled).
>
> > Users using SCSI will most definitely have the ability to disable
> > said feature (either via SCSI BIOS or via camcontrol). But the majority
> > of users are not using SCSI disks, because the majority of users are not
> > going to spend hundreds of dollars on a controller followed by hundreds
> > of dollars for a small (~74GB) disk.
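As an aside, for anyone who *does* want to turn the drive's cache off:
ata(4) documents a loader tunable for ATA/SATA disks, and camcontrol(8)
can edit the caching mode page on SCSI/da disks. I haven't verified these
on every controller, so treat them as pointers rather than gospel:

  # /boot/loader.conf -- disable ATA disk write caching at boot (see ata(4))
  hw.ata.wc="0"

  # SCSI/da disks: edit the caching mode page (8) and clear the WCE bit
  camcontrol modepage da0 -m 8 -e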
>
> > Regardless of all of this, end-users should, in no way, shape, or form,
> > be expected to go to great lengths to disable their disk's write cache.
> > They will not, I can assure you. Thus, we must assume: write caching
> > on a disk will be enabled, period. If a filesystem is engineered with
> > that fact ignored, then the filesystem either 1) is worthless, or 2)
> > serves a very niche purpose and should not be the default filesystem.
>
> > Do we agree?
>
> Yes, but...
>
> In the link you sent to me, someone mentioned that a write cache always
> creates problems, and that it doesn't matter which OS or filesystem is
> involved.
>
> There's more below.
>
> >> the data should always be consistent, and all fsck is supposed to be
> >> doing is freeing unreferenced blocks that were allocated.
> > fsck does a heck of a lot more than that, and there's no guarantee
> > that's all fsck is going to do on a UFS2+SU filesystem. I'm under the
> > impression it does a lot more than just looking for unref'd blocks.
>
> Yes, fsck does a lot more than that. But the whole point of soft
> updates is to reduce fsck's work to deallocating blocks that were
> allocated but never referenced.
>
> Anyway, maybe my information is out of date, though the funny thing is
> that Soft Updates was covered in one of my Operating Systems lectures.
>
> Apparently the goal of Soft Updates is to always enforce the following
> rules, in a very efficient manner, by carefully ordering the writes:
> 1. Never point to a data structure before initializing it
> 2. Never reuse a structure before nullifying all pointers to it
> 3. Never reset the last pointer to a live structure before setting a new one
> 4. Always mark free-block bitmap entries as used before making the
> directory entry point to them
>
> The problem comes with disks which, for performance reasons, cache the
> data and then write it back to the platters in a different order.
> I think that's why it's recommended to disable the write cache:
> if the disk reorders the writes, it renders soft updates useless.
>
> But if the write order is preserved, the data always remains
> consistent; the only thing that can appear are blocks that were
> marked as in use but that nothing points to yet.
>
> So (in the ideal situation, when nothing interferes) all fsck needs to
> do is scan the filesystem and deallocate those blocks.
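That matches my understanding of the theory. For anyone following along at
home, here's a rough sketch of the invariant you're describing -- purely
illustrative C with made-up offsets and structures (not UFS's real on-disk
format), and using explicit fsync() calls where SU instead tracks
dependencies in the buffer cache to get the same ordering without
synchronous writes:

  /*
   * Illustrative only: the write ordering needed when creating a file.
   * The initialized inode must reach stable storage before the directory
   * entry that references it (rule 1 above).
   */
  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  struct fake_inode  { int mode, size, blocks[12]; };
  struct fake_dirent { int inum; char name[60]; };

  int
  main(void)
  {
          struct fake_inode ino;
          struct fake_dirent de;
          int fd = open("/dev/md0", O_RDWR);     /* pretend fs device        */

          if (fd == -1)
                  return (1);

          memset(&ino, 0, sizeof(ino));
          ino.mode = 0100644;                    /* initialize the inode...  */
          pwrite(fd, &ino, sizeof(ino), 4096);   /* ...write its block...    */
          fsync(fd);                             /* ...and make it stable    */

          memset(&de, 0, sizeof(de));
          de.inum = 3;                           /* only now may anything    */
          strlcpy(de.name, "newfile", sizeof(de.name)); /* point at it       */
          pwrite(fd, &de, sizeof(de), 8192);     /* write the directory blk  */
          fsync(fd);

          /*
           * If the drive's write cache reorders those two writes and power
           * is lost in between, the directory entry references an
           * uninitialized inode -- exactly what the ordering rules prevent.
           */
          close(fd);
          return (0);
  }

Which is also why a reordering write cache defeats the whole exercise: the
ordering the filesystem so carefully arranged gets thrown away at the last
hop.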
>
> > The system is already up and the filesystems mounted. If the error in
> > question is of such severity that it would impact a user's ability to
> > reliably use the filesystem, how do you expect constant screaming on
> > the console will help? A user won't know what it means; there is
> > already evidence of this happening (re: mysterious ATA DMA errors which
> > still cannot be figured out[6]).
>
> > IMHO, a dirty filesystem should not be mounted until it's been fully
> > analysed/scanned by fsck. So again, people are putting faith into
> > UFS2+SU despite actual evidence proving that it doesn't handle all
> > scenarios.
>
> Yes, I think the background fsck should be disabled by default, with the
> option to enable it if the user is sure that nothing will interfere with
> soft updates.
>
> > The problem here is that when it was created, it was sort of an
> > "experiment". Now, when someone installs FreeBSD, UFS2 is the default
> > filesystem used, and SU are enabled on every filesystem except the root
> > fs. Thus, we have now put ourselves into a situation where said
> > feature ***must*** be reliable in all cases.
>
> I think in the worst case it is just as reliable as it would be if SU
> weren't enabled (the only danger is the background fsck)
>
> > You're also forgetting a huge focus of SU -- snapshots[1]. However, there
> > are more than enough facts on the table at this point concluding that
> > snapshots are causing more problems[7] than previously expected. And
> > there's further evidence filesystem snapshots shouldn't even be used in
> > this way[8].
>
> There's not much to argue about there.
>
> >> Also, if I remember correctly, PJD said that gjournal is performing
> >> much better with small files, while softupdates is faster with big
> >> ones.
>
> > Okay, so now we want to talk about benchmarks. The benchmarks you're
> > talking about are in two places[2][3].
>
> > The benchmarks pjd@ provided were very basic/simple, which I feel is
> > good, because the tests were realistic (common tasks people will do).
> > The benchmarks mckusick@ provided for UFS2+SU were based on SCSI
> > disks, which is... interesting to say the least.
>
> > Bruce Evans responded with some more data[4].
>
> > I particularly enjoy this quote in his benchmark: "I never found the
> > exact cause of the slower readback ...", followed by (plausible)
> > speculations as to why that is.
>
> > I'm sorry that I sound like such a hard-ass on this matter, but there is
> > a glaring fact that people seem to be overlooking intentionally:
>
> > Filesystems have to be reliable; data integrity is focus #1, and cannot
> > be sacrificed. Users and administrators *expect* a filesystem to be
> > reliable. No one is going to keep using a filesystem if it has
> > disadvantages which can result in data loss or "waste of administrative
> > time" (which I believe is what's occurring here).
>
> > Users *will* switch to another operating system that has filesystems
> > which were not engineered/invented with these features in mind. Or,
> > they can switch to another filesystem, assuming the OS offers one which
> > performs equally well and is guaranteed to be reliable --
> > and that's assuming the user wants to spend the time to reformat and
> > reinstall just to get that.
>
> I wasn't trying to argue about that. Perhaps my assumption is wrong,
> but I believe that the problems we know about with Soft Updates, in the
> worst case, make the system as reliable as it was without using it.
>
> > In the case of "bit rot" (e.g. drive cache going bad silently, bad
> > cables, or other forms of low-level data corruption), a filesystem is
> > likely not to be able to cope with this (but see below).
>
> > A common rebuttal here would be: "so use UFS2 without soft updates".
> > Excellent advice! I might consider it myself! But the problem is that
> > we cannot expect users to do that. Why? Because the defaults chosen
> > during sysinstall are to use SU for all filesystems except root. If SU
> > is not reliable (or is "reliable in most cases" -- same thing if you ask
> > me), then it should not be enabled by default. I think we (FreeBSD)
> > might have been a bit hasty in deciding to choose that as a default.
>
> > Next: a system locking up (or a kernel panic) should result in a dirty
> > filesystem. That filesystem should be *fully recoverable* from that
> > kind of error, with no risk of data loss (but see below).
>
> > (There is the obvious case where a file is written to the disk, and the
> > disk has not completed writing the data from its internal cache to the
> > disk itself (re: write caching); if power is lost, the disk may not have
> > finished writing the cache to disk. In this case, the file is going to
> > be sparse -- there is absolutely nothing that can be done about this
> > with any filesystem, including ZFS (to my knowledge). This situation
> > is acceptable; nature of the beast.)
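To expand on my own parenthetical: even an application that does everything
by the book can only push the data as far as the drive. A minimal sketch
(the path is made up; whether the drive then actually flushes its cache is
entirely up to the drive and the driver):

  /*
   * fsync(2) returns once the drive has accepted the data, but the data
   * may still be sitting in the drive's volatile write cache; lose power
   * before that cache is flushed and the tail of the file is gone.
   */
  #include <fcntl.h>
  #include <unistd.h>

  int
  main(void)
  {
          const char buf[] = "important data\n";
          int fd = open("/var/tmp/example", O_WRONLY | O_CREAT | O_APPEND, 0644);

          if (fd == -1)
                  return (1);
          write(fd, buf, sizeof(buf) - 1);
          fsync(fd);              /* handed to the drive...                 */
          close(fd);              /* ...not necessarily to the platters     */
          return (0);
  }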
>
> > The filesystem should be fully analysed and any errors repaired (either
> > with user interaction or automatically -- I'm sure it depends on the
> > kind of error) **before** the filesystem is mounted.
>
> > This is where SU gets in the way. The filesystem is mounted and the
> > system is brought up + online 60 seconds before the fsck starts. The
> > assumption made is that the errors in question will be fully recoverable
> > by an automatic fsck, which as this thread proves, is not always the
> > case.
>
> That's why I think background fsck should be disabled by default.
> Though I still don't think that soft updates hurt anything (except,
> perhaps, performance).
>
> > ZFS is the first filesystem, to my knowledge, which provides 1) a
> > reliable filesystem, 2) detection of filesystem problems in real-time or
> > during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
> > raidz2 are used), and 4) does not need fsck. This makes ZFS powerful.
>
> > "So use ZFS!" A good piece of advice -- however, I've already had
> > reports from users that they will not consider ZFS for FreeBSD at this
> > time. Why? Because ZFS on FreeBSD can panic the system easily due to
> > kmem exhaustion. Proper tuning can alleviate this problem, but users do
> > not want to have to "tune" their system to get stability (and I feel
> > this is a very legitimate argument).
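The "tuning" in question is usually a few lines in /boot/loader.conf along
these lines -- the numbers below are only an example for a box with a
couple of GB of RAM, not a recommendation, and reasonable values vary from
machine to machine:

  # /boot/loader.conf -- example ZFS-related tuning
  vm.kmem_size="512M"
  vm.kmem_size_max="512M"
  vfs.zfs.arc_max="256M"

Which is exactly the kind of busywork users are objecting to.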
>
> > Additionally, FreeBSD doesn't offer ZFS as a filesystem during
> > installation. PC-BSD does, AFAIK. So on FreeBSD, you have to go
> > through a bunch of rigmarole[5] to get it to work (and doing this
> > after-the-fact is a real pain in the rear -- believe me, I did it this
> > weekend.)
>
> > So until both of these ZFS-oriented issues can be dealt with, some
> > users aren't considering it.
>
> > This is the reality of the situation. I don't think what users and
> > administrators want is unreasonable; they may be rough demands, but
> > that's how things are in this day and age.
>
> > Have I provided enough evidence? :-)
>
> Yes, but as far as I understand it's not as bad as you think :)
> I could be wrong though.
>
> I 100% agree on disabling background fsck, but I don't think soft
> updates make the system any less reliable than it would be without
> them.
With regards to all you've said:
Thank you for these insights. Everything you and Erik have said has
been quite educational, and I greatly appreciate it. Always good to
learn from people who know more! :-)
I believe we're in overall agreement with regards to background_fsck
(should be disabled by default). I'd file a PR for this sort of thing,
but it almost seems like something that should go to the (private)
developers list for discussion first.
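In the meantime, for anyone who wants to turn it off on their own systems,
it's a one-liner in /etc/rc.conf (the knob is background_fsck, and the
60-second delay I mentioned earlier is background_fsck_delay; both are
documented in rc.conf(5)):

  # /etc/rc.conf
  background_fsck="NO"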
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |