kern/93942: panic: ufs_dirbad: bad dir
David Rhodus
drhodus at machdep.com
Wed Mar 1 12:20:14 PST 2006
The following reply was made to PR kern/93942; it has been noted by GNATS.
From: "David Rhodus" <drhodus at machdep.com>
To: Yarema <yds at coolrat.org>
Cc: FreeBSD-gnats-submit at freebsd.org, FreeBSD-current at freebsd.org,
"Kris Kennaway" <kris at obsecurity.org>,
"Dennis Koegel" <amf at hobbit.neveragain.de>,
"Doug White" <dwhite at gumbysoft.com>, "Martin Machacek" <m at m3a.net>,
"David O'Brien" <obrien at freebsd.org>,
"Scott Long" <scottl at samsco.org>,
"Pawel Jakub Dawidek" <pjd at freebsd.org>
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
Date: Wed, 1 Mar 2006 15:10:38 -0500
On 2/28/06, Yarema <yds at coolrat.org> wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway <kris at obsecurity.org>
> wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number: 93942
> >> > Category: kern
> >> > Synopsis: panic: ufs_dirbad: bad dir
> >> > Confidential: no
> >> > Severity: critical
> >> > Priority: high
> >> > Responsible: freebsd-bugs
> >> > State: open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class: sw-bug
> >> > Submitter-Id: current-users
> >> > Arrival-Date: Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator: Yarema <yds at CoolRat.org>
> >> > Release: FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up, making the whole system
> >> unresponsive. A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly. Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon 2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >> Modified files:
> >> sys/kern vfs_cluster.c
> >> Log:
> >> bioops.io_start() was being called in a situation where the buffer
> >> could be brelse()'d afterwards instead of I/O being initiated. When
> >> this occurs, the buffer may contain softupdates-modified data which is
> >> never reverted, resulting in serious filesystem corruption. When
> >> io_start is called on a buffer, I/O MUST be initiated and terminated
> >> with a biodone() or the buffer's data may not be properly reverted.
> >>
> >> Solve the problem by moving the io_start() call a little further on in
> >> the code, after the potential brelse().
> >>
> >> There is a possibility that this bug is responsible for the 'dirbad'
> >> panics often reported in DragonFly and FreeBSD circles.
> >>
> >> Revision Changes Path
> >> 1.16 +7 -6 src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet. From what I gathered, mostly from this thread:
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=331058+0+archive/2006/freebsd-current/20060108.freebsd-current>
>
> As per Matt Dillon
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=217892+0+/usr/local/www/db/text/2006/freebsd-current/20060226.freebsd-current>,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above. The page fault
> occurs while attempting to mount read-only. Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> There seem to be two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema
I have been working on the bad dir problem for several months, and I
have not seen corruption that fsck could not correct.
-DR
More information about the freebsd-bugs mailing list