Data loss after power out - fsck: bad inode number to nextinode

Polytropon freebsd at edvax.de
Wed Jul 9 02:32:21 UTC 2008


Hi,

I've been in big trouble since last week: after a power outage my main
system didn't boot up anymore, so I checked its hard disk (FreeBSD
5.4) in my new system (FreeBSD 7.0).

I booted the system in single-user mode and ran fsck on the
partitions. / on /dev/ad1s1a could be repaired, /var on 1d too;
/usr on 1e lost many directory entries (X11R6, for example), but
all files and directories were restored to lost+found. Okay, that's
how I know it should work, and it doesn't matter anyway, because
everything there could be reinstalled.

Problems occurred when checking /home on /dev/ad1s1f. After a lot
of

	1101472 DUP I=260035
	UNEXPECTED SOFT UPDATE INCONSISTENCY

and

	EXCESSIVE DUP BLKS I=260039
	CONTINUE? yes

and

	7310315658325879925 BAD I=260051
	UNEXPECTED SOFT UPDATE INCONSISTENCY

fsck ended up this way:

	INCORRECT BLOCK COUNT I=290557 (3104 should be 736)
	CORRECT? yes

	fsck_4.2bsd: bad inode number 306176 to nextinode

The result: The home directories of all other users were present,
but mine (!) - /home/adec - was missing. To be more precise: when
looking at the files using the Midnight Commander, the name of my
home directory was displayed, preceded by "?" and in red colour,
with a strange date (the epoch?).

	|?adec            |      0|Jan  1  1970|

So I could not change into this directory and get my files out
of there.

In order not to damage the system more, I made a ddrescue dump
of the partition:

	% ddrescue -d -r 3 -n /dev/ad1s1f home.ddrescue logfile

The data could be read without problems. The resulting file seemed
to be a 1:1 copy of the partition.
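
Whether the image really is a 1:1 copy can be checked by comparing it
with the partition byte for byte. A minimal sketch, assuming the
partition is still readable as /dev/ad1s1f; the helper name same_image
is made up for illustration and is not part of ddrescue:

```sh
#!/bin/sh
# Hypothetical helper: report whether two files or devices are
# byte-for-byte identical.
same_image() {
    # cmp -s prints nothing; exit status 0 means no difference was found.
    # Any other status (a difference, or a read error) is reported as
    # "different".
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example call with the names from this post (needs read access to the
# device, so it is left commented out here):
#   same_image /dev/ad1s1f home.ddrescue
```

Comparing against the raw device reads the whole partition once more,
so on a disk that is suspected to be failing it may be wiser to skip
this step.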

% file home.ddrescue
home.ddrescue: Unix Fast File system [v2] (little-endian) last mounted on /mnt,
        last written at Wed Jul  2 18:51:06 2008,
        clean flag 0,
        readonly flag 0,
        number of blocks 44322272,
        number of data blocks 42925108,
        number of cylinder groups 472,
        block size 16384,
        fragment size 2048,
        average file size 16384,
        average number of files in dir 64,
        pending blocks to free 0,
        pending inodes to free 0,
        system-wide uuid 0,
        minimum percentage of free blocks 8,
        TIME optimization

When checking the image (attached as /dev/md10 with the mdconfig
command shown further down) with

	% fsck -t ufs -yf /dev/md10

fsck gives the same error message as above.

Then I mounted the image:

	% sudo mdconfig -a -t vnode -u 10 -f home.ddrescue
	% mount -t ufs -o ro /dev/md10 mnt

And guess what? Same problem: The directory name is shown, but I
cannot change into the directory.

But then, I noticed something interesting:

	% df -h
	Filesystem     Size    Used   Avail Capacity  Mounted on
	/dev/md10       82G     75G    716M    99%    /export/home/adec/rescue/mnt

See the size differences? Something seems to be missing. I hope it
is the content of my home directory that's still on the disk. Some
checking:

	% sudo du -sch mnt
	du: mnt/adec: Bad file descriptor
	du: mnt/archiv/cr/clips.w32/s01.wmv: Bad file descriptor
	du: mnt/archiv/cr/clips.w32/s02.wmv: Bad file descriptor
	 52G    mnt
	 52G    total

This suggests that roughly 23 GB (75 GB used according to df, minus
the 52 GB that du can reach) are still allocated, i.e. not marked as
free.
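
The gaps can be read off the df and du output above with simple
subtraction. A small sketch, using the figures as rounded by -h (so
the results are only approximate); the function name gap_gb is made up
for illustration:

```sh
#!/bin/sh
# Hypothetical helper: difference between two sizes given in whole GB.
gap_gb() {
    echo $(( $1 - $2 ))
}

size_gb=82       # df "Size"
used_gb=75       # df "Used"
reachable_gb=52  # du total for everything that can still be traversed

# Space that is allocated in the file system but that du cannot reach.
echo "used minus reachable: $(gap_gb "$used_gb" "$reachable_gb") GB"
# Gap between the total size and what du can see (this includes the
# free and reserved space as well).
echo "size minus reachable: $(gap_gb "$size_gb" "$reachable_gb") GB"
```

The two .wmv files that du could not read are part of the unreachable
amount as well, so not all of it has to belong to /home/adec.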

	% file mnt/adec
	mnt/adec: cannot open `mnt/adec' (Bad file descriptor)

	% cd mnt/adec
	mnt/adec: Not a directory.

Before bothering anyone here on this list, I checked information on
the net and found that only one (!!!) person besides me seemed to have
this problem. And he got no help. Will I? =^_^=

Of course I took the time to read about the FFS architecture. If I did
understand it correctly, fsck stops working, showing the informative
error message "bad inode number 306176 to nextinode" because it cannot
get the next inode from a concatenated list that represents the file
and directory hierarchy, so there must be a "bad pointer". While the
names of the next things represented by inodes reside within a data
structure at level N, the corresponding data entries reside at level
N + 1, to which a pointer should lead. This may be an explanation why
the name "adec" is still in ad1s1f's root directory, but the data that
says "I'm a directory, this is my content" is not referenced anymore.
So fsck cannot continue. The missing inodes need to get reconnected.
That's what lost+found usually contains: unreferenced
inodes that are not marked free: their names are gone (N), but their
content is still there (N + 1), and the new file name is "#" plus
their inode number.
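
To get an idea where inode 306176 sits on the disk: FFS keeps its
inodes in per-cylinder-group tables, and the group that holds an inode
is the inode number divided by the number of inodes per group (ipg).
A rough sketch of that arithmetic follows. The ipg value in it is a
placeholder and NOT taken from this file system; the real value would
have to be read from the "ipg" field of the dumpfs output for the
image. The function names are made up for illustration.

```sh
#!/bin/sh
# ino_to_cg: cylinder group that holds inode $1, given $2 inodes per
# group (integer division).
ino_to_cg() {
    echo $(( $1 / $2 ))
}

# ino_offset: position of inode $1 inside its cylinder group.
ino_offset() {
    echo $(( $1 % $2 ))
}

# ASSUMPTION: placeholder value, not read from the damaged file system.
# Replace it with the "ipg" number that dumpfs prints.
ipg=23552
ino=306176   # inode number from the fsck error message

echo "inode $ino: group $(ino_to_cg "$ino" "$ipg"), offset $(ino_offset "$ino" "$ipg")"
```

With the real ipg this would at least tell which of the 472 cylinder
groups fsck was working on when it gave up.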

What should I do?

Help is VERY welcome! If you have any ideas what to do, I'd be glad
to save the money I would have to spend when sending the disk to a
data recovery service - 1000 Euro or more is not something I can
afford. And when you're low on money, adequate tape backup systems are
too expensive (although such a device would be my first choice).

By the way, this must be the revenge of a higher power. I always
talk about backups, but because everything had worked fine for years,
I got lazy... I'm a long-time happy FreeBSD user and I never saw this
kind of problem. My whole existence is connected to my home directory.
Yes, it is that hard for me... please help!



Thanks!

