bin/138043: fsck_ffs broken, partial patch

Fri Aug 21 18:10:02 UTC 2009

>Number:         138043
>Category:       bin
>Synopsis:       fsck_ffs broken, partial patch
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 21 18:10:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Heikki Suonsivu
>Release:        FreeBSD 7.2-STABLE amd64
>Organization:
bbnetworks.net
>Environment:
System: FreeBSD news.bbnetworks.net 7.2-STABLE FreeBSD 7.2-STABLE #0: Thu Aug 13 22:42:05 EEST 2009 hsu at news.bbnetworks.net:/usr/obj/usr/src/sys/BBNETWORKS7NEWS amd64

	Possibly all versions of FreeBSD with UFS2.

>Description:

	fsck_ffs trusts the number of initialized inodes recorded in
	the cylinder group header (cg_initediblk) on UFS2 filesystems.
	Unfortunately, disk or memory corruption, whatever the cause,
	can damage that particular value.  If the corrupt value is too
	large, this is easy to detect by comparing it with the
	per-group maximum in the superblock (fs_ipg).  If it is too
	small, bad things may still happen: not all inodes are checked,
	possibly causing loss of files.  The patch below works around
	the too-large case only.  I think the whole optimization of
	trusting the cylinder group header is too optimistic, and
	fsck_ffs should probably go back to the UFS1 behaviour of
	scanning every inode, even at a performance cost.

>How-To-Repeat:

	Have your 3ware 9500S RAID controller go nuts while hot-swapping
	a disk in the pack, together with an apparent failure of the
	BBU which may or may not be related.  Alternatively, use any
	other flaky hardware and wait.  The server this happened on has
	ECC memory, which leaves the controller or the disks as the
	most likely source.  This is probably not a very frequent
	event, if I am the first person ever to stumble upon it.

	The problem shows up as the error "bad inode number xxx to
	nextinode", because getnextinode gets called with an inumber
	beyond the end of that particular cylinder group.  fsck_ffs
	exits in this case, leaving the filesystem inaccessible.
	Forcibly mounting the filesystem read-only caused an immediate
	panic.  There was a lot of corruption, most of it concentrated
	in an area around this cylinder group, with little damage
	elsewhere.
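
For illustration, the failure can be modelled as follows.  This is a
simplified, self-contained sketch and not the fsck_ffs source: the
struct and the names scan_model, model_setinodebuf, model_getnextinode
and model_pass1_group are hypothetical, and it assumes the real
iterator refuses any inode number past the last inode of the current
cylinder group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-cylinder-group inode iterator. */
struct scan_model {
	uint32_t nextino;	/* next inode number the iterator expects */
	uint32_t lastvalid;	/* last inode number belonging to this group */
};

/* Start scanning cylinder group c, which holds ipg inodes. */
static void
model_setinodebuf(struct scan_model *m, uint32_t c, uint32_t ipg)
{
	assert(ipg > 0);
	m->nextino = c * ipg;
	m->lastvalid = m->nextino + ipg - 1;
}

/* Return 0 if inumber is acceptable, -1 where fsck_ffs would exit. */
static int
model_getnextinode(struct scan_model *m, uint32_t inumber)
{
	if (inumber != m->nextino || inumber > m->lastvalid)
		return (-1);	/* "bad inode number ... to nextinode" */
	m->nextino++;
	return (0);
}

/*
 * Walk inosused inodes of group c, the way pass1 does, and return how
 * many were accepted before the iterator refused one.
 */
static uint32_t
model_pass1_group(uint32_t c, uint32_t ipg, uint32_t inosused)
{
	struct scan_model m;
	uint32_t i, inumber;

	model_setinodebuf(&m, c, ipg);
	inumber = c * ipg;
	for (i = 0; i < inosused; i++, inumber++)
		if (model_getnextinode(&m, inumber) != 0)
			break;
	return (i);
}
```

In this model a garbled count larger than the group size makes the
walk stop at the group boundary, which is the point where the real
fsck_ffs gives up instead of continuing with the next group.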

>Fix:

This is a partial fix: it only handles the detectable situation where
cgrp.cg_initediblk is larger than the number of inodes per cylinder
group.  A possibly better alternative is to fall back to the UFS1 code
path in this case.

Should this fallback also be triggered by other anomalies, for example
by redoing pass1 when the number of inodes used mismatches in some way?
That might catch the too-small case as well.
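
One way such a cross-check might look is sketched below.  It is an
assumption and not part of the patch: it compares the initialized-inode
count against the number of inodes the same header claims are allocated
(inodes per group minus free inodes), on the reasoning that an
allocated inode should not lie beyond the initialized region.  The
names cg_counts, cg_verdict, check_initediblk and inodes_to_scan are
hypothetical, and a header that is corrupt in both counters at once
could still slip through.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-group counters, for illustration only. */
struct cg_counts {
	uint32_t ipg;		/* inodes per group, from the superblock */
	uint32_t initediblk;	/* initialized inodes, from the cg header */
	uint32_t nifree;	/* free inodes, from the cg header */
};

enum cg_verdict {
	CG_OK,		/* value is plausible */
	CG_TOO_LARGE,	/* more initialized inodes than the group holds */
	CG_TOO_SMALL,	/* fewer initialized inodes than allocated ones */
	CG_SUSPECT	/* the free count is garbage, nothing to compare */
};

static enum cg_verdict
check_initediblk(const struct cg_counts *cg)
{
	uint32_t allocated;

	assert(cg->ipg > 0);
	if (cg->initediblk > cg->ipg)
		return (CG_TOO_LARGE);
	if (cg->nifree > cg->ipg)
		return (CG_SUSPECT);
	allocated = cg->ipg - cg->nifree;
	if (cg->initediblk < allocated)
		return (CG_TOO_SMALL);
	return (CG_OK);
}

/* Inodes pass1 should scan: the whole group unless the value checks out. */
static uint32_t
inodes_to_scan(const struct cg_counts *cg)
{
	return (check_initediblk(cg) == CG_OK ? cg->initediblk : cg->ipg);
}
```

Any verdict other than CG_OK falls back to scanning the full group,
which matches the UFS1 behaviour at the cost of reading inodes that may
never have been initialized.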

In any case, fsck should never simply exit, nor should it make
optimistic assumptions about the disk state.  I did not analyze the
softupdates cases; it's already 3:45am...

The patch below also fixes a mildly confusing error message.
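
The reworded message has three possible forms, because the enclosing
test fires when either flag is set.  The sketch below reproduces the
same format string outside fsck_ffs so the variants can be inspected.
dups_message is a hypothetical helper, and snprintf stands in for
pfatal only for the purpose of this illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the message the same way as the patched pfatal() call.
 * preen and usedsoftdep mirror the fsck_ffs flags of the same name.
 * Returns the length of the formatted message, as snprintf does.
 */
static int
dups_message(char *buf, size_t len, int preen, int usedsoftdep)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "INTERNAL ERROR: dups with %s%s%s",
	    preen ? "-p" : "",
	    (preen && usedsoftdep) ? " and " : "",
	    usedsoftdep ? "softupdates" : ""));
}
```

With only preen set the text ends in "dups with -p", with only
usedsoftdep set it ends in "dups with softupdates", and with both set
it ends in "dups with -p and softupdates".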

Index: main.c
===================================================================
RCS file: /usr/CVS/src/sbin/fsck_ffs/main.c,v
retrieving revision 1.47.2.6
diff -u -r1.47.2.6 main.c

--- main.c	27 Apr 2009 19:15:14 -0000	1.47.2.6
+++ main.c	20 Aug 2009 19:38:41 -0000
@@ -412,7 +412,10 @@
 	 */
 	if (duplist) {
 		if (preen || usedsoftdep)
-			pfatal("INTERNAL ERROR: dups with -p");
+		  	pfatal("INTERNAL ERROR: dups with %s%s%s", 
+			       preen ? "-p" : "", 
+			       (preen && usedsoftdep) ? " and " : "",
+			       usedsoftdep ? "softupdates" : "");
 		printf("** Phase 1b - Rescan For More DUPS\n");
 		pass1b();
 	}
Index: pass1.c
===================================================================
RCS file: /usr/CVS/src/sbin/fsck_ffs/pass1.c,v
retrieving revision 1.43
diff -u -r1.43 pass1.c
--- pass1.c	8 Oct 2004 20:44:47 -0000	1.43
+++ pass1.c	21 Aug 2009 02:40:57 -0000
@@ -93,10 +93,20 @@
 		inumber = c * sblock.fs_ipg;
 		setinodebuf(inumber);
 		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
-		if (sblock.fs_magic == FS_UFS2_MAGIC)
+		if (sblock.fs_magic == FS_UFS2_MAGIC) {
 			inosused = cgrp.cg_initediblk;
-		else
+			if (inosused > sblock.fs_ipg) {
+			  /* If cgrp.cg_initediblk is impossible, ignore it.
+			   * This may indicate a bigger problem? */
+			  pwarn("Garbled number of initialized inodes (%d > %d) in cylinder group %d\n", 
+				inosused, sblock.fs_ipg, c);
+			  /* Set the value to maximum per cylinder group,
+			   * like UFS1. */
+			  inosused = sblock.fs_ipg;
+			}
+		} else {
 			inosused = sblock.fs_ipg;
+		}
 		if (got_siginfo) {
 			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
 			    cdevname, c, sblock.fs_ncg,



>Release-Note:
>Audit-Trail:
>Unformatted: