kern/144824: boot problem on USB (root partition mounting)

Gilles Blanc gblanc at linagora.com
Wed Mar 17 16:50:02 UTC 2010


>Number:         144824
>Category:       kern
>Synopsis:       boot problem on USB (root partition mounting)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 17 16:50:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Gilles Blanc
>Release:        8.0-RELEASE (current)
>Organization:
Linagora
>Environment:
FreeBSD freedaemon.par.lng 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009     root at mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
At boot, the kernel (file /sys/kern/vfs_mount.c) uses a queue to wait for devices to be initialized before it mounts root (or tries to). The USB driver, for instance, adds entries to this queue with the "root_mount_hold" function, so when we boot from a USB key, the function "root_mount_prepare" delays the root mount until USB is available, that is, until the queue has been emptied by calling "root_mount_rel" on every identifier the USB driver registered.
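
The hold/release mechanism described above can be sketched as a small userland model. This is only an illustration, not the code in vfs_mount.c: the report describes a list of identifiers, which is reduced here to a counter, and the names `root_holds`, `model_hold`, `model_release` and `model_may_mount` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland model only: the list of hold identifiers is reduced to a counter. */
struct root_holds {
	size_t pending;		/* holds taken and not yet released */
};

/* Stands in for root_mount_hold(): a driver asks root mounting to wait. */
static void
model_hold(struct root_holds *rh)
{
	rh->pending++;
}

/* Stands in for root_mount_rel(): the driver reports that it is done. */
static void
model_release(struct root_holds *rh)
{
	assert(rh->pending > 0);	/* releasing without a hold is a bug */
	rh->pending--;
}

/* Stands in for the wait in root_mount_prepare(): mount only when nothing is held. */
static bool
model_may_mount(const struct root_holds *rh)
{
	return (rh->pending == 0);
}
```

In this model, root mounting proceeds as soon as the last hold is released, whether or not the device that carries the root partition exists yet.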

In practice, it only waits for USB to be "physically" available, not necessarily for umass or scsi (scsi-da). More precisely, the behavior is not deterministic: to be mounted, a root partition on a USB key needs umass and then scsi to be initialized. When the mount works, which is most of the time, it is only because the 'root_holds' list is still not empty while other threads run concurrently (for example, with a USB key plugged into usb0, the system sequentially initializes usb0 to usb7, and during that time umass0 and da0 are initialized too).
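
The timing dependence can be made concrete with a toy timeline, assuming each USB bus releases its hold at some tick and da0 appears at another. The tick values and the function names `mount_attempt_tick` and `first_attempt_succeeds` are hypothetical and only illustrate the race; they do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: root mounting is attempted at the tick where the last
 * USB bus releases its hold.
 */
static int
mount_attempt_tick(const int *release_ticks, size_t nbuses)
{
	int last = 0;

	assert(release_ticks != NULL || nbuses == 0);
	for (size_t i = 0; i < nbuses; i++)
		if (release_ticks[i] > last)
			last = release_ticks[i];
	return (last);
}

/* The first attempt works only if da0 already exists at that tick. */
static bool
first_attempt_succeeds(const int *release_ticks, size_t nbuses,
    int da_ready_tick)
{
	return (da_ready_tick <= mount_attempt_tick(release_ticks, nbuses));
}
```

With eight buses released one after another, a da0 that is ready at tick 5 is found in time; if all buses are released at tick 1, the same da0 is missed.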

Unfortunately, some servers are not that forgiving, and root mounting simply fails (the 'vfs_mountroot' function asks 'vfs_mountroot_try' to mount the USB root partition, which is not yet available). We then land at the "ROOT MOUNT ERROR" prompt and have to mount the partition by hand, which is not acceptable on production servers (we would have to travel several kilometers just to type "ufs:/dev/da0s1a" at every reboot...).

The problem is not a blocker for most FreeBSD users, but it prevents us from migrating our systems (which is quite a big problem).
>How-To-Repeat:
If you have a machine that exhibits this problem, you can reproduce it easily (it fails 95% of the time); if not (as on my development laptop), you will never manage to make it fail.
>Fix:
I have tried to add locks in the umass and scsi drivers. In the umass driver, the lock is in the /sys/dev/usb/storage/umass.c file, in the function 'umass_attach' (on our Supermicro server umass has enough time to initialize, but I wanted to be rigorous). In the scsi driver, it is in the /sys/cam/scsi/scsi_da.c file, in the function 'dastart', in the "DA_STATE_PROBE2" case of the switch statement. Unfortunately, between these two lock/unlock pairs, the root mounting thread preempts; as the list is empty during this very short time, it tries to mount the root partition and fails as usual. It is not possible to add a lock in umass and remove it in scsi, because the API requires a pointer to the lock list entry at removal.
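
The gap between the two lock/unlock pairs can be shown with a short event trace. This is a sketch under the assumption that holds behave like a counter; the `hold_event` type and the `first_unheld_point` function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum hold_event { HOLD = +1, RELEASE = -1 };

/*
 * Returns the index of the first event after which no hold is pending,
 * i.e. the earliest point at which the root mount thread may proceed.
 * Returns n if a hold is still pending after every event.
 */
static size_t
first_unheld_point(const enum hold_event *ev, size_t n)
{
	int pending = 0;

	for (size_t i = 0; i < n; i++) {
		pending += ev[i];
		assert(pending >= 0);	/* a release must follow a hold */
		if (pending == 0)
			return (i);
	}
	return (n);
}
```

For the trace HOLD, RELEASE, HOLD, RELEASE (umass pair, then scsi pair) the function returns 1: the list is already empty after the umass release, before the scsi hold is taken. Only an overlapping trace such as HOLD, HOLD, RELEASE, RELEASE would keep root mounting blocked until the end, and that is the case the drivers could not be made to produce.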

So another solution has to be considered, and that is what I propose with this patch. In 'vfs_mountroot_try', I simply call the 'kernel_mount' function several times, with a short pause between attempts. The number of retries is 3 by default, but it can be customized through the new "vfs.root.mounttrymax" option in /boot/loader.conf (it can even be set to 0 to go back to the initial behavior). Each time the mount fails and a retry is still allowed, a message is printed, the thread sleeps for one second, and then it tries again. If mounting root really is impossible, we continue with the normal prompt.
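
The retry logic described above can be tried outside the kernel with a minimal sketch. It is not the patch itself: `mount_fn`, `mount_with_retries` and `flaky_mount` are hypothetical names, the callback stands in for 'kernel_mount', and the optional `pause_fn` stands in for the one-second sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for kernel_mount(); 0 means success. */
typedef int (*mount_fn)(void *ctx);

/*
 * Calls try_mount once, then retries up to max_retries more times while it
 * fails. pause_fn stands in for the sleep between attempts and may be NULL.
 * Returns the last error (0 on success).
 */
static int
mount_with_retries(mount_fn try_mount, void *ctx, int max_retries,
    void (*pause_fn)(void))
{
	int error, retries = 0;

	assert(try_mount != NULL);
	for (;;) {
		error = try_mount(ctx);
		if (error == 0 || retries >= max_retries)
			break;
		if (pause_fn != NULL)
			pause_fn();
		retries++;
	}
	return (error);
}

/* Test double: fails until the counter behind ctx reaches zero. */
static int
flaky_mount(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return (-1);
	}
	return (0);
}
```

As in the patch, a limit of 3 allows up to four calls in total, and a limit of 0 gives a single attempt. On a patched kernel the limit would be raised with a line such as vfs.root.mounttrymax="5" in /boot/loader.conf.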

There are still some problems on some USB ports (the other ports on the same machine work fine at the first or second mount retry). I suspect a deeper problem in 'kernel_mount', because using the prompt does not mount the device either, or worse, can lead to a page fault or a lockup. But my patch is enough to resolve the original problem as far as is possible in the current state of things.

I hope it will be reviewed and accepted as soon as possible.

Patch attached with submission follows:

--- vfs_mount.c	2010-03-17 15:30:45.000000000 +0100
+++ vfs_mount.c	2010-03-17 14:49:52.000000000 +0100
@@ -1798,6 +1806,8 @@
 	int		error;
 	char		patt[32];
 	char		errmsg[255];
+	int		nbtry;
+	int		rootmounttrymax;
 
 	vfsname = NULL;
 	path    = NULL;
@@ -1805,6 +1815,8 @@
 	ma	= NULL;
 	error   = EINVAL;
 	bzero(errmsg, sizeof(errmsg));
+	nbtry	= 0;
+	rootmounttrymax = 3;
 
 	if (mountfrom == NULL)
 		return (error);		/* don't complain */
@@ -1827,7 +1839,18 @@
 	ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
 	ma = mount_arg(ma, "ro", NULL, 0);
 	ma = parse_mountroot_options(ma, options);
-	error = kernel_mount(ma, MNT_ROOTFS);
+
+	TUNABLE_INT_FETCH("vfs.root.mounttrymax", &rootmounttrymax);
+	while (1) {
+		error = kernel_mount(ma, MNT_ROOTFS);
+		if (nbtry < rootmounttrymax && error != 0) {
+			printf("Mount failed, retrying mount root from %s\n", mountfrom);
+			tsleep(&rootmounttrymax, PZERO | PDROP, "mount", hz);
+			nbtry++;
+		}
+		else
+			break;
+	}
 
 	if (error == 0) {
 		/*


>Release-Note:
>Audit-Trail:
>Unformatted:

