Post 9.1 stable file system problems

Konstantin Belousov kostikbel at gmail.com
Tue Jan 1 15:58:14 UTC 2013


On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey" <kamikaze at bsdforen.de> wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail?  The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel ?

> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> > 
> > Start providing some useful debugging information ?
> 
> That one might be interesting. It's all system time:
> 
> # time -lh make extract
> ===>  License GPLv2 accepted by the user
> ===>  Found saved configuration for openjdk-7.9.05_1
> ===>  Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===>   openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
>         4m29.30s real           3.03s user              4m22.55s sys
>       5008  maximum resident set size
>        135  average shared memory size
>       2932  average unshared data size
>        127  average unshared stack size
>       7772  page reclaims
>          0  page faults
>          0  swaps
>         19  block input operations
>        101  block output operations
>          0  messages sent
>          0  messages received
>         41  signals received
>       1597  voluntary context switches
>      16590  involuntary context switches

Ok, from your mount -v output, are the three nullfs mounts the only
nullfs mount ever used ?

Is it only unzip which demostrates the silly behaviour ? Or does it
happen with any program ? E.g., does ls(1) or sha1 on the nullfs mount
also slow ?

Could you try some low-tech profiling on the slow program. For instance,
you could run ktrace/kdump -R to see which syscalls are slow.

Most darkly part of your report for me, is that I also use nullfs-backed
jails both on HEAD and stable/9, with bigger scale, and I do not have
an issue. I just did
pooma32% time unzip -q /usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q   3.25s user 23.77s system 78% cpu 34.482 total
over nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You could turn on the caching with the 'mount -t
nullfs -o cache from to' mounting command. I am interested if use/non-use
of -o cache makes a difference for you.

diff --git a/sbin/mount_nullfs/mount_nullfs.c b/sbin/mount_nullfs/mount_nullfs.c
index c88db3d..aaf66e5 100644
--- a/sbin/mount_nullfs/mount_nullfs.c
+++ b/sbin/mount_nullfs/mount_nullfs.c
@@ -57,27 +57,35 @@ static const char rcsid[] =
 
 #include "mntopts.h"
 
-static struct mntopt mopts[] = {
-	MOPT_STDOPTS,
-	MOPT_END
-};
-
 int	subdir(const char *, const char *);
 static void	usage(void) __dead2;
 
 int
 main(int argc, char *argv[])
 {
-	struct iovec iov[6];
-	int ch, mntflags;
+	struct iovec *iov;
+	char *p, *val;
 	char source[MAXPATHLEN];
 	char target[MAXPATHLEN];
+	char errmsg[255];
+	int ch, mntflags, iovlen;
+	char nullfs[] = "nullfs";
 
+	iov = NULL;
+	iovlen = 0;
 	mntflags = 0;
+	errmsg[0] = '\0';
 	while ((ch = getopt(argc, argv, "o:")) != -1)
 		switch(ch) {
 		case 'o':
-			getmntopts(optarg, mopts, &mntflags, 0);
+			val = strdup("");
+			p = strchr(optarg, '=');
+			if (p != NULL) {
+				free(val);
+				*p = '\0';
+				val = p + 1;
+			}
+			build_iovec(&iov, &iovlen, optarg, val, (size_t)-1);
 			break;
 		case '?':
 		default:
@@ -99,21 +107,16 @@ main(int argc, char *argv[])
 		errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
 		    argv[0], target, argv[1]);
 
-	iov[0].iov_base = strdup("fstype");
-	iov[0].iov_len = sizeof("fstype");
-	iov[1].iov_base = strdup("nullfs");
-	iov[1].iov_len = strlen(iov[1].iov_base) + 1;
-	iov[2].iov_base = strdup("fspath");
-	iov[2].iov_len = sizeof("fspath");
-	iov[3].iov_base = source;
-	iov[3].iov_len = strlen(source) + 1;
-	iov[4].iov_base = strdup("target");
-	iov[4].iov_len = sizeof("target");
-	iov[5].iov_base = target;
-	iov[5].iov_len = strlen(target) + 1;
-
-	if (nmount(iov, 6, mntflags))
-		err(1, NULL);
+	build_iovec(&iov, &iovlen, "fstype", nullfs, (size_t)-1);
+	build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+	build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+	build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg));
+	if (nmount(iov, iovlen, mntflags) < 0) {
+		if (errmsg[0] != 0)
+			err(1, "%s: %s", source, errmsg);
+		else
+			err(1, "%s", source);
+	}
 	exit(0);
 }
 
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 0878e55..4f37020 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -34,9 +34,15 @@
  * $FreeBSD$
  */
 
+#ifndef	FS_NULL_H
+#define	FS_NULL_H
+
+#define	NULLM_CACHE	0x0001
+
 struct null_mount {
 	struct mount	*nullm_vfs;
 	struct vnode	*nullm_rootvp;	/* Reference to root null_node */
+	uint64_t	nullm_flags;
 };
 
 #ifdef _KERNEL
@@ -80,3 +86,5 @@ MALLOC_DECLARE(M_NULLFSNODE);
 #endif /* NULLFS_DEBUG */
 
 #endif /* _KERNEL */
+
+#endif
diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index b2c7a75..f82d738 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -224,6 +224,9 @@ null_nodeget(mp, lowervp, vpp)
 	 * provide ready to use vnode.
 	 */
 	if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+		KASSERT((MOUNTTONULLMOUNT(mp)->nullm_flags & NULLM_CACHE) == 0,
+		    ("lowervp %p is not excl locked and cache is disabled",
+		    lowervp));
 		vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
 		if ((lowervp->v_iflag & VI_DOOMED) != 0) {
 			vput(lowervp);
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 7d84d51..8a5f1b9 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -67,6 +67,13 @@ static vfs_vget_t	nullfs_vget;
 static vfs_extattrctl_t	nullfs_extattrctl;
 static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
 
+/* Mount options that we support. */
+static const char *nullfs_opts[] = {
+	"target",
+	"cache",
+	NULL
+};
+
 /*
  * Mount null layer
  */
@@ -86,9 +93,11 @@ nullfs_mount(struct mount *mp)
 
 	if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_NULLFS))
 		return (EPERM);
-
 	if (mp->mnt_flag & MNT_ROOTFS)
 		return (EOPNOTSUPP);
+	if (vfs_filteropt(mp->mnt_optnew, nullfs_opts))
+		return (EINVAL);
+
 	/*
 	 * Update is a no-op
 	 */
@@ -149,7 +158,7 @@ nullfs_mount(struct mount *mp)
 	}
 
 	xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
-	    M_NULLFSMNT, M_WAITOK);
+	    M_NULLFSMNT, M_WAITOK | M_ZERO);
 
 	/*
 	 * Save reference to underlying FS
@@ -187,16 +196,25 @@ nullfs_mount(struct mount *mp)
 		mp->mnt_flag |= MNT_LOCAL;
 		MNT_IUNLOCK(mp);
 	}
+
+	vfs_flagopt(mp->mnt_optnew, "cache", &xmp->nullm_flags, NULLM_CACHE);
+
 	MNT_ILOCK(mp);
-	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
-	    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
+		    (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
+		    MNTK_EXTENDED_SHARED);
+	}
 	mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
 	MNT_IUNLOCK(mp);
 	mp->mnt_data = xmp;
 	vfs_getnewfsid(mp);
-	MNT_ILOCK(xmp->nullm_vfs);
-	TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(xmp->nullm_vfs);
+	if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(xmp->nullm_vfs);
+		TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp,
+		    mnt_upper_link);
+		MNT_IUNLOCK(xmp->nullm_vfs);
+	}
 
 	vfs_mountedfrom(mp, target);
 
@@ -234,13 +252,15 @@ nullfs_unmount(mp, mntflags)
 	 */
 	mntdata = mp->mnt_data;
 	ump = mntdata->nullm_vfs;
-	MNT_ILOCK(ump);
-	while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
-		ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
-		msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+	if ((mntdata->nullm_flags & NULLM_CACHE) != 0) {
+		MNT_ILOCK(ump);
+		while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
+			ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
+			msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+		}
+		TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
+		MNT_IUNLOCK(ump);
 	}
-	TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
-	MNT_IUNLOCK(ump);
 	mp->mnt_data = NULL;
 	free(mntdata, M_NULLFSMNT);
 	return (0);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f530ed2..cc35d81 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,7 +692,22 @@ null_unlock(struct vop_unlock_args *ap)
 static int
 null_inactive(struct vop_inactive_args *ap __unused)
 {
+	struct vnode *vp;
+	struct mount *mp;
+	struct null_mount *xmp;
 
+	vp = ap->a_vp;
+	mp = vp->v_mount;
+	xmp = MOUNTTONULLMOUNT(mp);
+	if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+		/*
+		 * If this is the last reference and caching of the
+		 * nullfs vnodes is not enabled, then free up the
+		 * vnode so as not to tie up the lower vnodes.
+		 */
+		vp->v_object = NULL;
+		vrecycle(vp);
+	}
 	return (0);
 }
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130101/7c0f6816/attachment.sig>


More information about the freebsd-stable mailing list