Post 9.1 stable file system problems
Konstantin Belousov
kostikbel at gmail.com
Tue Jan 1 15:58:14 UTC 2013
On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey" <kamikaze at bsdforen.de> wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail? The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel?
> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> >
> > Start providing some useful debugging information ?
>
> That one might be interesting. It's all system time:
>
> # time -lh make extract
> ===> License GPLv2 accepted by the user
> ===> Found saved configuration for openjdk-7.9.05_1
> ===> Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===> openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
> 4m29.30s real 3.03s user 4m22.55s sys
> 5008 maximum resident set size
> 135 average shared memory size
> 2932 average unshared data size
> 127 average unshared stack size
> 7772 page reclaims
> 0 page faults
> 0 swaps
> 19 block input operations
> 101 block output operations
> 0 messages sent
> 0 messages received
> 41 signals received
> 1597 voluntary context switches
> 16590 involuntary context switches
OK, going by your mount -v output, are the three nullfs mounts the only
nullfs mounts ever used?

Is it only unzip which demonstrates the silly behaviour? Or does it
happen with any program? E.g., is ls(1) or sha1 on the nullfs mount
also slow?

Could you try some low-tech profiling on the slow program? For instance,
you could run ktrace/kdump -R to see which syscalls are slow.
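
As a complement to ktrace, one rough way to check whether plain metadata
access is slow on a given mount is to time lstat(2) over a directory and
compare the result for the nullfs path and the underlying UFS path. The
sketch below is only an illustration: time_dir_lstat() is a hypothetical
helper, not something from this thread or from the base system, and it
measures wall-clock time for a single directory only.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper (an assumption for illustration): lstat() every
 * entry of one directory.  Returns the number of entries successfully
 * stat'ed, or -1 if the directory cannot be opened.  On success the
 * elapsed wall-clock time in seconds is stored in *elapsed.
 */
long
time_dir_lstat(const char *dir, double *elapsed)
{
	struct timespec t0, t1;
	struct dirent *de;
	struct stat sb;
	char path[PATH_MAX];
	long n = 0;
	DIR *d;

	d = opendir(dir);
	if (d == NULL)
		return (-1);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((de = readdir(d)) != NULL) {
		/* Build "dir/name"; overly long names are truncated. */
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (lstat(path, &sb) == 0)
			n++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	closedir(d);

	*elapsed = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return (n);
}
```

Calling it once on a directory under the nullfs mount and once on the
same directory through the lower file system gives two timings to
compare. A large gap would point at nullfs itself rather than at unzip.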

What puzzles me most about your report is that I also use nullfs-backed
jails on both HEAD and stable/9, at a larger scale, and I do not see the
issue. I just ran
pooma32% time unzip -q /usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q 3.25s user 23.77s system 78% cpu 34.482 total
over nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You can turn caching back on by mounting with
'mount -t nullfs -o cache from to'. I am interested in whether using or
not using -o cache makes a difference for you.

diff --git a/sbin/mount_nullfs/mount_nullfs.c b/sbin/mount_nullfs/mount_nullfs.c
index c88db3d..aaf66e5 100644
--- a/sbin/mount_nullfs/mount_nullfs.c
+++ b/sbin/mount_nullfs/mount_nullfs.c
@@ -57,27 +57,35 @@ static const char rcsid[] =
#include "mntopts.h"
-static struct mntopt mopts[] = {
- MOPT_STDOPTS,
- MOPT_END
-};
-
int subdir(const char *, const char *);
static void usage(void) __dead2;
int
main(int argc, char *argv[])
{
- struct iovec iov[6];
- int ch, mntflags;
+ struct iovec *iov;
+ char *p, *val;
char source[MAXPATHLEN];
char target[MAXPATHLEN];
+ char errmsg[255];
+ int ch, mntflags, iovlen;
+ char nullfs[] = "nullfs";
+ iov = NULL;
+ iovlen = 0;
mntflags = 0;
+ errmsg[0] = '\0';
while ((ch = getopt(argc, argv, "o:")) != -1)
switch(ch) {
case 'o':
- getmntopts(optarg, mopts, &mntflags, 0);
+ val = strdup("");
+ p = strchr(optarg, '=');
+ if (p != NULL) {
+ free(val);
+ *p = '\0';
+ val = p + 1;
+ }
+ build_iovec(&iov, &iovlen, optarg, val, (size_t)-1);
break;
case '?':
default:
@@ -99,21 +107,16 @@ main(int argc, char *argv[])
errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
argv[0], target, argv[1]);
- iov[0].iov_base = strdup("fstype");
- iov[0].iov_len = sizeof("fstype");
- iov[1].iov_base = strdup("nullfs");
- iov[1].iov_len = strlen(iov[1].iov_base) + 1;
- iov[2].iov_base = strdup("fspath");
- iov[2].iov_len = sizeof("fspath");
- iov[3].iov_base = source;
- iov[3].iov_len = strlen(source) + 1;
- iov[4].iov_base = strdup("target");
- iov[4].iov_len = sizeof("target");
- iov[5].iov_base = target;
- iov[5].iov_len = strlen(target) + 1;
-
- if (nmount(iov, 6, mntflags))
- err(1, NULL);
+ build_iovec(&iov, &iovlen, "fstype", nullfs, (size_t)-1);
+ build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+ build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+ build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg));
+ if (nmount(iov, iovlen, mntflags) < 0) {
+ if (errmsg[0] != 0)
+ err(1, "%s: %s", source, errmsg);
+ else
+ err(1, "%s", source);
+ }
exit(0);
}
diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index 0878e55..4f37020 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -34,9 +34,15 @@
* $FreeBSD$
*/
+#ifndef FS_NULL_H
+#define FS_NULL_H
+
+#define NULLM_CACHE 0x0001
+
struct null_mount {
struct mount *nullm_vfs;
struct vnode *nullm_rootvp; /* Reference to root null_node */
+ uint64_t nullm_flags;
};
#ifdef _KERNEL
@@ -80,3 +86,5 @@ MALLOC_DECLARE(M_NULLFSNODE);
#endif /* NULLFS_DEBUG */
#endif /* _KERNEL */
+
+#endif
diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index b2c7a75..f82d738 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -224,6 +224,9 @@ null_nodeget(mp, lowervp, vpp)
* provide ready to use vnode.
*/
if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+ KASSERT((MOUNTTONULLMOUNT(mp)->nullm_flags & NULLM_CACHE) != 0,
+ ("lowervp %p is not excl locked and cache is disabled",
+ lowervp));
vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
if ((lowervp->v_iflag & VI_DOOMED) != 0) {
vput(lowervp);
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 7d84d51..8a5f1b9 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -67,6 +67,13 @@ static vfs_vget_t nullfs_vget;
static vfs_extattrctl_t nullfs_extattrctl;
static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
+/* Mount options that we support. */
+static const char *nullfs_opts[] = {
+ "target",
+ "cache",
+ NULL
+};
+
/*
* Mount null layer
*/
@@ -86,9 +93,11 @@ nullfs_mount(struct mount *mp)
if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_NULLFS))
return (EPERM);
-
if (mp->mnt_flag & MNT_ROOTFS)
return (EOPNOTSUPP);
+ if (vfs_filteropt(mp->mnt_optnew, nullfs_opts))
+ return (EINVAL);
+
/*
* Update is a no-op
*/
@@ -149,7 +158,7 @@ nullfs_mount(struct mount *mp)
}
xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
- M_NULLFSMNT, M_WAITOK);
+ M_NULLFSMNT, M_WAITOK | M_ZERO);
/*
* Save reference to underlying FS
@@ -187,16 +196,25 @@ nullfs_mount(struct mount *mp)
mp->mnt_flag |= MNT_LOCAL;
MNT_IUNLOCK(mp);
}
+
+ vfs_flagopt(mp->mnt_optnew, "cache", &xmp->nullm_flags, NULLM_CACHE);
+
MNT_ILOCK(mp);
- mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
- (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED);
+ if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+ mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
+ (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
+ MNTK_EXTENDED_SHARED);
+ }
mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
MNT_IUNLOCK(mp);
mp->mnt_data = xmp;
vfs_getnewfsid(mp);
- MNT_ILOCK(xmp->nullm_vfs);
- TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp, mnt_upper_link);
- MNT_IUNLOCK(xmp->nullm_vfs);
+ if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
+ MNT_ILOCK(xmp->nullm_vfs);
+ TAILQ_INSERT_TAIL(&xmp->nullm_vfs->mnt_uppers, mp,
+ mnt_upper_link);
+ MNT_IUNLOCK(xmp->nullm_vfs);
+ }
vfs_mountedfrom(mp, target);
@@ -234,13 +252,15 @@ nullfs_unmount(mp, mntflags)
*/
mntdata = mp->mnt_data;
ump = mntdata->nullm_vfs;
- MNT_ILOCK(ump);
- while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
- ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
- msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+ if ((mntdata->nullm_flags & NULLM_CACHE) != 0) {
+ MNT_ILOCK(ump);
+ while ((ump->mnt_kern_flag & MNTK_VGONE_UPPER) != 0) {
+ ump->mnt_kern_flag |= MNTK_VGONE_WAITER;
+ msleep(&ump->mnt_uppers, &ump->mnt_mtx, 0, "vgnupw", 0);
+ }
+ TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
+ MNT_IUNLOCK(ump);
}
- TAILQ_REMOVE(&ump->mnt_uppers, mp, mnt_upper_link);
- MNT_IUNLOCK(ump);
mp->mnt_data = NULL;
free(mntdata, M_NULLFSMNT);
return (0);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index f530ed2..cc35d81 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -692,7 +692,22 @@ null_unlock(struct vop_unlock_args *ap)
static int
null_inactive(struct vop_inactive_args *ap __unused)
{
+ struct vnode *vp;
+ struct mount *mp;
+ struct null_mount *xmp;
+ vp = ap->a_vp;
+ mp = vp->v_mount;
+ xmp = MOUNTTONULLMOUNT(mp);
+ if ((xmp->nullm_flags & NULLM_CACHE) == 0) {
+ /*
+ * If this is the last reference and caching of the
+ * nullfs vnodes is not enabled, then free up the
+ * vnode so as not to tie up the lower vnodes.
+ */
+ vp->v_object = NULL;
+ vrecycle(vp);
+ }
return (0);
}
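
The -o handling in the mount_nullfs.c hunk above splits each option at
the first '=' and passes the name and value on to build_iovec(), with an
empty value for a bare option such as "cache". A minimal standalone
sketch of just that split follows. split_mount_option() is a
hypothetical name, and unlike the patch it returns a constant empty
string instead of a strdup()ed one, so it is a simplification and not a
drop-in replacement.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration): split
 * "name=value" in place.  Returns a pointer to the value, or to an
 * empty string when the option has no '=' (e.g. a bare "cache").
 * After the call, opt holds only the option name.
 */
const char *
split_mount_option(char *opt)
{
	char *p;

	p = strchr(opt, '=');
	if (p == NULL)
		return ("");
	*p = '\0';		/* terminate the name at the '=' */
	return (p + 1);		/* value starts right after it */
}
```

With the input "cache" the name stays "cache" and the value is empty.
With a "name=value" string the buffer is cut at the '=' and the returned
pointer refers to the text after it.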