Problems after recent nullfs,vfs changes in 10.0-CURRENT

b. f. bf1783 at googlemail.com
Wed Sep 26 11:38:44 UTC 2012


On 9/18/12, b. f. <bf1783 at googlemail.com> wrote:
> The following deals with some problems exposed by r240283-5,
> particularly (but not only) when used with changes to tmpfs that were
> first proposed by kib@ on 21 June 2010 on this list, in a thread
> entitled "Tmpfs elimination of double copy":
>
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=20463+0+archive/2010/freebsd-fs/20100627.freebsd-fs
>
> On 9/18/12, Konstantin Belousov <kostikbel at gmail.com> wrote:
>> On Mon, Sep 17, 2012 at 08:36:54PM +0200, Peter Holm wrote:
>>> On Mon, Sep 17, 2012 at 03:19:25PM +0300, Konstantin Belousov wrote:
>>> > Please mail fs@, possibly Cc:-ing me.
>>> >
>>> > On Mon, Sep 17, 2012 at 03:04:46AM -0400, b. f. wrote:
>>> > > The recent nullfs or vfs changes (r240283-5) have exposed some
>>> > > problems with my tinderbox.  In this tinderbox, I've been using
>>> > > recent versions of -CURRENT with Gleb's tmpfs rbtree patch:
>>> > >
>>> > > http://people.freebsd.org/~gleb/tmpfs-nrbtree.1.patch
>>> > >
>>> > > and a merged version of your tmpfs single-buffer patch:
>>> > >
>>> > > http://people.freebsd.org/~kib/misc/tmpfs.12.patch
>>> > >
>>> > > The tinderbox performs builds in a tmpfs filesystem that is nullfs
>>> > > grafted to a ufs filesystem.  After r240283-5, builds of
>>> > > ports/lang/ocaml failed when a cp(1) of an executable failed with
>>> > > ETXTBSY. After reverting r240285, the builds of ocaml succeeded.
>>> > >
>>> > > I've attached logs of the failed and successful builds.  Can you
>>> > > guess whether the problem is solely due to the recent nullfs and
>>> > > vfs changes, or to some defect in Gleb's proposed changes, or to
>>> > > a problem with your proposed tmpfs change, or my merging of it?
>>> > > What further changes or tests would you suggest to help find the
>>> > > source of the problem?
>>> > >
>>> > > I've attached a diff of the relevant changes to the system sources
>>> > > used in the tinderbox, and logs of the successful (*.log) and
>>> > > unsuccessful (*.log.error) ocaml builds.
>>> >
>>> > Please show me the mount -v output, and specify which filesystems
>>> > are used where.
>
> The following is a typical layout for one run of the tinderbox (which
> is in /home/shared/freebsd/tinderbox):
>
> /dev/ufs/d1root on / (ufs, local, noatime, writes: sync 13 async 25,
> reads: sync 553 async 42, fsid 8aabfa4d68614a9f)
> devfs on /dev (devfs, local, fsid 00ff007171000000)
> tmpfs on /tmp (tmpfs, local, nosuid, fsid 01ff008787000000)
> /dev/ufs/d1var on /var (ufs, local, noatime, journaled soft-updates,
> writes: sync 15 async 269, reads: sync 664 async 12, fsid
> a5abfa4d331091c9)
> /dev/ufs/d1usr on /usr (ufs, local, noatime, journaled soft-updates,
> writes: sync 2 async 0, reads: sync 765 async 12, fsid
> b4abfa4d94c0f782)
> /dev/ufs/d1usrlocal on /usr/local (ufs, local, noatime, journaled
> soft-updates, writes: sync 32 async 298, reads: sync 2867 async 106,
> fsid c4abfa4d96ab4351)
> /dev/ufs/d1home on /home (ufs, local, noatime, journaled soft-updates,
> writes: sync 16 async 123, reads: sync 2065 async 268, fsid
> ceabfa4d9bb85870)
>
> the filesystem used for the port builds:
>
> /tmp/tinderbox/7.4-amd64-u1 on
> /home/shared/freebsd/tinderbox/7.4-amd64-u1 (nullfs, local, fsid
> 03ff002929000000)
> /home/shared/freebsd/ports/head on
> /home/shared/freebsd/tinderbox/7.4-amd64-u1/a/ports (nullfs, local,
> read-only, fsid 04ff002929000000)
> /home/shared/freebsd/tinderbox/jails/7.4-amd64/src on
> /home/shared/freebsd/tinderbox/7.4-amd64-u1/usr/src (nullfs, local,
> read-only, fsid 05ff002929000000)
> devfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/dev (devfs,
> local, fsid 06ff007171000000)
> /home/shared/freebsd/distfiles on
> /home/shared/freebsd/tinderbox/7.4-amd64-u1/distcache (nullfs, local,
> fsid 07ff002929000000)
> linprocfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/compat/linux/proc
> (linprocfs, local, fsid 08ff00b5b5000000)
> procfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/proc (procfs,
> local, fsid 09ff000202000000)
>
>>> >
>>> > The issue almost definitely is the held reference on the vm object.
>>> > Let's remove Gleb's patches from the picture altogether.
>>> >
>>> > After rethinking VV_TEXT handling both for nullfs and tmpfs (patched),
>>> > I see two issues ATM:
>>> >
>>> > 1. VV_TEXT may be set either on the lower vnode, or on the nullfs
>>> > vnode.  So if you execute a file through the nullfs alias, the
>>> > lower vnode does not get VV_TEXT set, and the executable can still
>>> > be opened for write.
>>> >
>>> > 2. For tmpfs, the hack I added to clear VV_TEXT if swap vm object
>>> > reference count == 1, is not called often enough. This allows
>>> > VV_TEXT to leak, esp. because nullfs after r240283 is not eager to
>>> > reclaim its vnodes.
>>> >
>>> > I updated my branch with tmpfs patches with the following changes:
>>> >
>>> > 1. nullfs now bypasses the VV_TEXT set and clear operations to the
>>> > lower vnode.
>>> >
>>> > 2. the tmpfs_clear_text() hack is removed, instead
>>> > vm_object_deallocate() clears VV_TEXT on the tmpfs vnode if
>>> > reference count goes to 1.
>>> >
>>> > Updated patch is at
>>> > http://people.freebsd.org/~kib/misc/tmpfs.13.patch
>>> > I tested it very lightly, so to say.
>>>
>>> I see the problem on a pristine r240611. Test scenario included.
>>>
>>> + mdconfig -a -t swap -s 1g -u 5
>>> + bsdlabel -w md5 auto
>>> + newfs -U md5a
>>> + mount /dev/md5a /mnt2
>>> + chmod 777 /mnt2
>>> + mount
>>> + grep /mnt
>>> + grep -q tmpfs
>>> + mount -t tmpfs tmpfs /mnt
>>> + chmod 777 /mnt
>>> + mkdir /mnt2/mp
>>> + mount -t nullfs /mnt /mnt2/mp
>>> + cp /usr/bin/true /mnt2/mp/true
>>> + /mnt/true
>>> +
>>> + rm -f /mnt/true
>>> + cp /usr/bin/true /mnt2/mp/true
>>> + /mnt2/mp/true
>>> +
>>> ./nullfs12.sh: cannot create /mnt2/mp/true: Text file busy
>>> + echo FAIL 2
>>> FAIL 2
>>> + mount
>>> + egrep 'tmpfs|nullfs|/mnt |/mnt2 '
>>> /dev/md5a on /mnt2 (ufs, local, soft-updates)
>>> tmpfs on /mnt (tmpfs, NFS exported, local)
>>> /mnt on /mnt2/mp (nullfs, local)
>>> + rm -f /mnt2/mp/true
>>
>> Yes, this is very close if not identical to the only test which I
>> performed with the tmpfs.13.patch.
>>
>
> I can no longer reproduce the port build failures on r240651 amd64
> after applying your tmpfs.13.patch, and I haven't encountered any
> other obvious problems in the short time that I've been using it.  I
> did not rerun Peter Holm's nullfs12.sh test, since you had already
> subjected your patch to a similar test.
>

After further experiments, it appears that there are still some
problems with tmpfs.13.patch and the recent vfs/nullfs changes.  If I
use:

sysctl debug.iosize_max_clamp=0

I observe build failures in some ports, where various utilities fail
with EIO.  For example, textproc/libxml2:

"...
/bin/sh ../../libtool --tag=CC   --mode=link cc  -O2 -pipe
-fno-strict-aliasing -std=gnu89 -pedantic -W -Wformat -Wunused
-Wimplicit -Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat
-Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow
-Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return
-Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline
-Wredundant-decls  -L/usr/local/lib -pthread -o tree1 tree1.o
../../libxml2.la  -lz -L/usr/local/lib -liconv -lm
eval: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
eval: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
../../libtool: printf: Input/output error
gmake[3]: Leaving directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/doc/examples'
gmake[3]: Entering directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/doc'
gmake[3]: Nothing to be done for `all-am'.
gmake[3]: Leaving directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/doc'
gmake[2]: Leaving directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/doc'
Making all in example
gmake[2]: Entering directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/example'
cc -DHAVE_CONFIG_H -I. -I.. -I../include -I../include -I./include
-D_REENTRANT   -I/usr/local/include  -I/usr/local/include  -O2 -pipe
-fno-strict-aliasing -std=gnu89 -pedantic -W -Wformat -Wunused
-Wimplicit -Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat
-Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow
-Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return
-Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline
-Wredundant-decls -MT gjobread.o -MD -MP -MF .deps/gjobread.Tpo -c -o
gjobread.o gjobread.c
mv -f .deps/gjobread.Tpo .deps/gjobread.Po
/bin/sh ../libtool --tag=CC   --mode=link cc  -O2 -pipe
-fno-strict-aliasing -std=gnu89 -pedantic -W -Wformat -Wunused
-Wimplicit -Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat
-Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow
-Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return
-Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline
-Wredundant-decls  -L/usr/local/lib -pthread -o gjobread gjobread.o
../libxml2.la  -lz -L/usr/local/lib -liconv -lm
eval: printf: Input/output error
../libtool: printf: Input/output error
../libtool: printf: Input/output error
gmake[2]: *** [gjobread] Error 1
gmake[2]: Leaving directory
`/work/a/ports/textproc/libxml2/work/libxml2-2.7.8/example'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/work/a/ports/textproc/libxml2/work/libxml2-2.7.8'
gmake: *** [all] Error 2
*** Error code 1

Stop in /a/ports/textproc/libxml2."

or x11/libxcb:

"...
===>   Registering installation for libxcb-1.7
/usr/bin/tr: Input/output error
/usr/bin/tr: Input/output error
/usr/bin/tr: Input/output error
================================================================
====================<phase 7: make package>====================
===>  Building package for libxcb-1.7
Deleting libxcb-1.7

=== Checking filesystem state
/buildscript: tr: Input/output error
/buildscript: tr: Input/output error
..."
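
A minimal sketch for tallying which utilities report EIO in a build log
like the excerpts above.  The function name tally_eio is hypothetical,
and it assumes that the error lines end in "Input/output error", as they
do in these logs:

```shell
# tally_eio [LOGFILE...]
# Print "count utility" for every utility that reported EIO, most
# frequent first.  Reads standard input when no file is given.
tally_eio() {
    grep -h 'Input/output error$' "$@" |
        sed -e 's/: Input\/output error$//' \
            -e 's/^.*: //' \
            -e 's|^.*/||' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

The sed expressions strip the error text, then any "caller: " prefix
(such as "../../libtool: "), then any leading directory, so that
"/usr/bin/tr" and "tr" are counted together.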

These failures seem to be timing-dependent, as they occur often, but
not always.  So far I have not been able to reproduce them with the
default debug.iosize_max_clamp=1, or with debug.iosize_max_clamp=0 but
without tmpfs.13.patch.  Nor did they occur with tmpfs.12.patch before
the recent vfs/nullfs changes, regardless of the
debug.iosize_max_clamp setting.
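
Because the failures do not occur on every run, one way to compare the
two settings is to repeat the same build several times under each value
and count the failures.  A rough sketch, assuming root privileges and
sysctl(8) as on FreeBSD; with_clamp, count_failures, the SYSCTL variable
and ./build-one-port.sh are hypothetical names, not part of the
tinderbox:

```shell
# Command used to read and set sysctls.  Overridable so that the
# helpers can be exercised without touching the real system (assumption).
: "${SYSCTL:=sysctl}"

# with_clamp VALUE COMMAND [ARGS...]
# Set debug.iosize_max_clamp to VALUE, run COMMAND, then restore the
# previous value.  Returns the exit status of COMMAND.
with_clamp() {
    want=$1; shift
    old=$($SYSCTL -n debug.iosize_max_clamp) || return 1
    $SYSCTL debug.iosize_max_clamp="$want" >/dev/null || return 1
    rc=0
    "$@" || rc=$?
    $SYSCTL debug.iosize_max_clamp="$old" >/dev/null
    return "$rc"
}

# count_failures RUNS COMMAND [ARGS...]
# Run COMMAND RUNS times and print how many runs exited non-zero.
count_failures() {
    runs=$1; shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}

# Example (hypothetical build script):
#   with_clamp 0 count_failures 10 ./build-one-port.sh textproc/libxml2
#   with_clamp 1 count_failures 10 ./build-one-port.sh textproc/libxml2
```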

Regards,
                 b.

