kern/122047: [ext2fs] incorrect handling of UF_IMMUTABLE / UF_APPEND, flag on EXT2FS (maybe others)

Bruce Evans brde at optusnet.com.au
Mon Jun 2 14:56:04 UTC 2008


On Mon, 2 Jun 2008, Ighighi wrote:

> On Linux, only the root user may set/clear the immutable/append flags
> on ext2 filesystems... Shouldn't FreeBSD do this too, as a POLA?

I think it should do just that...

> Anyway the attached patch extends the previous one by making it possible
> to follow the current Linux convention by setting the sysctl to 0.
> Setting it to 1, allows normal users to set them as well, and setting it
> to -1 preserves current (though erroneous) FreeBSD behavior.

... but nothing more.  Why have complications provide more control over
Linux file systems than Linux does?  The current behaviour seems to be
just a bug from not understanding that Linux has no user immutable/append
flags.

% --- src/sys/gnu/fs/ext2fs/ext2_inode_cnv.c.orig	2005-06-14 22:36:10.000000000 -0400
% +++ src/sys/gnu/fs/ext2fs/ext2_inode_cnv.c	2008-06-02 00:35:34.658524358 -0430
% @@ -30,11 +30,19 @@
%  #include <sys/lock.h>
%  #include <sys/stat.h>
%  #include <sys/vnode.h>
% +#include <sys/kernel.h>
% +#include <sys/sysctl.h>
% 
%  #include <gnu/fs/ext2fs/inode.h>
%  #include <gnu/fs/ext2fs/ext2_fs.h>
%  #include <gnu/fs/ext2fs/ext2_extern.h>
% 
% +SYSCTL_DECL(_vfs_e2fs);
% +
% +static int userflags = -1;
% +SYSCTL_INT(_vfs_e2fs, OID_AUTO, userflags, CTLFLAG_RW,
% +    &userflags, 0, "Users may set/clear filesystem flags");
% +
%  void
%  ext2_print_inode( in )
%  	struct inode *in;

The existence of vfs sysctls is another bug.  They should be mount options,
or perhaps system-wide, or not exist at all.  ext2fs has only one vfs sysctl
now:
- vfs.e2fs.dircheck.  This sysctl is less broken than in ffs, where the
   corresponding sysctl is spelled debug.dircheck and a comment still
   says that the corresponding static kernel variable `dirchk' is changed
   by "patching".  The kernel variable is spelled differently to the
   sysctl to confuse me when grepping for dircheck.
- the debug.doasyncfree sysctl is in dead code (under the non-option
   FANCY_REALLOC.  Block reallocation is also dead in ext2fs).  This
   sysctl is less broken in ffs.  There it is named vfs.ffs.doasyncfree.
OTOH, perhaps these sysctls really did belong under debug or vfs.debug.
It is not very useful to restrict them to just ffs or ext2fs when 
have many mounted file systems.

This bug should not be extended.

% @@ -83,8 +91,37 @@ ext2_ei2i(ei, ip)
%  	ip->i_mtime = ei->i_mtime;
%  	ip->i_ctime = ei->i_ctime;
%  	ip->i_flags = 0;
% -	ip->i_flags |= (ei->i_flags & EXT2_APPEND_FL) ? APPEND : 0;
% -	ip->i_flags |= (ei->i_flags & EXT2_IMMUTABLE_FL) ? IMMUTABLE : 0;

I think it would work to just map EXT2*_FL to SF_*.

% +	switch (userflags) {
% +	case 0:
% +		/*
% +		 * Only the superuser may set/clear these flags.
% +		 * This is the current behavior on Linux.
% +		 */
% +		if (ei->i_flags & EXT2_APPEND_FL)
% +			ip->i_flags |= SF_APPEND;
% +		if (ei->i_flags & EXT2_IMMUTABLE_FL)
% +			ip->i_flags |= SF_IMMUTABLE;
% +		break;
% +	case 1:
% +		/*
% +		 * Users may set/clear these flags on files they own.
% +		 */
% +		if (ei->i_flags & EXT2_APPEND_FL)
% +			ip->i_flags |= UF_APPEND;
% +		if (ei->i_flags & EXT2_IMMUTABLE_FL)
% +			ip->i_flags |= UF_IMMUTABLE;
% +		break;

For administration it can be convenient for the file system to behave
a little differently to native mode, but I letting root change the
(system) immutable flags is enough.

% +	case -1:
% +	default:
% +		/*
% +		 * Default behavior on FreeBSD
% +		 */
% +		if (ei->i_flags & EXT2_APPEND_FL)
% +			ip->i_flags |= APPEND;
% +		if (ei->i_flags & EXT2_IMMUTABLE_FL)
% +			ip->i_flags |= IMMUTABLE;
% +		break;
% +	}

I think the current behaviour is too buggy to keep.  (Your original PR
describes the bugs -- FreeBSD makes a mess by setting 2 flags in the
in-core inode and allowing these flags to be changed independently, then
cannot merge the flags properly when writing to the on-disk inode.)

[conversion to the on-disk inode]

Similarly.

There is a problem in ext2_vnops.c that is not touched by these patches.
Even in the simple version that only support SF_*, ext2_setattr() needs
changes to disallow setting of UF_* -- otherwise FreeBSD still makes a
mess, with stat() showing changes to UF_* succeeding while the inode is
in-core but going away when the inode is flushed from the inode/vnode
cache.  The fix is simply to remove the code that supports UF_*.  (We
could also globably replace IMMUTABLE and APPEND by SF_IMMUTABLE and
SF_APPEND.  This would be clearer but would increase divergence from
ffs.)  The fix to support all 3 sysctl modes is not so simple:
- case 0: dynamically disallow attempts to set UF_*.
- case 1: dynamically disallow attempts to set SF_*.
- case -1: dynamically convert attempts to set SF_* or UF*_ into attempts
   to set both, and somehow relax the checks for setting SF_* so that this
   has a chance of succeeding for non-root.
I don't want these complications.

Note that the corresponding code in ffs is poorly organized and buggy
(it doesn't allow preservation of flags in the right way).  ext2_setattr()
was once almost identical to ufs_settatr() but now has the following
bitrot:
- missing support for setting atimes on exec.
- different comments about privilege though the code is the same.
- different comments about truncate() on r/o file systems.  Missing a
   critical fix for truncate().
- missing the comment expansion and cleanup for utimes().  I think there was
   a minor security-related fix in there too, but this is now null.

% --- src/sys/gnu/fs/ext2fs/ext2_vnops.c.orig	2006-02-19 20:53:14.000000000 -0400
% +++ src/sys/gnu/fs/ext2fs/ext2_vnops.c	2008-05-28 07:58:02.189157441 -0430
% @@ -358,6 +358,8 @@ ext2_getattr(ap)
%  	vap->va_mtime.tv_nsec = ip->i_mtimensec;
%  	vap->va_ctime.tv_sec = ip->i_ctime;
%  	vap->va_ctime.tv_nsec = ip->i_ctimensec;
% +	vap->va_birthtime.tv_sec = 0;
% +	vap->va_birthtime.tv_nsec = 0;
%  	vap->va_flags = ip->i_flags;
%  	vap->va_gen = ip->i_gen;
%  	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

This is unrelated and should be handled centrally.  Almost all file
systems get this wrong.  Most fail to set va_birthtime, so stat()
returns kernel stack garbage for st_birthtime.  ffs1 does the same
as the above.  msdosfs does the above correctly, by setting tv_sec to
(time_t)-1 in unsupported cases.

Bruce


More information about the freebsd-fs mailing list