svn commit: r300423 - in head/sys: fs/ext2fs ufs/ffs

Tue May 24 12:20:50 UTC 2016

On Sun, 22 May 2016, Kevin Lo wrote:

> Log:
>  arc4random() returns 0 to (2**32)âˆ’1, use an alternative to initialize
>  i_gen if it's zero rather than a divide by 2.
>
>  With inputs from  delphij, mckusick, rmacklem
>
>  Reviewed by:	mckusick
> ...
> Modified: head/sys/fs/ext2fs/ext2_alloc.c
> ==============================================================================
> --- head/sys/fs/ext2fs/ext2_alloc.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/fs/ext2fs/ext2_alloc.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -408,7 +408,8 @@ ext2_valloc(struct vnode *pvp, int mode,
> 	/*
> 	 * Set up a new generation number for this inode.
> 	 */
> -	ip->i_gen = arc4random();
> +	while (ip->i_gen == 0 || ++ip->i_gen == 0)
> +		ip->i_gen = arc4random();

This is a correct implementation of ffs's intended method, but ffs's
intended method is wrong (see below for its wrongness).

Correctness depends on i_gen having type uint32_t in ext2fs.  This
makes the code +ip_i_gen undead, so i_gen is re-randomized occasionally
(averaged over all inodes, once for every ~2 billionth reuse of an
inode, which is practically never.  Bugs in ffs prevent it being done
even that often there.  So the re-randomization is almost useless.  I
think it is slighty worse than useless, since it may give the same
i_gen immediately, while always incrementing (but skipping 0) always
gives a new i_gen.

ext2fs might not need this at all.  For ffs, then special case of
i_gen == 0 must be handled because we still pretend to support file
systems created by newfs versions almost 25 years old.  newfs didn't
initialize di_gen back then, so all inodes started with di_gen == 0.
ext2fs is less that 25 years old, so it has less history to support.

>
> 	vfs_timestamp(&ts);
> 	ip->i_birthtime = ts.tv_sec;
>
> Modified: head/sys/fs/ext2fs/ext2_vfsops.c
> ==============================================================================
> --- head/sys/fs/ext2fs/ext2_vfsops.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/fs/ext2fs/ext2_vfsops.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -998,7 +998,8 @@ ext2_vget(struct mount *mp, ino_t ino, i
> 	 * already have one. This should only happen on old filesystems.
> 	 */
> 	if (ip->i_gen == 0) {
> -		ip->i_gen = random() + 1;
> +		while (ip->i_gen == 0)
> +			ip->i_gen = arc4random();

This is correct, but might be unnecessary (see above).  "on old filesystems"
was copied from ffs where it meant "on file systems created by versions
of newfs that didn't initialize di_gen".  Such file systems are restricted
to ffs1.  But i_gen stil occurs due to bugs in newfs -- it is missing this
loop, so it sometimes sets i_gen to the random value of 1, and we can't/don't
tell the difference between this and unitialized.

I think ext2fs is more like ffs2 in a relevant way here.  Both have the
feature of speeding up newfs by only writing a few inodes.  Thus most
inode allocations occur in the kernel, and it hardly matters if newfs
initialized i_gen for the few inodes that it initialized.  extfs and
ext2fs also have an inode non-clearing feature on unlink that might
be relevant.  I forget if ffs2 has this.  So the code should be simplified
by never expecting newfs to initialize i_gen to nonzero.  It already
almost does this, except in the comment.  For this, it is necessary to
skip i_gen == 0 (mod 2**32).

> 		if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0)
> 			ip->i_flag |= IN_MODIFIED;
> 	}
>
> Modified: head/sys/ufs/ffs/ffs_alloc.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_alloc.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/ufs/ffs/ffs_alloc.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -1102,8 +1102,8 @@ dup_alloc:
> 	/*
> 	 * Set up a new generation number for this inode.
> 	 */
> -	if (ip->i_gen == 0 || ++ip->i_gen == 0)
> -		ip->i_gen = arc4random() / 2 + 1;
> +	while (ip->i_gen == 0 || ++ip->i_gen == 0)
> +		ip->i_gen = arc4random();

This is broken due to an old type error.  In ffs, i_gen has type uint64_t,
so it is physically impossible for ++ip->i_gen to wrap to 0.
     (i_gen is initialized from di_gen, and that has type uint32_t, so
     i_gen is initially <= UINT32_MAX and it would take 584 years to wrap
     with the modest inode recycling period of 1 nanosecond.)
So the above is an obfuscated way of writing:

 	if (ip->i_gen == 0) {
 		/*
 		 * This value means uninitialized (or a bug).  Init now.
 		 * The loop is to not have the usual bug here.
 		 */
 		do
 			ip->i_gen = arc4random();
 		while (ip->i_gen == 0);
 	} else
 		ip->i_gen++;

Now it is clear that i_gen can grow far above UINT32_MAX.  But usually
it doesn't.  Growth above UINT32_MAX gets truncated when the vnode is
recycled.  Overflow occurs with i_gen is stored to di_gen, and growth
resumes at a small truncated value.

The type error gives truncation on most uses of i_gen: di_gen, ufid_gen
and ueh_i_gen are all 32 bits.  va_gen is 32-bits on 32-bit arches but
is 64 bits on 64-bit arches.  Various bugs result.  The bugs are mostly
features.  It is not very useful to re-randomize on reaching the 32-bit
boundary.  The bugfeature normally avoids this.  If i_gen were not
truncated to 32 bits when the vnode is recycled (or on unmount), and
if it were consistently truncated (that means, truncate it to 32 bits
in va_gen on 64-bit arches), then the bugfeature would work perfectly.
The top 32 bits in i_gen would then be unused except to record history
for a few trillion years for most inodes.

> 	DIP_SET(ip, i_gen, ip->i_gen);
> 	if (fs->fs_magic == FS_UFS2_MAGIC) {
> 		vfs_timestamp(&ts);
> @@ -2080,7 +2080,8 @@ gotit:
> 		bzero(ibp->b_data, (int)fs->fs_bsize);
> 		dp2 = (struct ufs2_dinode *)(ibp->b_data);
> 		for (i = 0; i < INOPB(fs); i++) {
> -			dp2->di_gen = arc4random() / 2 + 1;
> +			while (dp2->di_gen == 0)
> +				dp2->di_gen = arc4random();

This seems to be correct.  It is only for the ffs2 case, and di_gen was
initialized to 0 by the bzero(), but the while loop is easier to read
that the more optimal do-while loop that I wrote above.

> 			dp2++;
> 		}
> 		/*
>
> Modified: head/sys/ufs/ffs/ffs_vfsops.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_vfsops.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/ufs/ffs/ffs_vfsops.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -1768,7 +1768,8 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags
> 	 * already have one. This should only happen on old filesystems.
> 	 */
> 	if (ip->i_gen == 0) {
> -		ip->i_gen = arc4random() / 2 + 1;
> +		while (ip->i_gen == 0)
> +			ip->i_gen = arc4random();

This also seems to be correct.  Now the compiler can easily optimize the
while loop to a do-while loop, since a previous check for i_gen == 0 is
visible.

"should" in the comment is correct, except this should never happen
now since file systems that were old when it was written now shouldn't
exist.  However, this does happen now, mostly for new file systems,
due to a bug in newfs: newfs is missing this while loop, so it sometimes
initializes di_gen to 0.  Then we can't/don't tell if di_gen was
initialized to a random value, so we-re-randomize it.

> 		if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> 			ip->i_flag |= IN_MODIFIED;
> 			DIP_SET(ip, i_gen, ip->i_gen);

All versions of newfs seem to have had buggy initialization:
- they didn't initialize di_gen (except to 0) until 1997
- the first 1997 version used /dev/urandom to initialize "long ret;".
   Assignment of this to "int32_t di_gen;"  overflowed on 64-bit arches
   (alpha?).  This gave a negative value which gave further overflows
   or suprising sign extensions mainly when assigned to "u_long va_gen'";
   most other types were consistently signed.
- the next 1997 version used random().  This fixed the generation of
   negative values.  0 was still generated
- the 2003 version used arc4random().  This gave negative values again.
   0 was still generated.  This is essentially the current version.  The
   current version uses a wrapper function like the first 1997 version.
- newfs doesn't seem to have ever had the version that did *random() / 2 + 1
   like the kernel did.  It thus escaped having the overflow/sign extension
   bugs that the kernel had.  Dividing by 2 was apparently supposed to avoid
   these bugs.  It worked with random() since random() returns at most
   INT32_MAX.  But arc4random() returns at mist UINT32_MAX.  Dividing this
   by 2 gives (u_int)INT32_MAX and adding 1 gives a value that overflows
   when assigned to int32_t.  This was later fixed by changing lots of
   int32_t to uint32_t.

I'm not sure about the security aspects of randomizing i_gen.  The re-
randomization is so accidental and infrequent that security must be
unimportant.  But I think a unique generation (over all inodes on all
file systems) would be better than any randomness.  FreeBSD-1 was closer
to having that -- i_gen was initialized to the global nextgennumber++ if
it was zero; it was not incremented for inode use.  The globabl would
have to be 64 bits now.  Ensuring uniqueness is not easy since it means
that you have to check inode numbers on not-very-trusted newly mounted
file systems against all inode numbers already in use.  Perhaps it works
to always use a new set for every new mount (ignore the ones on disk).

Bruce