svn commit: r300423 - in head/sys: fs/ext2fs ufs/ffs

Tue May 24 14:47:28 UTC 2016

On 05/24/16 07:20, Bruce Evans wrote:
> On Sun, 22 May 2016, Kevin Lo wrote:
>
>> Log:
>>  arc4random() returns 0 to (2**32)−1, use an alternative to initialize
>>  i_gen if it's zero rather than a divide by 2.
>>
>>  With inputs from  delphij, mckusick, rmacklem
>>
>>  Reviewed by:    mckusick
>> ...
>> Modified: head/sys/fs/ext2fs/ext2_alloc.c
>> ==============================================================================
>>
>> --- head/sys/fs/ext2fs/ext2_alloc.c    Sun May 22 14:13:20 2016
>> (r300422)
>> +++ head/sys/fs/ext2fs/ext2_alloc.c    Sun May 22 14:31:20 2016
>> (r300423)
>> @@ -408,7 +408,8 @@ ext2_valloc(struct vnode *pvp, int mode,
>>     /*
>>      * Set up a new generation number for this inode.
>>      */
>> -    ip->i_gen = arc4random();
>> +    while (ip->i_gen == 0 || ++ip->i_gen == 0)
>> +        ip->i_gen = arc4random();
>
> This is a correct implementation of ffs's intended method, but ffs's
> intended method is wrong (see below for its wrongness).
>
> Correctness depends on i_gen having type uint32_t in ext2fs.  This
> makes the code +ip_i_gen undead, so i_gen is re-randomized occasionally
> (averaged over all inodes, once for every ~2 billionth reuse of an
> inode, which is practically never.  Bugs in ffs prevent it being done
> even that often there.  So the re-randomization is almost useless.  I
> think it is slighty worse than useless, since it may give the same
> i_gen immediately, while always incrementing (but skipping 0) always
> gives a new i_gen.
>
> ext2fs might not need this at all.  For ffs, then special case of
> i_gen == 0 must be handled because we still pretend to support file
> systems created by newfs versions almost 25 years old.  newfs didn't
> initialize di_gen back then, so all inodes started with di_gen == 0.
> ext2fs is less that 25 years old, so it has less history to support.
>

I agree ext2fs, and likely UFS2, can handle the case of di_gen == 0.

The additional question is where do we actually need to do randomization 
in ext2fs? UFS does the randomization when formatting
so ffs_vget doesn't do randomization at all. arc4random is AFAICT,
only required for very old UFS1.

I lost interest in ext2fs and can't be bothered to look in linux ;).

>>
>>     vfs_timestamp(&ts);
>>     ip->i_birthtime = ts.tv_sec;
>>
>> Modified: head/sys/fs/ext2fs/ext2_vfsops.c
>> ==============================================================================
>>
>> --- head/sys/fs/ext2fs/ext2_vfsops.c    Sun May 22 14:13:20 2016
>> (r300422)
>> +++ head/sys/fs/ext2fs/ext2_vfsops.c    Sun May 22 14:31:20 2016
>> (r300423)
>> @@ -998,7 +998,8 @@ ext2_vget(struct mount *mp, ino_t ino, i
>>      * already have one. This should only happen on old filesystems.
>>      */
>>     if (ip->i_gen == 0) {
>> -        ip->i_gen = random() + 1;
>> +        while (ip->i_gen == 0)
>> +            ip->i_gen = arc4random();
>
> This is correct, but might be unnecessary (see above).  "on old
> filesystems"
> was copied from ffs where it meant "on file systems created by versions
> of newfs that didn't initialize di_gen".  Such file systems are restricted
> to ffs1.  But i_gen stil occurs due to bugs in newfs -- it is missing this
> loop, so it sometimes sets i_gen to the random value of 1, and we
> can't/don't
> tell the difference between this and unitialized.
>
> I think ext2fs is more like ffs2 in a relevant way here.  Both have the
> feature of speeding up newfs by only writing a few inodes.  Thus most
> inode allocations occur in the kernel, and it hardly matters if newfs
> initialized i_gen for the few inodes that it initialized.  extfs and
> ext2fs also have an inode non-clearing feature on unlink that might
> be relevant.  I forget if ffs2 has this.  So the code should be simplified
> by never expecting newfs to initialize i_gen to nonzero.  It already
> almost does this, except in the comment.  For this, it is necessary to
> skip i_gen == 0 (mod 2**32).
>
>>         if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0)
>>             ip->i_flag |= IN_MODIFIED;
>>     }
>>
>> Modified: head/sys/ufs/ffs/ffs_alloc.c
>> ==============================================================================
>>
>> --- head/sys/ufs/ffs/ffs_alloc.c    Sun May 22 14:13:20 2016    (r300422)
>> +++ head/sys/ufs/ffs/ffs_alloc.c    Sun May 22 14:31:20 2016    (r300423)
>> @@ -1102,8 +1102,8 @@ dup_alloc:
>>     /*
>>      * Set up a new generation number for this inode.
>>      */
>> -    if (ip->i_gen == 0 || ++ip->i_gen == 0)
>> -        ip->i_gen = arc4random() / 2 + 1;
>> +    while (ip->i_gen == 0 || ++ip->i_gen == 0)
>> +        ip->i_gen = arc4random();
>
> This is broken due to an old type error.  In ffs, i_gen has type uint64_t,
> so it is physically impossible for ++ip->i_gen to wrap to 0.
>     (i_gen is initialized from di_gen, and that has type uint32_t, so
>     i_gen is initially <= UINT32_MAX and it would take 584 years to wrap
>     with the modest inode recycling period of 1 nanosecond.)
> So the above is an obfuscated way of writing:
>
>     if (ip->i_gen == 0) {
>         /*
>          * This value means uninitialized (or a bug).  Init now.
>          * The loop is to not have the usual bug here.
>          */
>         do
>             ip->i_gen = arc4random();
>         while (ip->i_gen == 0);
>     } else
>         ip->i_gen++;
>
> Now it is clear that i_gen can grow far above UINT32_MAX.  But usually
> it doesn't.  Growth above UINT32_MAX gets truncated when the vnode is
> recycled.  Overflow occurs with i_gen is stored to di_gen, and growth
> resumes at a small truncated value.
>
> The type error gives truncation on most uses of i_gen: di_gen, ufid_gen
> and ueh_i_gen are all 32 bits.  va_gen is 32-bits on 32-bit arches but
> is 64 bits on 64-bit arches.  Various bugs result.  The bugs are mostly
> features.  It is not very useful to re-randomize on reaching the 32-bit
> boundary.  The bugfeature normally avoids this.  If i_gen were not
> truncated to 32 bits when the vnode is recycled (or on unmount), and
> if it were consistently truncated (that means, truncate it to 32 bits
> in va_gen on 64-bit arches), then the bugfeature would work perfectly.
> The top 32 bits in i_gen would then be unused except to record history
> for a few trillion years for most inodes.
>
>>     DIP_SET(ip, i_gen, ip->i_gen);
>>     if (fs->fs_magic == FS_UFS2_MAGIC) {
>>         vfs_timestamp(&ts);
>> @@ -2080,7 +2080,8 @@ gotit:
>>         bzero(ibp->b_data, (int)fs->fs_bsize);
>>         dp2 = (struct ufs2_dinode *)(ibp->b_data);
>>         for (i = 0; i < INOPB(fs); i++) {
>> -            dp2->di_gen = arc4random() / 2 + 1;
>> +            while (dp2->di_gen == 0)
>> +                dp2->di_gen = arc4random();
>
> This seems to be correct.  It is only for the ffs2 case, and di_gen was
> initialized to 0 by the bzero(), but the while loop is easier to read
> that the more optimal do-while loop that I wrote above.
>
>>             dp2++;
>>         }
>>         /*
>>
>> Modified: head/sys/ufs/ffs/ffs_vfsops.c
>> ==============================================================================
>>
>> --- head/sys/ufs/ffs/ffs_vfsops.c    Sun May 22 14:13:20 2016
>> (r300422)
>> +++ head/sys/ufs/ffs/ffs_vfsops.c    Sun May 22 14:31:20 2016
>> (r300423)
>> @@ -1768,7 +1768,8 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags
>>      * already have one. This should only happen on old filesystems.
>>      */
>>     if (ip->i_gen == 0) {
>> -        ip->i_gen = arc4random() / 2 + 1;
>> +        while (ip->i_gen == 0)
>> +            ip->i_gen = arc4random();
>
> This also seems to be correct.  Now the compiler can easily optimize the
> while loop to a do-while loop, since a previous check for i_gen == 0 is
> visible.
>
> "should" in the comment is correct, except this should never happen
> now since file systems that were old when it was written now shouldn't
> exist.  However, this does happen now, mostly for new file systems,
> due to a bug in newfs: newfs is missing this while loop, so it sometimes
> initializes di_gen to 0.  Then we can't/don't tell if di_gen was
> initialized to a random value, so we-re-randomize it.
>
>>         if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
>>             ip->i_flag |= IN_MODIFIED;
>>             DIP_SET(ip, i_gen, ip->i_gen);
>
> All versions of newfs seem to have had buggy initialization:
> - they didn't initialize di_gen (except to 0) until 1997
> - the first 1997 version used /dev/urandom to initialize "long ret;".
>   Assignment of this to "int32_t di_gen;"  overflowed on 64-bit arches
>   (alpha?).  This gave a negative value which gave further overflows
>   or suprising sign extensions mainly when assigned to "u_long va_gen'";
>   most other types were consistently signed.
> - the next 1997 version used random().  This fixed the generation of
>   negative values.  0 was still generated
> - the 2003 version used arc4random().  This gave negative values again.
>   0 was still generated.  This is essentially the current version.  The
>   current version uses a wrapper function like the first 1997 version.
> - newfs doesn't seem to have ever had the version that did *random() / 2
> + 1
>   like the kernel did.  It thus escaped having the overflow/sign extension
>   bugs that the kernel had.  Dividing by 2 was apparently supposed to avoid
>   these bugs.  It worked with random() since random() returns at most
>   INT32_MAX.  But arc4random() returns at mist UINT32_MAX.  Dividing this
>   by 2 gives (u_int)INT32_MAX and adding 1 gives a value that overflows
>   when assigned to int32_t.  This was later fixed by changing lots of
>   int32_t to uint32_t.
>
> I'm not sure about the security aspects of randomizing i_gen.  The re-
> randomization is so accidental and infrequent that security must be
> unimportant.  But I think a unique generation (over all inodes on all
> file systems) would be better than any randomness.  FreeBSD-1 was closer
> to having that -- i_gen was initialized to the global nextgennumber++ if
> it was zero; it was not incremented for inode use.  The globabl would
> have to be 64 bits now.  Ensuring uniqueness is not easy since it means
> that you have to check inode numbers on not-very-trusted newly mounted
> file systems against all inode numbers already in use.  Perhaps it works
> to always use a new set for every new mount (ignore the ones on disk).
>
> Bruce

Yes, I have always wondered: who is detecting if we have a repeated
generation number, the end user? And then if we get a repeated
generation number and try to fix it, how do we know that a new
random value we generate to replace the duplicate is unused?

I am rather afraid to hear the answers :-/.

Pedro.