misc/70096: Full msdos file system causes corruption

Tue Aug 17 17:40:28 PDT 2004

The following reply was made to PR misc/70096; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Amit Khivesara <amit.khivesara at utstar.com>
Cc: freebsd-gnats-submit at FreeBSD.org, freebsd-bugs at FreeBSD.org
Subject: Re: misc/70096: Full msdos file system causes corruption
Date: Wed, 18 Aug 2004 10:32:42 +1000 (EST)

 On Fri, 6 Aug 2004, Amit Khivesara wrote:

 > >Description:
 >  When the file system gets full, then it stores a incorrect
 >  value of nxtfree into the disk fsinfo.
 >
 > This will cause the assert in :
 > /*
 >      * Check and validate (or perhaps invalidate?) the fsinfo structure?
 >      */
 >     if (pmp->pm_fsinfo && pmp->pm_nxtfree > pmp->pm_maxcluster) {
 >         printf(
 >         "Next free cluster in FSInfo (%lu) exceeds maxcluster (%lu)\n",
 >             pmp->pm_nxtfree, pmp->pm_maxcluster);
 >         error = EINVAL;
 >         goto error_exit;
 >     }
 >
 >
 > to be fired.

 I think failure of this check shouldn't cause a mount failure, since
 pmp->pm_nxtfree is only advisory and quite often doesn't give a cluster
 that is actually free.  IIRC, Windows (95,XP) doesn't care what it is
 and doesn't fix it in scandisk or chkdsk after FreeBSD has messed it up.

 The check seems to have been fixed incorrectly in revs.1.92 and 1.117
 of msdosfs_vfsops.c.  Apart from the bug reported in this PR, there
 only seems to be a problem with the magic value of 0xffffffff which
 is sometimes stored in the fsinfo corresponding to pm_nxtfree.  Old
 versions didn't do anything here.  NextBSD (at least in rev.1.14 of
 msdosfs_vfsops.c) just silently converts the (wrong on 64-bit machines?)
 value of (u_long)-1 to the slightly wrong value of 0 (cluster can never
 be free).  Rev.1.92 in the FreeBSD version added the above error
 handling.  It catches the magic value as a special case of a too-large
 value.  That was wrong, and the problem was worked around in rev.1.117
 by converting 0xffffffff to the slightly less wrong value of CLUST_FIRST
 (CLUST_FIRST can be free but rarely is).

 > >Fix:
 > --- msdosfs_fat.c.orig  Wed Jun 16 11:16:37 2004
 > +++ msdosfs_fat.c       Wed Jun 16 11:15:41 2004
 > @@ -434,8 +434,9 @@
 >                         for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS)
                                        ^^^^^^^^^^^^^^^^^^^^^^^
 >                                 if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1)
 >                                         break;
 > -                       pmp->pm_nxtfree = cn
 > -                + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1;
 > +                       pmp->pm_nxtfree = (cn < pmp->pm_maxcluster)?
                                              ^^^^^^^^^^^^^^^^^^^^^^^

 Er, this condition is always satisfied, so the patch has no effect.

 > +                       (cn + ffs(pmp->pm_inusemap[cn / N_INUSEBITS]^(u_int)-1) - 1)
 > +                       :0;
 >                 }
 >                 if (bread(pmp->pm_devvp, pmp->pm_fsinfo, fsi_size(pmp),
 >                     NOCRED, &bpn) != 0) {
 >

 cn+0 and even cn+1 are within bounds, but cn+ffs(...) can apparently
 be larger.  There should be no problem, since pm_inusemap[] is supposed
 to have 1 bits corresponding to clusters after the last one (if the
 last one is before the end of the bitmap).

 I can't see any bug here.  I can see a related one that should be harmless
 due to other bugs: RELENG_4 (and thus FreeBSD-4.9) has a serious off-by-1
 error in the allocation of the bitmap.  From msdosfsmountfs() in RELENG_4:

 % 	pmp->pm_inusemap = malloc(((pmp->pm_maxcluster + N_INUSEBITS - 1)
 % 				   / N_INUSEBITS)
 % 				  * sizeof(*pmp->pm_inusemap),
 % 				  M_MSDOSFSFAT, M_WAITOK);

 pm_maxcluster actually is the maximum cluster number, so 1 must be added
 to it to give the size (in bits) of a minimal bitmap.  Not doing this
 gives a too-small allocation (off by 1 32-bit word) iff pm->pm_maxcluster
 is a multiple of 32.  Allocation sizes are rounded up to an allocation
 or page boundary, so the bug is only serious of pm_maxcluster is a
 multiple of PAGE_SIZE * NBBY or something like that.  It rarely happens.
 When it happens, the word after the end of the malloced region is
 clobbered, starting with initialization of the bitmap in fillinusemap():

 % % 	/*
 % 	 * Mark all clusters in use, we mark the free ones in the fat scan
 % 	 * loop further down.
 % 	 */
 % 	for (cn = 0; cn < (pmp->pm_maxcluster + N_INUSEBITS) / N_INUSEBITS; cn++)
 % 		pmp->pm_inusemap[cn] = (u_int)-1;

 This is missing the off-by-1 error, so it initializes the last word in the
 bitmap correctly, but this word may be beyond the end of the allocated
 region so writes to it may clobber unrelated variables or be undone by
 writes to unrelated variables.

 The allocation bug is fixed in rev.1.118 of msdsofs_vfsops.c in -current,
 but I forgot to merge the fix.

 Nearby (old) bugs:

 % 	/*
 % 	 * If we have an FSInfo block, update it.
 % 	 */
 % 	if (pmp->pm_fsinfo) {
 % 		u_long cn = pmp->pm_nxtfree;
 %
 % 		if (pmp->pm_freeclustercount
   		    ^^^^^^^^^^^^^^^^^^^^^^^^ (2)
 % 		    && (pmp->pm_inusemap[cn / N_INUSEBITS]
 % 			& (1 << (cn % N_INUSEBITS)))) {
 % 			/*
 % 			 * The cluster indicated in FSInfo isn't free
 % 			 * any longer.  Got get a new free one.
 % 			 */
 % 			for (cn = 0; cn < pmp->pm_maxcluster; cn += N_INUSEBITS)
   			             ^^^^^^^^^^^^^^^^^^^^^^^ (1)
 % 				if (pmp->pm_inusemap[cn / N_INUSEBITS] != (u_int)-1)
 % 					break;
 % 			pmp->pm_nxtfree = cn
 % 				+ ffs(pmp->pm_inusemap[cn / N_INUSEBITS]
 % 				      ^ (u_int)-1) - 1;
 % 		}

 (1) Off-by-1 error.  pm_maxcluster actually is the maximum cluster number,
     so the loop terminates 1 too early and if pm_maxcluster is a multiple
     of N_INUSEBITS.  This is actually a feature -- it compensates for
     the off-by-1 error for allocation, so garbage in the unallocated last
     word cannot be the direct cause of the probem in the PR.
 (2) Bogus checking of pmp->pm_freeclustercount.  That variable being
     zero doesn't mean that pmp->pm_nxtfree is free; it just means that
     the loop is not worth doing because it would not find a free cluster.
     If we actually cared about the exported version of pmp->pm_nxtfree
     being free, then we should set pmp->pm_nxtfree to the magic value of
     0xffffffff or the nominal value of CLUST_FIRST if
     pmp->pm_freeclustercount is 0.
 (3) General bogusness.  The exported version of pmp->pm_nxtfree is supposed
     to give a hint about a good place to start searching for a free
     cluster.  The loop always starts searching at the bad place 0.
     This is inefficent (FATs and their bitmaps can be very large, and
     by starting at cluster 0 we increase the chance of having to search
     through a large number of allocated clusters).  If we want
     inefficiency (but not so much), then we can just export the magic
     or nominal value.  That way, the the FAT only needs to be searched
     starting from near cluster 0 once when a new file system instance
     starts up (or performs an allocation) instead of on potentially many
     FAT updates.  We actually implement much larger inefficiencies by
     using random allocation for the first cluster of each file.  This
     mainly ensures that average seeks are large, but as a side effect
     it makes the hint provided by pmp->pm_nxtfree useless.  We just
     ignore the hint except for updating it when we update the FAT, but
     this results in pmp->pm_nxtfree being out of date when the FAT is
     updated, so it is very likely to be unfree when it should be
     very likely to be free, so the the slow search is very likely to
     be needed.

 Summary: there is a lot of brokenness here; a good quick fix for RELENG_4
 would be to remove all traces of pm_nxtfree and except to write a harmless
 constant value (0xffffffff?) when exporting it.

 Bruce