kern/105964: Make MSDOSFS_LARGE a mount option

Tue Nov 28 09:41:02 PST 2006

>Number:         105964
>Category:       kern
>Synopsis:       Make MSDOSFS_LARGE a mount option
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 28 17:40:17 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Oliver Fromme
>Release:        n/a
>Organization:
secnetix GmbH & Co. KG
		http://www.secnetix.de/bsd
>Environment:
   n/a
>Description:

   This problem has been discussed on the -stable mailing list,
   and Craig Rodrigues <rodrigc at crodrigues.org> asked me to
   submit a PR for this issue because he's interested in picking
   it up.  So here we go.

   The FAT file system format doesn't support file ID numbers
   (UFS/FFS calls them "inode numbers").  Therefore MSDOSFS has
   to create such numbers somehow.  Currently there are two
   hacks for that purpose, with different drawbacks:

   -1-  (Default)  Use the directory entry offset of the file
        as the file ID number.  Assume that the whole medium is
        divided into blocks the size of a directory entry
        (32 bytes), and use that "block number" as the file
        ID.  Since file ID numbers (a.k.a. inodes) are 32 bits
        wide, that algorithm overflows above 32 * 2^32 bytes
        = 128 GB.  If you try to mount a FAT file system larger
        than 128 GB, the mount fails and the kernel prints
        "disk too big, sorry".

   -2-  (With MSDOSFS_LARGE in the kernel)  Maintain a table
        that dynamically maps 64-bit offsets (computed as
        above) to 32-bit ID numbers.  This works for FAT
        file systems of any size > 128 GB (the code falls back
        to algorithm 1 for file systems < 128 GB).
        Two drawbacks:

        -A- If a large number of files are accessed, the table
            grows very large and consumes a lot of kernel
            memory.  The machine may panic when it runs out of
            kernel memory.
        -B- Since the mapping is dynamic, file ID numbers may
            be different when the file system is unmounted and
            re-mounted.  That will break NFS exports, because
            NFS assumes that file ID numbers (which are used
            for NFS handles) are constant.

        It should be noted that those drawbacks only apply if
        the file system is > 128 GB.  For smaller file systems
        the code will automatically use the simpler algorithm
        described first.  This is controlled by the flag
        MSDOSFS_LARGEFS (different from MSDOSFS_LARGE!).
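
   To make the two schemes above concrete, here is a minimal
   user-space sketch in C.  It is not the msdosfs kernel code:
   all names (DIRENT_SIZE, MAP_MAX, fileid_from_offset,
   fileid_map_lookup) are hypothetical, and the fixed-size
   linear table merely stands in for whatever dynamic structure
   the real MSDOSFS_LARGE code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIRENT_SIZE 32u		/* bytes per FAT directory entry */
#define MAP_MAX     1024	/* hypothetical table capacity */

/* Largest size scheme 1 can address: 32 * 2^32 bytes = 128 GB. */
static uint64_t
scheme1_limit(void)
{
	return ((uint64_t)DIRENT_SIZE << 32);
}

/*
 * Scheme 1: file ID = byte offset of the directory entry / 32.
 * Returns 0 on success, -1 if the ID does not fit in 32 bits.
 */
static int
fileid_from_offset(uint64_t byte_offset, uint32_t *fileid)
{
	uint64_t slot = byte_offset / DIRENT_SIZE;

	assert(fileid != NULL);
	if (slot > UINT32_MAX)
		return (-1);
	*fileid = (uint32_t)slot;
	return (0);
}

/* Scheme 2: table mapping 64-bit slot numbers to small 32-bit IDs. */
struct fileid_map {
	uint64_t slots[MAP_MAX];	/* slot numbers seen so far */
	size_t   used;			/* number of valid entries */
};

/*
 * Return the 32-bit ID for a directory entry, assigning the next
 * free ID on first access.  Returns 0 when the table is full.
 * IDs depend on access order, so they are not stable across
 * remounts (drawback -B-), and the table grows with the number
 * of files touched (drawback -A-).
 */
static uint32_t
fileid_map_lookup(struct fileid_map *m, uint64_t byte_offset)
{
	uint64_t slot = byte_offset / DIRENT_SIZE;
	size_t i;

	assert(m != NULL);
	for (i = 0; i < m->used; i++)
		if (m->slots[i] == slot)
			return ((uint32_t)(i + 1));
	if (m->used == MAP_MAX)
		return (0);
	m->slots[m->used++] = slot;
	return ((uint32_t)m->used);
}
```

   With this sketch, a directory entry at byte offset 64 gets
   file ID 2 under scheme 1, while under scheme 2 it gets
   whatever ID is next in line when it is first looked up.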

>How-To-Repeat:

   Try to mount FAT file systems of various sizes and encounter
   the situations mentioned above.

   For testing and experimenting, you can easily use a md(4)
   device and newfs_msdos(8) to create a 160 GB FAT disk:

   # truncate -s 160000000000 testfat.img
   # mdconfig -a -t vnode -f testfat.img
   md1
   # fdisk -BI /dev/md1
   ******* Working on device /dev/md1 *******
   # newfs_msdos -s 312496317 -c 128 -h 254 -u 63 /dev/md1s1 orb
   /dev/md1s1: 312458112 sectors in 2441079 FAT32 clusters (65536 bytes/cluster)
   bps=512 spc=128 res=32 nft=2 mid=0xf0 spt=63 hds=254 hid=0 bsec=312496317 bspf=19071 rdcl=2 infs=1 bkbs=2
   # mount -t msdos -o ro /dev/md1s1 /mnt
   mount_msdosfs: /dev/md1s1: Invalid argument
   # dmesg | tail -1
   mountmsdosfs(): disk too big, sorry

   (Note:  The newfs_msdos command is not very fast.  It will
   take a few seconds.)

>Fix:

   Unfortunately, there is no real fix known for the problem.

   However, the problem is made worse by the fact that you
   have to recompile your kernel and reboot in order to
   enable the second hack (kernel option MSDOSFS_LARGE).

   That aspect of the problem could be fixed by making
   it a mount option instead of a kernel compile option,
   essentially converting the #ifdef's to regular if's.
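
   As a rough illustration of that conversion, the following C
   sketch replaces a compile-time #ifdef with a run-time test
   of a per-mount flag.  The flag name EX_MNT_LARGE, the struct
   ex_mount, and the function ex_check_size are all made up for
   this example and do not correspond to existing msdosfs
   identifiers; treating a size of exactly 128 GB as still
   fitting the simple scheme is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_DIRENT_SIZE 32u	/* bytes per FAT directory entry */
#define EX_MNT_LARGE   0x1u	/* hypothetical "large" mount option */

/* Hypothetical per-mount state, reduced to what the check needs. */
struct ex_mount {
	uint32_t flags;		/* mount options requested by the user */
	uint64_t media_bytes;	/* size of the file system in bytes */
};

/*
 * Decide at mount time which file ID scheme to use.
 * Returns 0 if the mount may proceed, -1 for "disk too big".
 * *use_table is set to 1 when the dynamic mapping table is needed.
 *
 * Before: the large-disk branch was wrapped in
 *     #ifdef MSDOSFS_LARGE ... #endif
 * After:  the same branch is guarded by a regular if on a flag.
 */
static int
ex_check_size(const struct ex_mount *mp, int *use_table)
{
	uint64_t limit = (uint64_t)EX_DIRENT_SIZE << 32;	/* 128 GB */

	assert(mp != NULL && use_table != NULL);
	*use_table = 0;
	if (mp->media_bytes <= limit)
		return (0);		/* simple scheme is sufficient */
	if (mp->flags & EX_MNT_LARGE) {
		*use_table = 1;		/* user explicitly opted in */
		return (0);
	}
	return (-1);			/* "disk too big, sorry" */
}
```

   The point of the sketch is that the mapping-table code is
   always compiled in, but only used when the user asks for it
   on a file system that actually needs it.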

   Enabling the MSDOSFS_LARGE code by default has even been
   considered.  However, because of the drawbacks (i.e. the
   possibility of a panic caused by kernel memory exhaustion,
   and the inability to NFS-export the file system), it should
   only be used if specifically requested by the user.

>Release-Note:
>Audit-Trail:
>Unformatted: