ffs_alloc.c: minfree Q

Don Lewis truckman at FreeBSD.org
Wed Aug 10 11:12:00 GMT 2005


On 10 Aug, Dmitry Morozovsky wrote:
> Colleagues,
> 
> 
> from ffs_alloc.c:
> 
>         case FS_OPTSPACE:
>                 /*
>                  * Allocate an exact sized fragment. Although this makes
>                  * best use of space, we will waste time relocating it if
>                  * the file continues to grow. If the fragmentation is
>                  * less than half of the minimum free reserve, we choose
>                  * to begin optimizing for time.
>                  */
>                 request = nsize;
>                 if (fs->fs_minfree <= 5 ||
>    --->>>        ~~~~~~~~~~~~~~~~~~~~~~
>                     fs->fs_cstotal.cs_nffree >
>                     (off_t)fs->fs_dsize * fs->fs_minfree / (2 * 100))
>                         break;
>                 log(LOG_NOTICE, "%s: optimization changed from SPACE to TIME\n",
>                         fs->fs_fsmnt);
>                 fs->fs_optim = FS_OPTTIME;
>                 break;
> 
> In the contemporary situation, where the total size of a file system can grow
> to hundreds of gigabytes or even several terabytes, 8% of space seems too high.
> 
> Maybe this algorithm should be slightly adjusted (I'm thinking of a logarithmic
> scale depending on the file system size)?
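
To put that check in concrete terms (rough numbers of my own, nothing
magic about them): the switch to FS_OPTTIME only happens once the
fragmented free space exceeds half of the minfree reserve.  A quick
throwaway calculation, kept in the same units as fs_dsize and
cs_nffree:

    /*
     * Throwaway illustration of the threshold in the check quoted
     * above; the fs_dsize value here is made up.
     */
    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            int64_t dsize = 6400000;        /* pretend fs_dsize */
            int minfree = 8;                /* default minfree, in percent */

            /* cs_nffree must exceed this before SPACE -> TIME */
            int64_t threshold = dsize * minfree / (2 * 100);

            printf("threshold = %jd (half of the %d%% reserve)\n",
                (intmax_t)threshold, minfree);
            return (0);
    }

So with the default minfree of 8%, the optimization only flips once the
free space sitting in partial blocks exceeds 4% of the data area.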

I experimented with this back when I ran a Usenet server with a classic,
one-article-per-file spool.  I found that if I pushed the limit, I'd
often lose the ability to create files that were greater than or equal
to the file system block size, because all of the free space consisted
of partial blocks that had one or more fragments allocated.  This would
happen even though df said the file system still had plenty of free
space.

The severity of this problem depends on the file size distribution and
its relationship to the file system block and fragment sizes, and
doesn't depend on the file system size.  If you double the size of the
file system, you can double the number of files stored before you run
into the problem, and you run into the problem at about the same
percentage of fullness no matter what the size of the file system.

You can avoid this problem if you set the fragment size the same as the
block size when you create the file system, but then the wasted space is
just hidden in the partially filled blocks at the end of each file,
where it is invisible to df.  This is similar to the behaviour of other
types of file systems that only have one allocation unit size.
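
If it helps to see where that hidden waste comes from, here is a toy
comparison (my own sketch, and it glosses over the details of exactly
when FFS will actually use fragments for a file's tail):

    /*
     * Toy tail-waste comparison; it only shows where the space goes
     * when the fragment size equals the block size.
     */
    #include <stdio.h>

    static long
    tail_waste(long filesize, long bsize, long fsize)
    {
            long tail = filesize % bsize;   /* bytes in the last, partial block */

            if (tail == 0)
                    return (0);
            /* the tail is stored in whole fragments, rounded up */
            return (((tail + fsize - 1) / fsize) * fsize - tail);
    }

    int
    main(void)
    {
            long filesize = 20000;          /* arbitrary example file */

            printf("8K/1K: %ld bytes wasted, 8K/8K: %ld bytes wasted\n",
                tail_waste(filesize, 8192, 1024),
                tail_waste(filesize, 8192, 8192));
            return (0);
    }

For that 20000-byte example file, the 8K/1K file system wastes 480
bytes in the last fragment, while the 8K/8K file system wastes 4576
bytes, and df never sees either number.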

Another problem that you are likely to run into if you run file systems
very nearly full is that sequential I/O performance on larger files
tends to get very bad over time, because the blocks contained in each
file get spread all over the disk, requiring a large number of seeks to
access them all.  The number of contiguous free blocks and the
distance between free blocks is going to depend on the percentage of
fullness and not the size of the file system.  If you have two file
systems of size N that are X% full, the distribution of the free space
in each file system and the I/O performance will be very similar to one
file system of size 2N that is also X% full.
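
A quick way to convince yourself of that last point is a toy simulation
(mine, and it assumes blocks get used independently at random, which a
real allocator obviously doesn't do):

    /*
     * Toy model: mark blocks used at random with probability "fullness"
     * and measure the average length of the remaining free runs.  The
     * result tracks the fullness, not the number of blocks.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static double
    mean_free_run(size_t nblocks, double fullness)
    {
            size_t i, runs = 0, freeblks = 0;
            int inrun = 0;

            for (i = 0; i < nblocks; i++) {
                    int used = (double)rand() / RAND_MAX < fullness;

                    if (used) {
                            inrun = 0;
                    } else {
                            freeblks++;
                            if (!inrun) {
                                    runs++;
                                    inrun = 1;
                            }
                    }
            }
            return (runs != 0 ? (double)freeblks / runs : 0.0);
    }

    int
    main(void)
    {
            srand(1);
            printf("1M blocks, 90%% full: mean free run %.2f\n",
                mean_free_run(1000000, 0.90));
            printf("8M blocks, 90%% full: mean free run %.2f\n",
                mean_free_run(8000000, 0.90));
            return (0);
    }

Both sizes come out with essentially the same mean free-run length;
only the fullness moves the number.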

A special case where cranking down minfree is OK is when you have a
small, static set of large files that are created shortly after the
file system is newfs'ed, so that the blocks allocated to each file are
largely contiguous.  Re-writing these files is also fine as long as
they are re-written in place and not truncated and re-extended.



