svn commit: r289405 - head/sys/ufs/ffs

Bruce Evans brde at optusnet.com.au
Fri Oct 16 06:21:22 UTC 2015


On Fri, 16 Oct 2015, Warner Losh wrote:

> Log:
>  Do not relocate extents to make them contiguous if the underlying drive can do
>  deletions. Ability to do deletions is a strong indication that this
>  optimization will not help performance. It will only generate extra write
>  traffic. These devices are typically flash based and have a limited number of
>  write cycles. In addition, making the file contiguous in LBA space doesn't
>  improve the access times from flash devices because they have no seek time.
>
>  Reviewed by: mckusick@

Actually, making the file contiguous does improve the access time, probably
by relatively more for flash devices, since for fast devices the number of
i/o's per second is a bottleneck and discontiguous files need many more
i/o's to transfer the same amount of data.

E.g., suppose the block size is 16K and the transfer rate is 1GB/sec.
This requires 64K i/o's per second (iops) to keep up with the data and
many more to keep up with the metadata.  Completely discontiguous files
are limited to this rate.  But clustering of large contiguous files
increases the block size to 128K, so you only need 8K iops to keep up
with the data.
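
The arithmetic above can be checked with a small sketch.  The function
name iops_needed is made up for this illustration, and 1GB is assumed
to mean 2^30 bytes; nothing here is taken from ffs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ffs): number of i/o's per second
 * needed to sustain rate_bytes_per_sec when each i/o moves io_size
 * bytes.  Rounds up so that a partial final i/o is counted.
 */
uint64_t
iops_needed(uint64_t rate_bytes_per_sec, uint64_t io_size)
{
	assert(io_size > 0);
	return ((rate_bytes_per_sec + io_size - 1) / io_size);
}
```

With a rate of 1 << 30 bytes/sec, an io_size of 16K gives 65536 (64K)
iops and an io_size of 128K gives 8192 (8K) iops, matching the numbers
above.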

I think turning off reallocation always gives 1 discontiguous block
for medium-sized files, but not many more than that.  That still
doubles or triples the number of data i/o's for files of size about
128K (1 block is often split into 3 by a seek in the middle).
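
A rough way to see the doubling or tripling is to count contiguous runs
of physical blocks.  This is only a sketch: count_ios and the block
numbers used with it are invented for illustration, and it assumes that
physically adjacent blocks are merged into a single i/o of at most
max_run blocks, which is a simplification of what vfs clustering does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the i/o's needed to read a file whose
 * logical blocks 0..n-1 live at physical block numbers blkno[0..n-1].
 * A block joins the current i/o only if it directly follows the
 * previous block on disk and the run is still shorter than max_run.
 */
size_t
count_ios(const uint64_t *blkno, size_t n, size_t max_run)
{
	size_t i, ios, run;

	assert(max_run > 0);
	ios = 0;
	run = 0;
	for (i = 0; i < n; i++) {
		if (run > 0 && run < max_run &&
		    blkno[i] == blkno[i - 1] + 1) {
			run++;		/* extends the current i/o */
		} else {
			ios++;		/* discontiguity: start a new i/o */
			run = 1;
		}
	}
	return (ios);
}
```

For a 128K file of eight 16K blocks with max_run = 8, a fully contiguous
layout needs 1 i/o, a displaced first block needs 2, and a displaced
block in the middle needs 3.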

> Modified: head/sys/ufs/ffs/ffs_alloc.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_alloc.c	Fri Oct 16 03:03:04 2015	(r289404)
> +++ head/sys/ufs/ffs/ffs_alloc.c	Fri Oct 16 03:06:02 2015	(r289405)
> @@ -481,9 +481,19 @@ ffs_reallocblks(ap)
> 		struct cluster_save *a_buflist;
> 	} */ *ap;
> {
> +	struct ufsmount *ump;
>
> -	if (doreallocblks == 0)

The correct way to configure this is a mount option, not this sysctl
variable.  I think the variable was only intended for turning off
reallocation when it was buggy.  In 4.4BSD-Lite2, this variable wasn't
even private for ffs, and old versions of FreeBSD misused it in ext2fs.

The related sysctl variables noclusterr and noclusterw were misconfigured
similarly in 4.4BSD-Lite2, but FreeBSD fixed this by turning them into
mount options, despite them probably being less important than
doreallocblks.  I only use them to see if vfs clustering is still
useful.  Unfortunately, it still is in most cases.  It is too complicated,
and too heavyweight.  But its weight is still smaller than more i/o's
for smaller blocks, at least on non-memory disks.  I think the cleanup
was motivated mainly by non-automatic use of the flags on memory disks
in pc98.

Configuration of memory disks in main memory is also badly supported.
I think md(4) still gives double-caching for all types of backing store,
so if iops is not a problem then doreallocblks and cluster[rw] should
be turned off in all cases for md to recover a small part of the loss
from the double-caching, but md doesn't know anything about this.

Oops, actually md does try to avoid the double-caching, but it does
this for all reads (by using IO_DIRECT) for all types of backing store,
and this destroys performance for at least the case of vnode-backed
disks with the vnode on a hard disk.  IO_DIRECT certainly prevents
clustering.  Then if it works as intended to avoid double-caching, it
also gives many more i/o's than necessary if there is a block size
mismatch.  Perhaps 128 times as many for a 64K:512 mismatch (128
reads of different virtual 512-blocks are mapped to 128 reads of the
same physical 64K-block).  IO_DIRECT prevents caching of the physical
block.  The virtual blocks should be clustered into part of 1
128K-block, but don't seem to be.
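
The factor of 128 is just the ratio of the two block sizes.  As a
sketch, under the assumption that every virtual-block read goes to the
backing store uncached and unclustered (the function name is invented
for this illustration and is not anything in md):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of reads of the same physical block
 * that result from reading each of its virtual blocks separately,
 * with no caching of the physical block in between.
 */
uint64_t
uncached_reads_per_phys_block(uint64_t phys_bsize, uint64_t virt_bsize)
{
	assert(virt_bsize > 0 && phys_bsize % virt_bsize == 0);
	return (phys_bsize / virt_bsize);
}
```

For phys_bsize = 64K and virt_bsize = 512 this gives 128.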

Bruce

