i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem

Sat May 28 03:40:05 PDT 2005

The following reply was made to PR i386/68719; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Dominic Marks <dom at goodforbusiness.co.uk>
Cc: freebsd-gnats-submit at FreeBSD.org, banhalmi at field.hu,
   freebsd-fs at FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sat, 28 May 2005 20:36:55 +1000 (EST)

 On Fri, 27 May 2005, Dominic Marks wrote:

 > (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems to
 > be more related to the msdos filesystem than the USB system so perhaps it
 > should be reassigned?)

 It should be.  It is even less i386-specific than usb-specific.

 > I've been evaluating the performance of some usb2 hard discs with FreeBSD and
 > I found this PR (68719). The submitter is correct that performance with
 > msdosfs is severely limited.
 >
 > I tested a 'LaCie' USB2 disc:
 > ...
 > In test 1 I could not achieve any better than 5.1MB/s on an msdosfs
 > filesystem. Using UFS2 and softupdates a transfer rate of 22~25MB/s was
 > possible. Both test data sets were copied from the system's ATA-100 disc. In
 > both tests at these peaks gstat reports the device is 100% busy.

 I use the following to improve transfer rates for msdosfs.  The patch is
 for an old version so it might not apply directly.

 %%%
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.147
 diff -u -2 -r1.147 msdosfs_vnops.c
 --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 +++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
 @@ -608,4 +622,5 @@
   	int error = 0;
   	u_long count;
 +	int seqcount;
   	daddr_t bn, lastcn;
   	struct buf *bp;
 @@ -693,4 +714,5 @@
   		lastcn = de_clcount(pmp, osize) - 1;

 +	seqcount = ioflag >> IO_SEQSHIFT;
   	do {
   		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 @@ -718,5 +740,5 @@
   			 */
   			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 -			clrbuf(bp);
 +			vfs_bio_clrbuf(bp);
   			/*
   			 * Do the bmap now, since pcbmap needs buffers
 @@ -767,11 +789,19 @@
   		 * without delay.  Otherwise do a delayed write because we
   		 * may want to write somemore into the block later.
 +		 * XXX comment not updated with code.
   		 */
 +		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +			bp->b_flags |= B_CLUSTEROK;
   		if (ioflag & IO_SYNC)
 -			(void) bwrite(bp);
 -		else if (n + croffset == pmp->pm_bpcluster)
 +			(void)bwrite(bp);
 +		else if (vm_page_count_severe() || buf_dirty_count_severe())
   			bawrite(bp);
 -		else
 -			bdwrite(bp);
 +		else if (n + croffset == pmp->pm_bpcluster) {
 +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +				cluster_write(bp, dep->de_FileSize, seqcount);
 +			else
 +				bawrite(bp);
 +  		} else
 +  			bdwrite(bp);
   		dep->de_flag |= DE_UPDATE;
   	} while (error == 0 && uio->uio_resid > 0);
 %%%

 Notes:
 - The xxx_count_severe() stuff doesn't work quite right and was observed
    to work especially badly for msdosfs in some configurations.  IIRC,
    only configurations with a tiny block size (e.g., 512 bytes) showed
    the problem, and the problem is more likely to be with tiny block sizes
    actually exercising the "severe" case than with msdosfs or with the
    tiny block sizes themselves.  The behaviour was apparently that when
    a severe page or buf shortage develops, the above handling makes the
    problem worse by using bawrite() instead of cluster_write().  Falling
    back to bawrite() may have made the resource shortage non-fatal, but
    it made the resource shortage last much longer since bawrite() was much
    slower, even on the reasonably fast ATA drive that I was testing on.
 - Using cluster_write() in the above is not essential.  bdwrite() works
    almost as well as, or perhaps even better than, cluster_write(), provided
    write clustering is enabled by setting B_CLUSTEROK, since when this
    flag is set the delayed writes are clustered when they are done
    physically.
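
 The order of the checks in the patched write path can be modelled in
 userland roughly as follows.  This is only a sketch: the enum, struct and
 function names are made up for illustration and are not kernel API, and
 the booleans stand in for the kernel checks named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for a userland model of the patched decision. */
enum write_action {
	WRITE_SYNC,	/* stands for bwrite() */
	WRITE_ASYNC,	/* stands for bawrite() */
	WRITE_CLUSTER,	/* stands for cluster_write() */
	WRITE_DELAYED	/* stands for bdwrite() */
};

struct write_state {
	bool	io_sync;	/* ioflag & IO_SYNC */
	bool	severe;		/* vm_page_count_severe() || buf_dirty_count_severe() */
	bool	noclusterw;	/* mount has MNT_NOCLUSTERW set */
	size_t	n;		/* bytes written into this cluster in this pass */
	size_t	croffset;	/* offset in the cluster where the write began */
	size_t	bpcluster;	/* pm_bpcluster: bytes per cluster */
};

static enum write_action
choose_write(const struct write_state *ws)
{
	/* A single pass never writes past the end of the cluster. */
	assert(ws->n + ws->croffset <= ws->bpcluster);

	if (ws->io_sync)
		return (WRITE_SYNC);
	if (ws->severe)
		return (WRITE_ASYNC);
	if (ws->n + ws->croffset == ws->bpcluster)
		return (ws->noclusterw ? WRITE_ASYNC : WRITE_CLUSTER);
	return (WRITE_DELAYED);
}
```

 Note that the "severe" check comes before the full-cluster check, so under
 a page or buf shortage even a completely filled cluster goes out via the
 bawrite() path; that is the case the first note above complains about.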

 > I have not made any tests of read performance but from looking at the results
 > I do not expect that it will be significantly better than write performance.
 > I may do some when I get more time to investigate and follow up if the
 > results are unexpected.

 Try it.  I would expect read performance to be much better.  If not, don't
 bother trying the above patch.  msdosfs uses read-ahead for read(), and
 this seems to work well so I haven't even tried changing it to use read
 clustering (the above only changes it to use write clustering).  This may
 depend on the drive doing read caching and not handling small block sizes
 too badly.  I mostly use ATA drives that have these properties.  Writing
 tinygrams tends to have a relatively higher cost because write caching is
 not enabled so clustering can only be done by the OS.

 > Hopefully this will generate some interest in the problem; it is beyond my
 > time and expertise but it would be very nice to be able to access MS-DOS
 > formatted filesystems at a reasonable speed!

 Some other changes are needed for general use at a reasonable speed:
 - use VMIO for metadata.
 - don't use pessimal block allocation.  The current allocator gives
    large inter-file fragmentation by attempting to minimise intra-file
    fragmentation, and when the file system becomes just 1/N full the
    attempt backfires and gives intra-file fragmentation too (files with
    more than N clusters are very likely to be fragmented).
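
 One way to picture the 1/N effect is a toy cluster map.  This is an
 assumption made purely for illustration, not the msdosfs allocator: it
 supposes that small files have been spread out so that every Nth cluster
 is in use.  The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (not the real allocator): clusters at positions 0, spacing,
 * 2*spacing, ... are in use, so the map is about 1/spacing full.
 * Returns the longest run of free clusters, which bounds the largest
 * file that could still be laid out contiguously.
 */
static size_t
longest_free_run(size_t nclusters, size_t spacing)
{
	size_t best = 0, run = 0;

	assert(spacing > 0);
	for (size_t cn = 0; cn < nclusters; cn++) {
		if (cn % spacing == 0)		/* cluster in use */
			run = 0;
		else if (++run > best)		/* extend the free run */
			best = run;
	}
	return (best);
}
```

 Under this assumption, longest_free_run(1000, 10) is 9: the map is only
 one tenth full, yet no file of 10 or more clusters can be contiguous.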

 Bruce