ext2fs now extremely slow

Aditya Sarawgi sarawgi.aditya at gmail.com
Wed Sep 29 04:43:21 UTC 2010


On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
> On Wed, 29 Sep 2010, Bruce Evans wrote:
> 
> > On Wed, 29 Sep 2010, Bruce Evans wrote:
> >
> >> For benchmarks on ext2fs:
> >> 
> >> Under FreeBSD-~5.2 rerun today:
> >> untar:     59.17 real
> >> tar:       19.52 real
> >> 
> >> Under -current run today:
> >> untar:    101.16 real
> >> tar:      172.03 real
> >> 
> >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> >> untar.
> >> ...
> >> So it seems that only 1 block in every 8 is used, and there is a seek
> >> after every block.  This asks for an 8-fold reduction in throughput,
> >> and it seems to have got that and a bit more for reading although not
> >> for writing.  Even (or especially) with perfect hardware, it must give
> >> an 8-fold reduction.  And it is likely to give more, since it defeats
> >> vfs clustering by making all runs of contiguous blocks have length 1.
> >> 
> >> Simple sequential allocation should be used unless the allocation policy
> >> and implementation are very good.
> >
> > Things work a bit better after zapping the 8-fold way:
> > ...
> > This gives an improvement of:
> >
> > untar:    101.16 real -> 63.46
> > tar:      172.03 real -> 50.70
> >
> > Now -current is only 1.1 times slower for untar and 2.6 times slower for
> > tar.
> >
> > There must be a problem with bpref for things to have been so bad.  There
> > is some point to leaving a gap of 7 blocks for expansion, but the gap was
> > left even between blocks in a single file.
> > ...
> > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> > completely so that there is no jump between lbn0 and lbn1, and break
> > cylinder group based allocation even better.  Setting bde_blkpref to 1
> > restores the bug that was present in ext2fs in FreeBSD between 1995 and
> > 2010.  This bug gave sequential allocation starting at the beginning of
> > the disk in almost all cases, so map searches were slow and early groups
> > filled up before later groups were used at all.
> 
> Tried this (patch repeated below), and it gave essentially the same
> speed as old versions.
> 
> The main problem seems to be that the `goal' variables aren't initialized.
> After restoring bits verbatim from an old version, things seem to work as
> expected:
> 
> % Index: ext2_alloc.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
> % retrieving revision 1.2
> % diff -u -2 -r1.2 ext2_alloc.c
> % --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
> % +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
> % @@ -1,2 +1,5 @@
> % +int bde_blkpref = 0;
> % +int bde_alloc8 = 0;
> % +
> %  /*-
> %   *  modified for Lites 1.1
> % @@ -117,4 +120,8 @@
> %                                                   ext2_alloccg);
> %          if (bno > 0) {
> % +		/* set next_alloc fields as done in block_getblk */
> % +		ip->i_next_alloc_block = lbn;
> % +		ip->i_next_alloc_goal = bno;
> % +
> %                  ip->i_blocks += btodb(fs->e2fs_bsize);
> %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
> 
> The only things that changed recently in this block were the 4 deleted
> lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
> error.
> 
> % @@ -542,6 +549,12 @@
> %  	   then set the goal to what we thought it should be
> %  	*/
> % +if (bde_blkpref == 0) {
> %  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> %  		return ip->i_next_alloc_goal;
> % +} else if (bde_blkpref == 1) {
> % +	if(ip->i_next_alloc_block == lbn)
> % +		return ip->i_next_alloc_goal;
> % +} else
> % +	return 0;
> % 
> %  	/* now check whether we were provided with an array that basically
> 
> Not needed now.
> 
> % @@ -662,4 +675,5 @@
> %  	 * block.
> %  	 */
> % +if (bde_alloc8 == 0) {
> %  	if (bpref)
> %  		start = dtogd(fs, bpref) / NBBY;
> % @@ -679,4 +693,5 @@
> %  		}
> %  	}
> % +}
> % 
> %  	bno = ext2_mapsearch(fs, bbp, bpref);
> 
> The code to skip to the next 8-block boundary should be removed permanently.
> After fixing the initialization, it doesn't generate holes inside files but
> it still generates holes between files.  The holes are quite large with
> 4K-blocks.
> 
> Benchmark results with just the initialization of `goal' variables restored:
> 
> %%%
> ext2fs-1024-1024:
> tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
> tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
> ext2fs-1024-1024-as:
> tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
> tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
> ext2fs-4096-4096:
> tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
> tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
> ext2fs-4096-4096-as:
> tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
> tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
> %%%
> 
> The much larger holes between the files are apparently responsible for the
> decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
> should be faster.
> 
> Benchmark results with the fix and bde_alloc8 = 1.
> 
> ext2fs-1024-1024:
> tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
> tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
> ext2fs-1024-1024-as:
> tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
> tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
> ext2fs-4096-4096:
> tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
> tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
> ext2fs-4096-4096-as:
> tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
> tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
> 
> Bruce

Hi,

I see what you are saying. The gap of 8 blocks between files
is due to the old preallocation, which allocated 8 additional
blocks in advance for an inode whenever it allocated a block
for it. The gap between blocks of the same file shouldn't be
there either. Both cases should be removed; I will look into
this during the week. The slowness is also due to the lack of
preallocation in the new code.

Thanks
Aditya Sarawgi


More information about the freebsd-fs mailing list