ext2fs now extremely slow
John Baldwin
jhb at freebsd.org
Wed Sep 29 13:26:06 UTC 2010
On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
> > On Wed, 29 Sep 2010, Bruce Evans wrote:
> >
> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
> > >
> > >> For benchmarks on ext2fs:
> > >>
> > >> Under FreeBSD-~5.2 rerun today:
> > >> untar: 59.17 real
> > >> tar: 19.52 real
> > >>
> > >> Under -current run today:
> > >> untar: 101.16 real
> > >> tar: 172.03 real
> > >>
> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> > >> untar.
> > >> ...
> > >> So it seems that only 1 block in every 8 is used, and there is a seek
> > >> after every block. This asks for an 8-fold reduction in throughput,
> > >> and it seems to have got that and a bit more for reading although not
> > >> for writing. Even (or especially) with perfect hardware, it must give
> > >> an 8-fold reduction. And it is likely to give more, since it defeats
> > >> vfs clustering by making all runs of contiguous blocks have length 1.
> > >>
> > >> Simple sequential allocation should be used unless the allocation policy
> > >> and implementation are very good.
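A toy model of the run-length argument, not ext2fs code: count_runs() and layout_with_stride() are made-up names and the block numbers are invented. It only shows that a fixed stride of 8 turns one long run into one run per block, which leaves clustering nothing to merge.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count runs of physically contiguous blocks in a file's block list.
 * blocks[i] is the (hypothetical) disk block holding logical block i.
 */
size_t
count_runs(const unsigned long *blocks, size_t n)
{
	size_t i, runs;

	if (n == 0)
		return (0);
	assert(blocks != NULL);
	runs = 1;
	for (i = 1; i < n; i++) {
		/* A new run starts whenever the next block is not adjacent. */
		if (blocks[i] != blocks[i - 1] + 1)
			runs++;
	}
	return (runs);
}

/* Fill blocks[] the way an allocator with a fixed stride would lay them out. */
void
layout_with_stride(unsigned long *blocks, size_t n, unsigned long first,
    unsigned long stride)
{
	size_t i;

	assert(blocks != NULL && stride > 0);
	for (i = 0; i < n; i++)
		blocks[i] = first + i * stride;
}
```

For a 16-block file, a stride of 8 gives 16 runs and a stride of 1 gives a single run.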
> > >
> > > Things work a bit better after zapping the 8-fold way:
> > > ...
> > > This gives an improvement of:
> > >
> > > untar: 101.16 real -> 63.46
> > > tar: 172.03 real -> 50.70
> > >
> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
> > > tar.
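The ratios quoted above are just the -current time divided by the ~5.2 time. A small helper to redo that arithmetic on the figures in this thread (slowdown() is a made-up name, not something from the benchmark scripts):

```c
#include <assert.h>

/* Ratio of a new elapsed ("real") time to a baseline elapsed time. */
double
slowdown(double new_real, double base_real)
{
	assert(base_real > 0.0);
	return (new_real / base_real);
}
```

slowdown(172.03, 19.52) comes out at about 8.8 and slowdown(50.70, 19.52) at about 2.6, matching the tar figures before and after zapping the 8-fold way.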
> > >
> > > There must be a problem with bpref for things to have been so bad. There
> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
> > > left even between blocks in a single file.
> > > ...
> > > I haven't tried the bde_blkpref hack in the above. It should kill bpref
> > > completely so that there is no jump between lbn0 and lbn1, and break
> > > cylinder group based allocation even better. Setting bde_blkpref to 1
> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
> > > 2010. This bug gave sequential allocation starting at the beginning of
> > > the disk in almost all cases, so map searches were slow and early groups
> > > filled up before later groups were used at all.
> >
> > Tried this (patch repeated below), and it gave essentially the same
> > speed as old versions.
> >
> > The main problem seems to be that the `goal' variables aren't initialized.
> > After restoring bits verbatim from an old version, things seem to work as
> > expected:
> >
> > % Index: ext2_alloc.c
> > % ===================================================================
> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
> > % retrieving revision 1.2
> > % diff -u -2 -r1.2 ext2_alloc.c
> > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2
> > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000
> > % @@ -1,2 +1,5 @@
> > % +int bde_blkpref = 0;
> > % +int bde_alloc8 = 0;
> > % +
> > % /*-
> > % * modified for Lites 1.1
> > % @@ -117,4 +120,8 @@
> > % ext2_alloccg);
> > % if (bno > 0) {
> > % + /* set next_alloc fields as done in block_getblk */
> > % + ip->i_next_alloc_block = lbn;
> > % + ip->i_next_alloc_goal = bno;
> > % +
> > % ip->i_blocks += btodb(fs->e2fs_bsize);
> > % ip->i_flag |= IN_CHANGE | IN_UPDATE;
> >
> > The only things that changed recently in this block were the 4 deleted
> > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing
> > error.
> >
> > % @@ -542,6 +549,12 @@
> > % then set the goal to what we thought it should be
> > % */
> > % +if (bde_blkpref == 0) {
> > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> > % return ip->i_next_alloc_goal;
> > % +} else if (bde_blkpref == 1) {
> > % + if(ip->i_next_alloc_block == lbn)
> > % + return ip->i_next_alloc_goal;
> > % +} else
> > % + return 0;
> > %
> > % /* now check whether we were provided with an array that basically
> >
> > Not needed now.
> >
> > % @@ -662,4 +675,5 @@
> > % * block.
> > % */
> > % +if (bde_alloc8 == 0) {
> > % if (bpref)
> > % start = dtogd(fs, bpref) / NBBY;
> > % @@ -679,4 +693,5 @@
> > % }
> > % }
> > % +}
> > %
> > % bno = ext2_mapsearch(fs, bbp, bpref);
> >
> > The code to skip to the next 8-block boundary should be removed permanently.
> > After fixing the initialization, it doesn't generate holes inside files but
> > it still generates holes between files. The holes are quite large with
> > 4K-blocks.
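A simplified illustration of why a byte-granular bitmap search leaves holes; this is not the ext2_alloccg() code. TOY_NBBY and the toy_find_* functions are made-up names, and the bitmap is assumed to use one bit per block with bit 0 of each byte as the lowest block.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBBY 8	/* bits per byte, like NBBY */

/*
 * Byte-granular search: return the first block of the first wholly free
 * (all-zero) bitmap byte at or after the byte containing bpref, or -1.
 */
long
toy_find_free_byte(const unsigned char *bbp, size_t nbytes, long bpref)
{
	size_t loc;

	assert(bbp != NULL && bpref >= 0);
	for (loc = (size_t)bpref / TOY_NBBY; loc < nbytes; loc++) {
		if (bbp[loc] == 0)
			return ((long)(loc * TOY_NBBY));
	}
	return (-1);
}

/* Bit-granular search: return the first free block at or after bpref. */
long
toy_find_free_bit(const unsigned char *bbp, size_t nbytes, long bpref)
{
	long bno, nblocks = (long)(nbytes * TOY_NBBY);

	assert(bbp != NULL && bpref >= 0);
	for (bno = bpref; bno < nblocks; bno++) {
		if ((bbp[bno / TOY_NBBY] & (1u << (bno % TOY_NBBY))) == 0)
			return (bno);
	}
	return (-1);
}
```

With only block 0 in use and bpref = 1, the byte-granular search returns block 8 and leaves blocks 1-7 as a hole, while the bit-granular search returns block 1.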
> >
> > Benchmark results with just the initialization of `goal' variables restored:
> >
> > %%%
> > ext2fs-1024-1024:
> > tarcp /f srcs: 78.79 real 0.31 user 4.94 sys
> > tar cf /dev/zero srcs: 24.62 real 0.19 user 1.82 sys
> > ext2fs-1024-1024-as:
> > tarcp /f srcs: 52.07 real 0.26 user 4.95 sys
> > tar cf /dev/zero srcs: 24.80 real 0.10 user 1.93 sys
> > ext2fs-4096-4096:
> > tarcp /f srcs: 74.14 real 0.34 user 3.96 sys
> > tar cf /dev/zero srcs: 33.82 real 0.10 user 1.19 sys
> > ext2fs-4096-4096-as:
> > tarcp /f srcs: 53.54 real 0.36 user 3.87 sys
> > tar cf /dev/zero srcs: 33.91 real 0.14 user 1.15 sys
> > %%%
> >
> > The much larger holes between the files are apparently responsible for the
> > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks
> > should be faster.
> >
> > Benchmark results with the fix and bde_alloc8 = 1:
> >
> > ext2fs-1024-1024:
> > tarcp /f srcs: 71.60 real 0.15 user 2.04 sys
> > tar cf /dev/zero srcs: 22.34 real 0.05 user 0.79 sys
> > ext2fs-1024-1024-as:
> > tarcp /f srcs: 46.03 real 0.14 user 2.02 sys
> > tar cf /dev/zero srcs: 21.97 real 0.05 user 0.80 sys
> > ext2fs-4096-4096:
> > tarcp /f srcs: 59.66 real 0.13 user 1.63 sys
> > tar cf /dev/zero srcs: 19.88 real 0.07 user 0.46 sys
> > ext2fs-4096-4096-as:
> > tarcp /f srcs: 37.30 real 0.12 user 1.60 sys
> > tar cf /dev/zero srcs: 19.93 real 0.05 user 0.49 sys
> >
> > Bruce
>
> Hi,
>
> I see what you are saying. The gap of 8 blocks between the files
> is due to the old preallocation, which used to allocate an additional
> 8 blocks in advance for a particular inode when allocating a block
> for it. The gap between blocks of the same file shouldn't be there
> either. Both of these cases should be removed. I will look into this
> during this week. The slowness is also due to the lack of preallocation
> in the new code.
One of the GSoC students worked on a patch to add preallocation back to
ext2fs this summer. Would you be interested in reviewing and/or testing
that patch? (I've attached it). Here is his original e-mail:
<quote>
Hi all,
There is a patch in the attachment which implements a preallocation
algorithm in ext2fs. I implemented this algorithm during FreeBSD SoC 2010.

This patch implements the in-memory ext2/3 block preallocation algorithm
based on reservation windows. It uses an RB-tree to index block allocation
requests and reserves a number of blocks for each file which has requested
a block. When a file requests a block, the allocator looks for a block
that is in the same cylinder group as the inode, is not inside another
reservation window in the RB-tree, and is followed by some contiguous
free blocks. It stores this block's position and the length of the run of
contiguous free blocks in a data structure and inserts that structure
into the RB-tree. When the file requests another block, the allocator
looks up the corresponding data structure in the RB-tree. If it finds
one, the next free block is allocated to the file directly. Otherwise, it
searches for a new block again.

I have run some benchmarks to test this algorithm. Please review them on
the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu). The performance
is better when the number of threads is smaller than 4. When the number
of threads is greater than 4, the performance improves a little.
Please test it.
Thanks and best regards,
lz
</quote>
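For anyone who wants the idea before reading the patch, here is a minimal sketch of a reservation window as described in the quoted mail. It is not taken from the patch: all toy_* names and sizes are made up, it models a single group, and it uses a small linear table where the patch uses an RB-tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 64	/* blocks in the toy group (assumed) */
#define TOY_MAXWIN  8	/* most files tracked at once (assumed) */
#define TOY_WINLEN  8	/* blocks covered by one window (assumed) */

struct toy_win {
	int  owner;	/* file id, or -1 when the slot is unused */
	long next;	/* next reserved block to hand out */
	long end;	/* one past the last reserved block */
};

struct toy_group {
	bool used[TOY_NBLOCKS];		/* allocated blocks */
	struct toy_win win[TOY_MAXWIN];	/* linear table, not an RB-tree */
};

void
toy_group_init(struct toy_group *g)
{
	size_t i;

	for (i = 0; i < TOY_NBLOCKS; i++)
		g->used[i] = false;
	for (i = 0; i < TOY_MAXWIN; i++)
		g->win[i].owner = -1;
}

/* True if block b is allocated or lies inside another file's window. */
static bool
toy_busy(const struct toy_group *g, long b, int owner)
{
	size_t i;

	if (g->used[b])
		return (true);
	for (i = 0; i < TOY_MAXWIN; i++) {
		const struct toy_win *w = &g->win[i];

		if (w->owner != -1 && w->owner != owner &&
		    b >= w->next && b < w->end)
			return (true);
	}
	return (false);
}

/* Allocate one block for file `owner`; returns -1 if nothing is free. */
long
toy_alloc(struct toy_group *g, int owner)
{
	struct toy_win *w = NULL, *spare = NULL;
	size_t i;
	long b, e;

	assert(g != NULL && owner >= 0);
	for (i = 0; i < TOY_MAXWIN; i++) {
		if (g->win[i].owner == owner)
			w = &g->win[i];
		else if (g->win[i].owner == -1 && spare == NULL)
			spare = &g->win[i];
	}
	/* Fast path: the file's window still has a reserved block left. */
	if (w != NULL && w->next < w->end) {
		b = w->next++;
		g->used[b] = true;
		return (b);
	}
	if (w == NULL)
		w = spare;	/* may stay NULL if the table is full */
	/* Slow path: find a free block and reserve the free run after it. */
	for (b = 0; b < TOY_NBLOCKS; b++) {
		if (toy_busy(g, b, owner))
			continue;
		for (e = b + 1; e < TOY_NBLOCKS && e < b + TOY_WINLEN &&
		    !toy_busy(g, e, owner); e++)
			;
		g->used[b] = true;
		if (w != NULL) {
			w->owner = owner;
			w->next = b + 1;
			w->end = e;
		}
		return (b);
	}
	return (-1);
}
```

With two files allocating alternately in an empty group, file 1 gets blocks 0, 1, 2 and file 2 gets blocks 8, 9, so each file stays contiguous instead of interleaving with the other.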
--
John Baldwin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ext2fs_prealloc.patch
Type: text/x-patch
Size: 28465 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20100929/ded450f4/ext2fs_prealloc.bin