Very inconsistent (read) speed on UFS2

Jeremy Chadwick freebsd at jdc.parodius.com
Wed Aug 31 00:42:56 UTC 2011


On Tue, Aug 30, 2011 at 11:18:15PM +0400, Lev Serebryakov wrote:
>    Now, when I "defragmented" my large FS, I see very inconsistent
>  read speeds on the same files. Is this OK?
> 
>   My setup is:
> 
>  (1) FreeBSD 8.2-STABLE/x64
>  (2) E4400 CPU, 2GiB RAM
>  (3) 5xHDDs in RAID5 (software), controller is ICH9R.
>  (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead).
>  (5) System and swap on another (6th) HDD, but swap is unused.
>  (6) No periodic or background processes access FS in question at all.
> 
>  A simple program reads each of 12 files (460MiB each) 15 times in a
>  cycle like 01, 02, ..., 12, 01, ... so the cache in memory should be
>  thrashed, as the reading process returns to the same data only after
>  reading ~5.5GiB, and there are only 2GiB of physical memory in the
>  system.
> 
>  And the speed of these reads is VERY inconsistent. I've calculated
> min/average/max and standard deviation, and the results look like this:
> 
> Name        Min/Avg/Max       StdDev
> r012f02.nef 120/235/413 MiB/s     83
> r012f09.nef 154/248/393 MiB/s     80
> r012f12.nef 106/212/293 MiB/s     63
> r012f05.nef  86/206/280 MiB/s     62
> r012f08.nef 128/223/332 MiB/s     60
> r012f11.nef 155/257/327 MiB/s     56
> r012f03.nef 121/213/279 MiB/s     52
> r012f10.nef 120/226/284 MiB/s     45
> r012f07.nef 121/199/249 MiB/s     41
> r012f01.nef 135/199/242 MiB/s     33
> 
>   These are results from 15 runs! One time a file was read at a
> sustained average speed of 120MiB/s (~3.8 seconds), and the next time
> at 413MiB/s (only ~1.1 seconds!)
> 
>   And it is not the case that the first read is always the slowest.
> No. Sometimes the last one is the slowest, for example.
> 
>   Is this OK? I'm very disappointed to see 120MiB/s when I know that
>  the hardware can give 415MiB/s, but something strange slows down the
>  process.
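For anyone wanting to reproduce the kind of reader loop described
above, a rough sketch might look like the following.  This is NOT
Lev's actual program; the file names follow the table above, but the
counts and sizes are scaled-down placeholders, and dd's summary line
on stderr stands in for the per-pass timing:

```shell
# Sketch of the quoted reader loop: cycle through the files
# repeatedly so the working set (~5.5GiB in the real test) keeps
# evicting itself from RAM (2GiB in the real test).  Tiny dummy
# files stand in for the real 460MiB ones here.
: > read-times.log
for f in r012f01.nef r012f02.nef r012f03.nef; do        # ...through r012f12.nef
    dd if=/dev/zero of="$f" bs=1024 count=64 2>/dev/null  # placeholder data
done
for pass in 1 2 3; do                                   # 15 passes in the real test
    for f in r012f01.nef r012f02.nef r012f03.nef; do    # ...through r012f12.nef
        # dd's stderr summary line records elapsed time and rate
        dd if="$f" of=/dev/null bs=64k 2>> read-times.log
    done
done
```

With the real file set, computing min/average/max and the standard
deviation per file from read-times.log reproduces the table above.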

What appears to have been missed here is that there are 5 drives in a
RAID-5 configuration.  Wait, RAID-5?  FreeBSD has RAID-5 support?  How?
Oh, right...

There's a port called sysutils/graid5, which is a GEOM class for RAID-5
"converted to work on FreeBSD 8.x".  The original, called geom_raid5,
was written for earlier FreeBSD releases and introduced by Arne Worner
in 2006.  A port was made for it only recently:

http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/graid5/Makefile

What scares me is the number of "variants" on this code:

http://en.wikipedia.org/wiki/Geom_raid5

Some users have asked why this code hasn't ever been committed to the
FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):

http://forums.freebsd.org/showthread.php?t=9040

There are admissions from Arne that "the code is absolutely horrible",
which may be why it has never been committed to FreeBSD.  There are also
all sorts of other concerns:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00437.html

Here's one citing concerns over "aggressive caching", talking about
writes and not reads, but my point still applies:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html

The thread continues for quite some time.

There's also a freebsd-current thread from 2007 asking if the code could
be committed to HEAD, with some users stating they'd like to see that
too -- and one noting that gvinum also has RAID-5 support, so basically
"which is better?"  (I imagine that question is still unanswered.)

There were also concerns over testing, reliability, throughput, etc.,
and the answers (as of 2007) were really not that great:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00351.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00361.html

So can I ask what guarantee you have that geom_raid5 is not responsible
for the intermittent I/O speeds you see?  I would recommend you remove
geom_raid5 from the picture entirely and replace it with either
gstripe(8) or ccd(4) SOLELY FOR TESTING.
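A sketch of that test with gstripe(8) might look like the following.
The device names and stripe size are my assumptions, not taken from
Lev's setup, and labeling the disks destroys the existing RAID-5
array along with its data:

```shell
# DESTRUCTIVE sketch, only for disks whose contents can be lost:
# build a plain stripe over the same five disks and re-run the read
# test on it, taking geom_raid5 out of the picture entirely.
gstripe load                                   # load the GEOM_STRIPE class
gstripe label -v -s 131072 test /dev/ada1 /dev/ada2 /dev/ada3 \
    /dev/ada4 /dev/ada5                        # device names are examples
newfs -U -b 32768 /dev/stripe/test             # 32KiB blocks, as in the original FS
mount /dev/stripe/test /mnt
```

If the read speeds on the stripe are consistent, the variance points
squarely at geom_raid5 rather than the drives or UFS2.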

Furthermore, why are these benchmarks not providing per-device speed
data (e.g. gstat or "iostat -x" output)?  There is a possibility that
one of your drives could be performing at less-than-ideal rates (yes,
intermittently) and therefore impacting (intermittently) your overall
I/O throughput.
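Capturing that data could be as simple as the following, run alongside
the benchmark (the device names and the benchmark command are
placeholders; gstat(8) gives a similar per-provider view
interactively):

```shell
# Sample per-device statistics once a second during the read test;
# a single drive with intermittently poor service times or transfer
# rates will stand out in the log.
iostat -x -w 1 ada1 ada2 ada3 ada4 ada5 > iostat.log &
IOSTAT_PID=$!
./read-benchmark            # placeholder for the actual test program
kill "$IOSTAT_PID"
```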

The other posts in this mail thread so far are much more conclusive, but
the points/concerns above are, I believe, still valid.  They have never
been thoroughly refuted or addressed.  I guess you could say I'm very
surprised someone is complaining about performance issues on FreeBSD
while using a 3rd-party GEOM class that has been scrutinised in the
past.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



More information about the freebsd-fs mailing list