Very inconsistent (read) speed on UFS2

Lev Serebryakov lev at serebryakov.spb.ru
Wed Aug 31 08:03:04 UTC 2011


Hello, Jeremy.
You wrote on 31 August 2011 at 4:42:51:

> What appears to have been missed here is that there are 5 drives in a
> RAID-5 fashion.  Wait, RAID-5?  FreeBSD has RAID-5 support?  How?  Oh,
> right...

> There's a port called sysutils/graid5 which is a "converted to work on
> FreeBSD 8.x" GEOM class for RAID-5.  The original was written for
> earlier FreeBSD and was called geom_raid5.  The original that Arne
> Worner introduced was written in 2006.  A port was made for it only
> recently:
  I'm the author of this port, and of some improvements, approved by
Arne Worner, which are included in it :) And it seems I'm also the
only user of this port in the whole world. But it has worked for me
for many years without any data-loss problems. It has helped me avoid
losing data when I had 3 dead HDDs over those years (not
simultaneously, of course), and let me upgrade my server from a
5x500GB to a 5x2TB configuration without stopping it (ok, with a
small stop for the "growfs" run, but all HDDs were replaced
one-by-one on the live system, thanks to SATA hotplug).
  Now I'm trying to squeeze maximum speed out of this software :)
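
  For the record, the upgrade went roughly like this (device names
and the mount point are just examples, and the exact graid5(8)
rebuild handling depends on the port's man page, so take it as an
outline rather than an exact recipe):

    # watch per-disk activity before and during each swap
    gstat

    # pull one old disk, insert the new one (SATA hotplug),
    # then make the new disk visible to CAM
    camcontrol rescan all

    # let the array rebuild onto the new disk and wait until it
    # finishes; repeat for each of the five disks, one at a time

    # once all disks are bigger, stop the filesystem briefly and grow it
    umount /data
    growfs /dev/raid5/data
    mount /data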

> What scares me is the number of "variants" on this code:
> http://en.wikipedia.org/wiki/Geom_raid5
   There are three variants: a dumb proof-of-concept, a stable and
 fast (but not ideal) one, and an experimental one. The port uses the
 second. The first is way too slow, and the third one HAS problems.

   What scares _me_ is Arne's coding style. I've spent almost a year
 understanding almost all the details of this code, mostly due to
 two-letter variable names, etc.

> Some users have asked why this code hasn't ever been committed to the
> FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):
> http://forums.freebsd.org/showthread.php?t=9040
   Code style. And I mean real problems, not nit-picking about
 "return 0;" vs "return (0);" or whitespace. I'm trying to clean it
up in a separate branch, without changing functionality, before I
implement some new ideas, which should clean up the code even more.
But it is not a very fast process, as I don't have much spare time
now, and it is work which takes A LOT of concentration.

> Here's one citing concerns over "aggressive caching", talking about
> writes and not reads, but my point still applies:
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html
  Yep, and this aggressive caching can be turned off. But it is a
GREAT help for write speed. Use a good UPS and nut -- they really
HELP. Another note: even without geom_raid5, without a UPS and nut
there is a BIG problem with large volumes and UFS2. A background fsck
of a 2TB volume takes about three hours, during which the system is
almost locked up, and it fails often. An fsck of an 8TB volume? It is
my worst nightmare. And it doesn't depend on RAID5 and its write
cache. Use a UPS. USE IT.
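
  For completeness, on a stock FreeBSD 8.x box the relevant
/etc/rc.conf knobs look something like this (illustrative values
only; the nut_* variables come from the sysutils/nut port, and the
UPS itself is configured in ups.conf and upsmon.conf):

    # the three-hour background fsck is controlled by this knob;
    # NO means a full foreground check at boot instead
    background_fsck="NO"
    # answer "yes" automatically when a foreground fsck finds problems
    fsck_y_enable="YES"

    # NUT daemons, so a UPS event leads to a clean shutdown
    # and ideally no fsck at all
    nut_enable="YES"
    nut_upsmon_enable="YES"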

> So can I ask what guarantee you have that geom_raid5 is not responsible
> for the intermittent I/O speeds you see?  I would recommend you remove
  I'm not sure here -- that is the point. I want to understand
whether it is a geom_raid5 problem, a UFS2 problem, a VM problem, or
some combination of ``glitches'' in these subsystems. I'm almost sure
it is not a problem of any one thing ``in a vacuum''; it is a problem
at the border between subsystems. And, as I don't understand well how
to "look inside" UFS2, I'm asking for help here.
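
  What I can already do is watch each layer separately while the
slowdown happens, something like this (device names are examples):

    # GEOM statistics: the raw disks, the raid5 provider and the
    # partition the filesystem lives on each get their own row,
    # so a stall can be blamed on a specific layer
    gstat

    # the same per-device picture from the device-driver side
    iostat -x -w 1

    # the VM side: paging activity and free memory while reading
    vmstat -w 1

    # UFS read-ahead (clustered read) setting, which sits right at
    # the border between the filesystem and GEOM
    sysctl vfs.read_max

But that only shows me where requests pile up, not why, which is
exactly the part where I need help with UFS2 internals.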

> geom_raid5 from the picture entirely and replace it with either
> gstripe(8) or ccd(4) SOLELY FOR TESTING.
  It is impossible in this config: it holds data which is valuable to
 me. Here is the problem: on a test server and in VMs I can run any
 tests except speed ones. I can run the test suite, switch off HDDs,
 re-create filesystems, etc., to be sure that geom_raid5 is STABLE in
 terms of data safety.
   But the only BIG system on which I can perform valid speed
 benchmarks is my home server with my data, which I cannot afford to
 lose.

   It is useless to run such benchmarks on an array of old 9GiB (yes,
 you read that right, 9 gigabytes) SCSI HDDs or in a virtual machine
 with a bunch of virtual HDDs. And I don't have a second server with
 modern, fast, big disks. Sorry.

> Furthermore, why are these benchmarks not providing speed data
> per-device (e.g. gstat or iostat -x data)?  There is a possibility that
> one of your drives could be performing at less-than-ideal rates (yes,
> intermittently) and therefore impacts (intermittently) your overall I/O
> throughput.
   I'll look at this, but I zeroed out all the HDDs before placing
 them into the array, and their speeds were identical.
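
   Concretely, I can re-check it like this, first with the array
 idle and then again under the real workload (ada0..ada4 are example
 device names):

    # raw sequential read speed of each member, bypassing the array
    for d in ada0 ada1 ada2 ada3 ada4; do
        echo "=== $d ==="
        dd if=/dev/$d of=/dev/null bs=1m count=4096
    done

    # per-device numbers while the real load runs
    iostat -x -w 1 ada0 ada1 ada2 ada3 ada4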

> been thoroughly refuted or addressed.  I guess you could say I'm very
> surprised someone is complaining about performance issues on FreeBSD
> when using a 3rd-party GEOM class that's been scrutinised in the past.
  It is not a complaint. It is a request for help with profiling a
 very old and complex subsystem :) Maybe I was not clear enough about
 that in my first message.

-- 
// Black Lion AKA Lev Serebryakov <lev at serebryakov.spb.ru>


