Very inconsistent (read) speed on UFS2

Jeremy Chadwick freebsd at jdc.parodius.com
Wed Aug 31 10:12:14 UTC 2011


On Wed, Aug 31, 2011 at 12:36:23PM +0400, Lev Serebryakov wrote:
> Hello, Jeremy.
> You wrote 31 August 2011, 4:42:51:
> 
> > Furthermore, why are these benchmarks not providing speed data
> > per-device (e.g. gstat or iostat -x data)?  There is a possibility that
> > one of your drives could be performing at less-than-ideal rates (yes,
> > intermittently) and therefore impacts (intermittently) your overall I/O
> > throughput.
>   Ok. I've run my benchmark while `iostat -x -d -c 999999' is running.
>   The results look like this:
> 
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     340.9 292.9 43138.8   146.5    0   1.2  42
> ada2     340.9 293.9 43138.8   147.0    0   1.9  63
> ada3     340.9 292.9 43044.7   146.5    0   1.5  57
> ada4     341.9 292.9 43232.9   146.5    0   1.3  42
> ada5     341.9 292.0 43138.8   146.0    2   1.3  40
>
> {snipping text, focusing on data}
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     165.3  87.0 10515.9    43.5    2   5.0  50
> ada2     165.3  87.0 10547.2    43.5    2   7.7  61
> ada3     167.2  87.0 10703.7    43.5    1   6.1  55
> ada4     165.3  87.0 10484.6    43.5    3   4.9  44
> ada5     160.4  87.0 10265.5    43.5    5   5.1  48
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     884.1 350.9 56583.1   175.4    0   1.0  49
> ada2     886.1 350.9 56677.2   175.4    0   1.3  58
> ada3     882.2 349.9 56489.0   175.0    2   1.7  63
> ada4     885.1 350.9 56614.5   175.4    0   1.4  64
> ada5     887.1 350.9 56739.9   175.4    0   1.5  63
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     640.6 261.5 41001.3   130.8    0   0.9  40
> ada2     639.7 261.5 40969.9   130.8    0   0.9  35
> ada3     637.7 262.5 40844.5   131.3    0   1.5  46
> ada4     640.6 260.6 41001.3   130.3    1   1.3  65
> ada5     638.7 261.5 40875.9   130.8    0   1.3  46
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     243.7 102.8 15660.2    51.4    2   1.9  36
> ada2     240.8 102.8 15503.6    51.4    3   1.9  43
> ada3     242.7 103.7 15566.2    51.9    0   1.9  30
> ada4     244.7 103.7 15785.5    51.9    2   2.4  56
> ada5     243.7 102.8 15566.2    51.4    2   1.8  30

This benchmark data is more or less unhelpful because writes are
occurring in the middle of your reads.  There's a spun-off portion of
this thread discussing how you're benchmarking these things
(specifically some code you wrote?).  I don't know what else to say in
this regard.  It would really help if you could use something like
bonnie++ and make sure the filesystem is not being touched by ANYTHING
else during your benchmarks.
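For example, something like this (a sketch, not a tuned invocation;
/array is a hypothetical mount point for your raid5 volume, and the -s
size should be at least twice your RAM so reads aren't satisfied from
cache):

  # 16GB test file (-s is in MBytes); -n 0 skips the small-file tests
  bonnie++ -d /array -s 16384 -n 0 -u root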

Anyway, the data is interesting: from an aggregate perspective, you're
hitting some arbitrary limit across all of your devices, which almost
suggests memory bus throttling or something along those lines; CPU
time?  I really don't know.  Aggregate read speeds (the kr/s column),
respectively:

43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 == 215694.0 KByte/sec
10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 ==  52516.9 KByte/sec
56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 == 283103.7 KByte/sec
41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 == 204692.9 KByte/sec
15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 ==  78081.7 KByte/sec

The totals are "all over the place", but what interests me the most is
that the aggregate never exceeds an amount slightly under
300MBytes/sec.  That number would be relevant if, say, you're using a
port multiplier (5 devices aggregated across one SATA300 port).

Despite these being WD20EARS drives (4 platters, ugh!), each of these
devices individually should be able to push 75-90MBytes/sec on writes,
and slightly more on reads.

Like you, I find it interesting that all the drives behave the same:
speeds are roughly equal on all 5 devices simultaneously, regardless
of what the overall throughput happens to be at that moment.

Here's an idea: can you stop using the filesystem for a bit and instead
do raw dd's from all of the /dev/adaX entries to /dev/null
simultaneously (pick something like bs=64k or bs=256k), then run your
iostats?  I'm basically trying to figure out whether the bad speeds
come from the devices themselves or from the geom_raid5 stuff.  You
get where I'm going with this.
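Something like this should do it (a sketch; adjust the adaX names to
match your system, and make sure the filesystem on top is unmounted or
completely idle first):

  # raw sequential reads from each disk, bypassing geom_raid5 and UFS
  for d in ada1 ada2 ada3 ada4 ada5; do
    dd if=/dev/$d of=/dev/null bs=256k &
  done
  # ...then, in another terminal:
  iostat -x -d -c 999999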

If 5 simultaneous dds reading from the drives are very fast (way faster
than the above), and you don't see sporadic drops in performance that
can't be explained by writes (hence my "stop using the filesystem"
comment), then I think we've narrowed down where the issue lies -- not
the drives.

> 1) The benchmark induces some writing.  atime modification?  No, I've
>    turned that off, and it doesn't help.  I'm afraid this read-write
>    interleaving could be the cause of the "problems", but I don't
>    understand WHY there is any writing at all (1 write per 2 reads on
>    average) when a read-only benchmark runs.  It doesn't write any
>    logs, etc.  Yes, the writing speed is very low, every write
>    transaction is about 2Kb, but WHY are they here?!  If I stop the
>    benchmark, there is less than 1 write transaction per second.

(Note: I'm going to assume by "Kb" you mean "kilobytes" and not
"kilobits"; B = byte, b = bit.  This is why I got into the habit of just
writing out the unit in full, because too many people try to shorthand
it and pick the wrong one.  And it'll be a cold day in hell before I
ever use "XXbi" (e.g. kibi, mebi, gibi, tebi))

The dd method I describe should absolutely not induce writes, hence my
recommendation.  If writes are seen during the dd's, then either the
filesystem is mounted and FreeBSD is doing something "interesting" at
the filesystem or VFS level, or your system is actually an izbushka...

Maybe softupdates are somehow responsible?  Not sure.
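One way to see who is actually issuing those writes (assuming it's a
process at all, and not the kernel flushing buffers on its own) is top
in I/O mode:

  # sort processes by total I/O while the benchmark runs
  top -m io -o total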

> 2) Without `-x', it shows that the typical read transaction size is
>    about 50Kb.  That is very strange, as geom_raid5 reports (I have
>    diagnostics in it) that almost all file access is aligned and
>    128Kb-sized...

I'm not sure -- please take what I say here with a grain of salt -- but
I believe there was a recent discussion on -stable or -fs about some
sort of 64KByte "limit" within UFS/UFS2 somewhere?  I think I'm
thinking of MAXBSIZE.  I'm having a lot of difficulty following all
these storage-related threads.  Everyone seems to show up "in bulk" on
the mailing lists all at once, and it's overwhelming at times.  I'm
getting old, in more ways than one.
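
If you want to check what block size the filesystem itself is using
(UFS2 defaults are well below 64KBytes), dumpfs will show you;
/dev/raid5/storage below is a hypothetical provider name, so substitute
your actual one:

  dumpfs /dev/raid5/storage | egrep 'bsize|fsize'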

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


