geom stripe performance question
Oles Hnatkevych
don_oles at able.com.ua
Mon Nov 6 12:56:28 UTC 2006
Hello, Oliver and Pawel
You wrote on 6 Nov 2006 at 14:04:15:
Oliver, I have my doubts about that.
"dd" does not read from stripes itself - it just issues system
calls, and it is the job of the underlying geom classes/drivers to
read the actual data. Why do you think "dd" has a bs operand:
bs=n Set both input and output block size to n bytes, superseding the
ibs and obs operands. If no conversion values other than
noerror, notrunc or sync are specified, then each input block is
copied to the output as a single block without any aggregation
of short blocks.
And I set bs=1m.
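If anybody doubts that, it is easy to check what dd actually asks
the kernel for; a quick sketch with ktrace(1), not taken from my
session:

root# ktrace dd if=/dev/ad1 of=/dev/null bs=1m count=4
root# kdump | grep read

Each read(2) there should carry the full 1 MB (0x100000) length; it
is then up to geom_stripe to split the request across the stripe's
components.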
What's more, striping was designed with increased performance in
mind. That's why we have the kern.geom.stripe.fast sysctl variable,
which, as I understand it, reorganizes the reads precisely to avoid
the problem you mention (right, Pawel?).
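For completeness, this is how one can check and enable it (just a
sketch; the exact semantics of fast mode should be described in
gstripe(8)):

root# sysctl kern.geom.stripe.fast      # current value, 0 (off) by default
root# sysctl kern.geom.stripe.fast=1    # enable fast mode before re-running the test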
Pawel! You were right about the dd's in parallel.
root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000 & dd if=/dev/ad2 of=/dev/null bs=1m count=1000 &
[1] 77476
[2] 77477
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 27.935007 secs (37536271 bytes/sec)
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 28.383332 secs (36943372 bytes/sec)
[1]- Done dd if=/dev/ad1 of=/dev/null bs=1m count=1000
[2]+ Done dd if=/dev/ad2 of=/dev/null bs=1m count=1000
Seems like it's an ATA controller bottleneck.
atapci0 at pci0:31:1: class=0x010180 card=0x24428086 chip=0x244b8086 rev=0x11 hdr=0x00
vendor = 'Intel Corporation'
device = '82801BA (ICH2) UltraATA/100 IDE Controller'
class = mass storage
subclass = ATA
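To verify the controller theory it is worth looking at the
negotiated transfer modes (a sketch; atacontrol's syntax differs a
bit between releases):

root# dmesg | grep '^ad[12]:'    # probe messages show the negotiated mode, e.g. UDMA100
root# atacontrol mode ad1        # on some releases this takes a channel number instead

If both disks really negotiate UDMA100 on their own channels, then
~37 MB/s each in parallel points at the controller itself rather
than at the disks.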
I'll repeat the test on another, newer box just to get to the
bottom of this.
> Oles Hnatkevych wrote:
>> I wonder why geom stripe performs much worse than the separate
>> disks that constitute the stripe.
> It depends on your workload (or your benchmark).
>> I have a stripe from two disks. Disks are on separate ATA channels.
>> [...]
>> Stripesize: 262144
>> [...]
>> Now let's read one of them and then the stripe.
>>
>> root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 14.579483 secs (71921343 bytes/sec)
>>
>> root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 15.882796 secs (66019610 bytes/sec)
>>
>> What I would expect is a doubling of the transfer speed, not a
>> slowdown. Am I wrong? Or is geom_stripe inefficient?
>> I tried the same with a gvinum stripe - the read speed was
>> degraded too. And with gmirror the degradation varied depending
>> on the slice size.
> I wonder why people always try to use dd for benchmarking.
> It's bogus. dd is not for benchmarking. It works in a
> sequential way, i.e. it first reads 256 KB (your stripe
> size) from the first component, then 256 KB from the 2nd,
> and so on. While it reads from one disk, the other one is
> idle. So it is not surprising that you don't see a speed
> increase (in fact, there's a small decrease because of
> the seek time overhead when switching from one disk to
> the other). [*]
> The performance of a stripe should be better when you use
> applications that perform parallel I/O access.
> Your benchmark should be as close to your real-world app
> as possible. If your real-world app is dd (or another one
> that accesses big files sequentially without parallelism),
> then you shouldn't use striping.
> Best regards
> Oliver
> PS: [*] It could be argued that the kernel could prefetch
> the next 256 KB from the other disk, so both disks are kept
> busy for best throughput. The problem with that is that
> the kernel doesn't know that the next 256 KB will be needed,
> so it doesn't know whether it makes sense to prefetch them
> or not. dd has no way to tell the kernel about its usage
> pattern (it would require an API similar to madvise(2)).
--
Best wishes,
Oles