performance problem with gstripe
Ivan Voras
ivoras at freebsd.org
Wed Jan 7 18:54:14 UTC 2009
Joel Jacobson wrote:
> still works badly at 64k, but works well if i use 32k (and have
> kern.geom.stripe.fast=1). that being said, i was only seeing 64k I/O
> through ufs when i was doing the 256k stripe, so im still not sure why
> this matters.
You have probably stumbled on the group of problems collectively known as
"the MAXPHYS problem". This is what's happening: Many disk drivers in
FreeBSD were first created when the controllers and the motherboards
didn't support DMA larger than 64 kB. In addition to that, there's a hard
limit on IO request sizes set to 128 kB (the MAXPHYS kernel option), but
it is rarely reached in practice. Thus, the maximum IO size that can reach
a single drive is 64 kB, and this limit is propagated in unclear ways back
to UFS. If the stripe size is larger than or equal to 64 kB, there is no
way the IO request can be split between two drives - you get the
performance of a single drive. If the stripe size is smaller, the IO
request can be split between the drives and you get better performance.
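To make the arithmetic concrete, here's a tiny sketch of how many stripes -
and thus, at best, drives - a single 64 kB request can touch. It's not
gstripe's actual code, just the stripe math with made-up offsets:

#include <stdio.h>

/*
 * Count how many stripe-sized chunks (and thus, at best, drives) a single
 * contiguous request of length "len" starting at offset "off" touches.
 */
static unsigned
chunks_touched(unsigned long long off, unsigned long long len,
    unsigned long long stripesize)
{
    unsigned long long first = off / stripesize;
    unsigned long long last = (off + len - 1) / stripesize;

    return (unsigned)(last - first + 1);
}

int
main(void)
{
    unsigned long long req = 64 * 1024;   /* largest IO the driver accepts */

    /* A stripe-aligned 64 kB request fits inside one 64 kB stripe: one drive. */
    printf("64k stripes: %u chunk(s)\n", chunks_touched(0, req, 64 * 1024));
    /* The same request covers two 32 kB stripes: gstripe can use two drives. */
    printf("32k stripes: %u chunk(s)\n", chunks_touched(0, req, 32 * 1024));
    return (0);
}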
All this discussion maps 1:1 to the "dd" utility accessing the raw
device (/dev/something). In FreeBSD, raw device access is not buffered,
so what dd requests, the drive delivers, exactly the way it
was requested, chopped into 64 kB pieces if needed.
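In other words, "dd bs=128k" boils down to one big read(2) per block and
nothing more; the chopping happens inside the kernel. Roughly like this
(the device path is a placeholder, not anything from your setup):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    char *buf = malloc(128 * 1024);
    ssize_t n;
    int fd;

    /* Placeholder device; any raw disk or gstripe device would do. */
    if (buf == NULL || (fd = open("/dev/ada0", O_RDONLY)) == -1) {
        perror("open");
        return (1);
    }
    /* One 128 kB request, just like dd; the kernel chops it up before
       the driver sees it, so at most 64 kB reaches any single drive. */
    n = read(fd, buf, 128 * 1024);
    printf("read returned %zd bytes\n", n);
    close(fd);
    free(buf);
    return (0);
}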
The reason why UFS is better is that it asynchronously fills a queue
(bioq) with requests, which are sent to the device in the same way,
asynchronously, so even if a single write cannot span multiple stripes,
there will be many writes queued which can be done in parallel. This
works up to a point, and still breaks down under high loads, large numbers
of devices, really large stripe sizes, etc.
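A rough userland analogy (this is not UFS code, and the path and sizes are
invented) is keeping several IOs in flight at once with POSIX AIO, so the
lower layers always have a queue to dispatch to more than one drive:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NREQ    8
#define IOSIZE  (64 * 1024)

int
main(void)
{
    static char buf[NREQ][IOSIZE];
    struct aiocb cb[NREQ];
    const struct aiocb *list[NREQ];
    int fd, i;

    /* Placeholder path -- writing here would overwrite real data. */
    if ((fd = open("/dev/stripe/test", O_WRONLY)) == -1) {
        perror("open");
        return (1);
    }
    memset(cb, 0, sizeof(cb));
    for (i = 0; i < NREQ; i++) {
        cb[i].aio_fildes = fd;
        cb[i].aio_buf = buf[i];
        cb[i].aio_nbytes = IOSIZE;
        cb[i].aio_offset = (off_t)i * IOSIZE;
        list[i] = &cb[i];
        /* Queue the write and move on without waiting -- the part a
           single synchronous dd cannot do. */
        if (aio_write(&cb[i]) == -1) {
            perror("aio_write");
            return (1);
        }
    }
    /* Only now wait for all of them to finish. */
    for (i = 0; i < NREQ; i++)
        while (aio_error(&cb[i]) == EINPROGRESS)
            aio_suspend(list, NREQ, NULL);
    printf("%d writes queued and completed\n", NREQ);
    close(fd);
    return (0);
}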
The problem is annoying but not serious if you know about it. It limits
the sequential performance, but if you try a random IO benchmark that
can do parallel IO itself (try http://arctic.org/~dean/randomio/) on the
device with small-ish block sizes, you'll probably find that you still
get better performance.
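The gist of such a benchmark, in miniature (device path, thread count and
block size are arbitrary): several threads each doing small random reads,
so requests from different threads can land on different drives at once:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 4
#define BLOCK    4096
#define NREADS   1000
#define DEVSIZE  (1024ULL * 1024 * 1024)   /* pretend the device is 1 GB */

static int fd;

static void *
worker(void *arg)
{
    char buf[BLOCK];
    int i;

    (void)arg;
    for (i = 0; i < NREADS; i++) {
        /* Random, block-aligned offset somewhere on the device. */
        off_t off = (off_t)(arc4random() % (DEVSIZE / BLOCK)) * BLOCK;
        if (pread(fd, buf, BLOCK, off) == -1)
            perror("pread");
    }
    return (NULL);
}

int
main(void)
{
    pthread_t th[NTHREADS];
    int i;

    /* Placeholder device path. */
    if ((fd = open("/dev/stripe/test", O_RDONLY)) == -1) {
        perror("open");
        return (1);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    close(fd);
    return (0);
}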
> i have a somewhat hidden agenda here, too, in that i have my own
> filesystem that suffers the same problem im seeing with dd. i figured
I'm interested in file systems so I'd be happy to test it for you. :)
> there was something ufs does which i do not, and was trying to figure
> out what that might be. it works fine on 4.6.2 using ccd and a 256k
> stripe size [and i send 128k I/O requests, which is what i would prefer
> to see sent to the driver, rather than 64k].
I don't know how CCD works - maybe it can queue IO in parallel? Maybe
4.x still had cached block devices (they were thrown out at some point
in time but I don't know when - see
http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?
I think there were so many changes between 4.x and 8-CURRENT that
you'll need to find someone who has worked specifically on VFS to
explain exactly what is going on. Contact me if you need pointers.