Are large RAID stripe sizes useful with FreeBSD?

Mon Mar 31 14:28:22 PDT 2008

Ivan Voras wrote:
> On 31/03/2008, Scott Long <scottl at samsco.org> wrote:
>> Ivan Voras wrote:
>>  > Most new hardware RAID controllers offer stripe sizes of 128K, 256K
>>  > and some also have 512K and 1M stripes. In the simplest case of RAID0 of
>>  > two drives, knowing that the data is striped across the drives and that
>>  > FreeBSD issues I/O requests of at most 64K, is it useful to set stripe
>>  > sizes to anything larger than 32K? I suppose something like TCQ would
>>  > help the situation but does anyone know how this situation is usually
>>  > handled on the RAID controllers?
>>
>> Large I/O sizes and large stripe sizes only benefit benchmarks and a
>>  narrow class of real-world applications.
> 
> Like file servers on gigabit networks serving large files? :)
> 
>>  Large stripes have the
>>  potential to actually hurt RAID-5 performance since they make it
>>  much harder for the card to do a full stripe replacement instead of a
>>  read-modify-xor-write.
> 
> This is logical.
> 
>>  I hate to be all preachy and linux-like and tell you what you need or
>>  don't need, but in all honesty, large i/o's and stripes usually
>>  don't help typical filesystem-based mail/squid/mysql/apache server
>>  apps.  I do have proof-of-concept patches to allow larger I/O's for
>>  selected controllers on 64-bit FreeBSD platforms, and I intend to clean
>>  up and commit those patches in the next few weeks (no, I'm not ready for
>>  nor looking for testers at this time, sorry).
> 
> I'm not (currently) nagging for large IO request patches :) I just
> want to understand what is happening currently if the stripe size is
> 256 kB (which is the default at least on IBM ServeRAID 8k, and I think
> recent CISS controllers have 128 kB), and the OS chops I/O into 64 kB
> blocks. I have compared Linux performance and FreeBSD performance and
> I can't draw a conclusion from that: on FreeBSD it's not as if all
> requests (e.g. four 64 kB requests) go to a single drive at a time, and
> it's not as if they always get split.

In FreeBSD, a request can get split up twice, once in GEOM and once in
the block layer above GEOM.  In both cases, the split requests are put
onto the g_down queue in series as they are created, and the g_down
thread then pops them off the queue and sends them to the driver in
series.  There is no waiting in between: the second part of the request
is sent down without waiting for the first part to complete.
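
To make the stripe-size question concrete, here is a rough Python sketch
of how a large logical request, chopped into 64K pieces, would land on
the members of a simple RAID0 set.  It is only a model under stated
assumptions: the function names are made up for illustration, the layout
is plain round-robin striping starting at drive 0, and it does not
reflect the actual GEOM code or any particular controller's firmware.

```python
KIB = 1024

def split_request(offset, length, max_io=64 * KIB):
    """Split one logical request into consecutive chunks of at most max_io bytes."""
    chunks = []
    end = offset + length
    while offset < end:
        size = min(max_io, end - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

def drives_touched(offset, length, stripe_size, n_drives):
    """Return the sorted RAID0 member indices that a byte range touches.

    Assumes simple round-robin striping: stripe unit k lives on drive k % n_drives.
    """
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({unit % n_drives for unit in range(first, last + 1)})

def map_request(offset, length, stripe_size, n_drives, max_io=64 * KIB):
    """Chop a request into max_io chunks and report which drives each chunk hits."""
    return [(off, size, drives_touched(off, size, stripe_size, n_drives))
            for off, size in split_request(offset, length, max_io)]

if __name__ == "__main__":
    # A 256K sequential request on a two-drive RAID0, with two stripe sizes.
    for stripe in (32 * KIB, 256 * KIB):
        print(f"stripe {stripe // KIB}K:")
        for off, size, drives in map_request(0, 256 * KIB, stripe, 2):
            print(f"  offset {off // KIB:>4}K size {size // KIB}K -> drives {drives}")
```

In this model, with a 32K stripe every 64K chunk spans both drives, while
with a 256K stripe an aligned 256K request keeps all four chunks on one
drive.  A chunk that straddles a stripe boundary still touches two
drives, which is one reason real measurements do not show a clean "always
split" or "never split" pattern.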

For writes, the performance penalty of smaller I/O's (assuming no RAID-5
effects) is minimal; most caching controllers and drives will batch the
concurrent requests together, so the only loss is in the slight overhead
of the extra transaction setup and completion.  For reads, the penalty
can be greater because the controller/disk will try to execute the first
request immediately and not wait for the second part to be requested,
leading to the potential for extra rotational and head movement delays.
Many caching RAID controllers offer a read-ahead feature to counteract
this.  My own testing has shown little measurable benefit from it, but
YMMV.
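
As for the RAID-5 caveat from earlier in the thread, the arithmetic can
be sketched as below.  This is an illustration only: it assumes "stripe
size" means the per-drive strip, that one strip per stripe holds parity,
and it looks at a single request in isolation.  A caching controller that
batches concurrent requests may still assemble a full stripe, so treat
this as the pessimistic case.  The function names are hypothetical.

```python
KIB = 1024

def full_stripe_bytes(strip_size, n_drives):
    """Data bytes in one full RAID-5 stripe (one strip per stripe is parity)."""
    return strip_size * (n_drives - 1)

def is_full_stripe_write(offset, length, strip_size, n_drives):
    """True if a single write covers whole stripes exactly, so parity can be
    computed from the new data alone instead of read-modify-xor-write."""
    width = full_stripe_bytes(strip_size, n_drives)
    return length > 0 and offset % width == 0 and length % width == 0

if __name__ == "__main__":
    # A single aligned 64K write on a five-drive RAID-5 with various strip sizes.
    for strip in (16 * KIB, 64 * KIB, 256 * KIB):
        width = full_stripe_bytes(strip, 5)
        full = is_full_stripe_write(0, 64 * KIB, strip, 5)
        print(f"strip {strip // KIB}K: full stripe {width // KIB}K, "
              f"64K write is full-stripe: {full}")
```

With five drives, a 16K strip gives a 64K full stripe that one aligned
64K write can replace outright; at a 256K strip the full stripe is 1M,
which no single 64K request can cover.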

Scott