gmirror 'load' algorithm (Was: Re: siis/atacam/ata/gmirror 8.0-BETA3 disk performance)
Alexander Motin
mav at FreeBSD.org
Thu Dec 3 12:53:52 UTC 2009
Maxim Sobolev wrote:
> Alexander Motin wrote:
>> I have played a bit with this patch on a 4-disk mirror. It works
>> better than the original algorithm, but it is still not perfect.
>>
>> 1. I ran into a situation with 4 read streams where 3 drives were
>> busy while the fourth one was completely idle. gmirror preferred to
>> constantly seek one of the drives over short distances rather than
>> use the idle drive, because its heads were a few gigabytes away from
>> that point.
>>
>> IMHO the request locality priority should be made almost equal for
>> any nonzero distance. As we can see with split mode, even small gaps
>> between requests can significantly reduce drive performance. So I
>> think it is not so important whether the data are 100MB or 500GB away
>> from the current head position. The perfect case is when requests are
>> completely sequential; anything beyond a few megabytes from the
>> current position just won't fit in the drive cache.
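The locality rule suggested above could be sketched roughly as follows. This is not the gmirror code from either patch; the function name and the "few megabytes" cache window are illustrative assumptions:

```c
/*
 * Sketch of the coarse locality rule proposed above: a strictly
 * sequential request scores best, a request within a small window that
 * may still hit the drive cache scores next, and all larger seek
 * distances score the same, so an idle but far-away drive is not
 * penalized. Lower score = better candidate.
 */
#include <stdint.h>
#include <stdlib.h>

#define CACHE_WINDOW	(8 * 1024 * 1024LL)	/* assumed "few megabytes" */

static int
locality_score(int64_t last_end, int64_t req_offset)
{
	int64_t dist = llabs(req_offset - last_end);

	if (dist == 0)
		return (0);	/* perfectly sequential */
	if (dist < CACHE_WINDOW)
		return (1);	/* may still hit the drive cache */
	return (2);		/* 100MB or 500GB away: treated the same */
}
```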
>>
>> 2. IMHO it would be much better to use the averaged request queue
>> depth as the load measure, instead of the last request submit time.
>> Request submit time works fine only for equal requests, equal drives
>> and serialized load, but that is exactly the case where complicated
>> load balancing is just not needed. The fact that some drive just got
>> a request does not mean anything if another one got 50 requests a
>> second ago and is still processing them.
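The queue-depth idea above amounts to something like the following minimal sketch. The structure and field names are hypothetical, not the real gmirror ones:

```c
/*
 * Minimal sketch of the load measure suggested above: track the number
 * of outstanding (submitted but not completed) requests per disk and
 * pick the least-loaded one, instead of comparing last-submit times.
 */
struct disk_state {
	int	d_inflight;	/* requests submitted but not completed */
};

static int
pick_least_loaded(const struct disk_state *disks, int ndisks)
{
	int i, best = 0;

	for (i = 1; i < ndisks; i++)
		if (disks[i].d_inflight < disks[best].d_inflight)
			best = i;
	return (best);
}
```

With this measure, the drive that got 50 requests a second ago still looks heavily loaded until those requests actually complete.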
>
> Can you try this one:
>
> http://sobomax.sippysoft.com/~sobomax/geom_mirror.diff
>
> It implements different logic: instead of looking at the time, it
> checks the outstanding request queue length and the proximity of
> recently served requests to decide where to schedule requests.
Your patch changes the "round-robin" algorithm instead of "load". I have
reimplemented it for the "load" algorithm and changed the math a bit, as
I wrote before. The patch is here:
http://people.freebsd.org/~mav/gmirror.patch
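The combination of the two ideas could look roughly like this sketch: balance primarily by outstanding queue depth, and use a coarse locality score only as a tie-breaker, so a sequential reader sticks to one drive while random load spreads out. All names and thresholds are illustrative; this is not the code in the patch above:

```c
/*
 * Hedged sketch: prefer the least busy disk; among equally busy disks
 * prefer the one whose head is (roughly) nearest, treating all long
 * seeks as equally distant.
 */
#include <stdint.h>
#include <stdlib.h>

struct gdisk {
	int	inflight;	/* outstanding requests */
	int64_t	last_end;	/* end offset of the last request sent */
};

/* Coarse locality: 0 = sequential, 1 = near (cache window), 2 = far. */
static int
gd_locality(int64_t last_end, int64_t offset)
{
	int64_t dist = llabs(offset - last_end);

	if (dist == 0)
		return (0);
	return (dist < 8 * 1024 * 1024LL ? 1 : 2);
}

static int
pick_disk(const struct gdisk *d, int n, int64_t offset)
{
	int i, best = 0;

	for (i = 1; i < n; i++) {
		if (d[i].inflight != d[best].inflight) {
			if (d[i].inflight < d[best].inflight)
				best = i;
			continue;
		}
		if (gd_locality(d[i].last_end, offset) <
		    gd_locality(d[best].last_end, offset))
			best = i;
	}
	return (best);
}
```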
Here are some benchmarks for a gmirror of 4 drives:
### load original
                 linear 1MB read    random
 1 process       MBps: 101          tps: 161
 2 processes     MBps:  78          tps: 265
 4 processes     MBps:  90          tps: 325
 8 processes     MBps: 101          tps: 384
16 processes     MBps: 118          tps: 426
32 processes     MBps: 142          tps: 457
Random performance is not bad, but linear is terrible, as requests jump
between drives and kick each other.
### round-robin
                 linear 1MB read    random
 1 process       MBps:  64          tps: 158
 2 processes     MBps: 131          tps: 260
 4 processes     MBps: 235          tps: 342
 5 processes     MBps: 240          tps: 362
 8 processes     MBps: 239          tps: 397
16 processes     MBps: 246          tps: 432
32 processes     MBps: 258          tps: 452
This is completely predictable. Random is fine; linear is not really
linear. Request balancing between the drives is perfect.
### load mav@
                 linear 1MB read    random
 1 process       MBps: 104          tps: 159
 2 processes     MBps: 214          tps: 256
 4 processes     MBps: 425          tps: 332
 5 processes     MBps: 300          tps: 352
 8 processes     MBps: 245          tps: 391
16 processes     MBps: 255          tps: 436
32 processes     MBps: 263          tps: 457
Random is close to round-robin, and request balancing is close to
perfect. Linear shows the maximum possible performance for process
counts up to the number of drives, using only as many disks as needed.
With more processes than disks, performance predictably declines, but it
still beats all the other methods.
I think it is hardly possible to get much more.
--
Alexander Motin
More information about the freebsd-current mailing list