N-way mirror read speedup in zfsonlinux

Matthew Ahrens mahrens at delphix.com
Thu Aug 15 20:36:13 UTC 2013


The platform-independent code in the patch looks good to me, with one
exception:

  mm->mm_children = c--;

It's non-obvious why c should be decremented *here* rather than as part
of the loop logic.  If there's some trick going on with the loop, it's
lost on me.  Does it matter that we go through the children in reverse
order?  Is there something wrong with the existing loop logic:

  for (c = 0; c < mm->mm_children; c++) {
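
If reverse order is the point, here is the shape I would have expected
instead (an illustrative sketch only, reusing the patch's names):

  /*
   * Walk the children from last to first; c ends the loop at -1, so
   * any later assignment such as "mm->mm_children = c--" deserves a
   * comment explaining which value it is meant to capture.
   */
  for (c = mm->mm_children - 1; c >= 0; c--) {
          /* ... examine child c ... */
  }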

--matt


On Thu, Aug 15, 2013 at 1:00 PM, Steven Hartland <killing at multiplay.co.uk>wrote:

> Replies to everyone's individual feedback below:-
>
>
> ----- Original Message ----- From: "Alexander Motin" <mav at FreeBSD.org>
>
>
>> On 14.08.2013 16:50, Steven Hartland wrote:
>>
>>> So based on mav's comments I've created a new version of the
>>> load-balanced patch, which takes into account the last I/O's
>>> locality on the disk.
>>>
>>
>> I like it. Thank you!
>>
>> Just a few comments:
>> 1) I guess your change to scsi_da.c will send a "GEOM::rotating" attribute
>> change every time the disk is probed (including each open) even when nothing
>> has really changed. It is not fatal, but entirely unneeded overhead.
>>
>
> No problem, change notifications are now guarded by a current value check.
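>
> Roughly along these lines (a sketch, not the exact diff; the cached
> field name here is made up):
>
>   /* Announce GEOM::rotating only when the value actually changes. */
>   if (softc->rotating != rotating) {   /* hypothetical cached field */
>           softc->rotating = rotating;
>           disk_attr_changed(softc->disk, "GEOM::rotating", M_NOWAIT);
>   }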
>
>
>> 2) You are always giving the full locality bonus to SSDs. It means the
>> HDDs quite likely won't be used even for sequential reads until the SSDs
>> are really busy, while that is the only point where the HDDs, if not
>> competing, could at least still be useful. Surely it depends very much on
>> the specific devices' characteristics, but I would try giving the SSD half
>> (or maybe 3/4), so that a precisely positioned HDD could still do what it
>> can.
>>
>
> I tested the 2 x HDD and 1 x SSD case, and it results in very good
> balancing, with all disks at ~95% load with 3 threads, e.g.
>
> dT: 1.001s  w: 1.000s  filter: ada[123]p1
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    %busy Name
>    4   1730   1730 219936    2.5      0      0    0.0    92.5| ada1p1
>    3    274    274  34659   10.5      0      0    0.0    91.1| ada2p1
>    5    266    266  33758   10.9      0      0    0.0    94.4| ada3p1
>
> Reducing this to 1 thread does put all the load on the SSD in my setup,
> but this still results in significantly higher performance than both the
> current configuration and the case where the HDDs take a small portion
> of the load.
>
> That said, I've reworked the way loads are calculated, adding more
> configuration options so it can be tuned to how the user wants.
>
>
>> 3) bde@ would argue that the sysctl comments are too long, and for me at
>> least the first one is quite tangled.
>>
>
> Matt already suggested changing it to a seek / latency penalty, which,
> having played around with it, I also believe is much clearer :)
>
> @matt: Is this the sort of thing you had in mind with regard to a
> penalty / increment instead of a bonus?
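>
> i.e. conceptually something like this (a sketch only; the names and
> default values are illustrative, not what's in the patch):
>
>   #include <stdint.h>
>
>   /* Illustrative per-child state. */
>   typedef struct mirror_child {
>           int      mc_pending;     /* outstanding I/Os on this child */
>           int      mc_rotating;    /* from GEOM::rotating */
>           uint64_t mc_last_offset; /* end offset of the last I/O issued */
>   } mirror_child_t;
>
>   /*
>    * Hypothetical sysctl-tunable penalties; per mav's point above, the
>    * non-rotating penalty could be a fraction of the rotating one
>    * (e.g. half or 3/4) rather than zero.
>    */
>   static int zfs_mirror_rotating_seek_penalty = 5;
>   static int zfs_mirror_non_rotating_penalty = 1;
>
>   /* Lowest load wins; a seek penalty replaces the old locality bonus. */
>   static int
>   mirror_child_load(const mirror_child_t *mc, uint64_t offset)
>   {
>           int load = mc->mc_pending;
>
>           if (!mc->mc_rotating)
>                   return (load + zfs_mirror_non_rotating_penalty);
>           if (offset != mc->mc_last_offset)  /* non-local: charge a seek */
>                   load += zfs_mirror_rotating_seek_penalty;
>           return (load);
>   }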
>
>
>> 4) I don't much like the idea of expanding struct disk every time
>> (especially not at the end). While that is probably easier than the
>> alternatives, we now have the d_getattr method specifically to avoid that.
>>
>
> I see what you mean, but it needs to be stored somewhere; otherwise
> we're going to have to query the device every time. Or did you have
> something else specific in mind?
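>
> One option (purely a sketch; the names are hypothetical) would be for
> ZFS to cache the value in its own per-vdev state and refresh it via
> BIO_GETATTR, rather than growing struct disk:
>
>   /* Hypothetical per-vdev cache of the rotating attribute. */
>   typedef struct vdev_geom_ctx {
>           int vgc_rotating;
>   } vdev_geom_ctx_t;
>
>   static void
>   vdev_geom_update_rotating(vdev_geom_ctx_t *ctx, struct g_consumer *cp)
>   {
>           int v;
>
>           /*
>            * Refresh on open and on a GEOM::rotating attribute-change
>            * notification; g_getattr() issues the BIO_GETATTR query.
>            */
>           if (g_getattr("GEOM::rotating", cp, &v) == 0)
>                   ctx->vgc_rotating = v;
>   }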
>
>
>> One more minor thought: I guess with the proposed code all requests
>> without a locality bonus will, in the case of equal load, go to the first
>> disk. Probably not a big problem, but maybe adding a small (less than a
>> load of 1) random factor (like the one from the offset in the original
>> code) would not be bad from a drive-wear point of view.
>>
>
> I've re-implemented and optimised the version from zfsonlinux, based on
> zio_offset instead of time, which should ensure this doesn't happen.
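>
> Roughly like this (illustrative only, not the actual code):
>
>   /*
>    * Derive a pseudo-random starting child from the I/O offset, so
>    * ties between equally loaded children don't always resolve to
>    * child 0, while consecutive offsets in the same 1 MB region still
>    * prefer the same child and so preserve locality.
>    */
>   static int
>   mirror_preferred_child(uint64_t zio_offset, int children)
>   {
>           return ((int)((zio_offset >> 20) % (uint64_t)children));
>   }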
>
> From: "Matthew Ahrens" <mahrens at delphix.com>
>
>> > It was also added for testing convenience as much as anything; however,
>> > all being well, it could be removed unless there are objections to that?
>>
>> I'm fine with removing the old algorithm and the switch
>> (zfs_mirror_by_load).
>>
>
> Ok, I've removed this, which does simplify the code somewhat :)
>
> Updated patch attached.
>
>
>    Regards
>    Steve
>

