N-way mirror read speedup in zfsonlinux

Steven Hartland killing at multiplay.co.uk
Thu Aug 15 19:59:59 UTC 2013


Replies to everyone's individual feedback below:-

----- Original Message ----- 
From: "Alexander Motin" <mav at FreeBSD.org>


> On 14.08.2013 16:50, Steven Hartland wrote:
>> So based on mav's comments I've created a new version of the load balanced
>> patch which takes into account the last IO's locality on the disk.
> 
> I like it. Thank you!
> 
> Just a few comments:
> 1) I guess your change to scsi_da.c will send "GEOM::rotating" attribute 
> change every time disk is probed (including each open) even when nothing 
> really changed. It is not fatal, but absolutely unneeded overhead.

No problem, change notifications are now guarded by a current value check.

> 2) You are always giving full locality bonus to SSDs. It means that HDDs 
> quite likely won't be used even for sequential read operations until 
> SSDs are really busy, while that is the only point where HDDs still 
> could if not compete but at least be useful. Surely it very much depend 
> on specific devices characteristics, but I would try to give SSD half 
> (or may be 3/4), so that precisely positioned HDD could still do 
> something it can.

I tested this case with the 2 x HDD's and 1 x SDD case and it results in
very good balancing with all disks @ ~95% load with 3 threads e.g.

dT: 1.001s  w: 1.000s  filter: ada[123]p1
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    %busy Name
    4   1730   1730 219936    2.5      0      0    0.0    92.5| ada1p1
    3    274    274  34659   10.5      0      0    0.0    91.1| ada2p1
    5    266    266  33758   10.9      0      0    0.0    94.4| ada3p1

Reducing this to 1 thread does put all load to the SSD in my setup
but this still results in signficantly higher performance than the
current configuration but asll case where the HD's take a small portion
of the load.

That said I've reworked the way loads are calculating adding in more
configuration options so it can be turned to how the user wants.

> 3) bde@ would argue about too long sysctl comments, while for me at 
> least first is quite tangled.

Matt already suggested changing it to a seek / latency penalty which
having played around I also believe is much clearer :)

@matt: Is this the sort of thing you had in mind with regards penalty /
increment instead of bonus?

> 4) I don't very like idea of expanding struct disk every time 
> (especially not at the end). While that is probably easier then 
> alternatives, now we have d_getattr method specifically to avoid that.

I see what you mean but it needs to be stored somewhere otherwise
we're going to have to query the device every time or did you have
something else specific in mind?

> One more minor thought: I guess with the proposed code all requests 
> without locality bonus in case of equal load will go to the first disk. 
> Probably not a big problem, but may be adding small (less then load of 
> 1) random factor (like one from offset in original code) could not be 
> bad from drives wearing point.

I've re-implemented and optimised version from zfsonlinux based on
zio_offset instead of time, which should ensure this doesn't happen.

From: "Matthew Ahrens" <mahrens at delphix.com>
> > It was also added for testing convenience as much as anything, however all
> > being well it could be removed unless there's objections to that?
> 
> I'm fine with removing the old algorithm and the switch
> (zfs_mirror_by_load).

Ok I've removed this, which does simply the code some what :)

Updated patch attached.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfs-mirror-load.patch
Type: application/octet-stream
Size: 13230 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/zfs-devel/attachments/20130815/b4057842/attachment.obj>


More information about the zfs-devel mailing list