N-way mirror read speedup in zfsonlinux
Justin T. Gibbs
gibbs at FreeBSD.org
Mon Aug 19 22:23:07 UTC 2013
On Aug 19, 2013, at 6:04 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
> Ok so latest version attached with the following changes:
> * Switched from d_nonrotating -> d_rotation_rate for more flexibility
> in the future.
> * Added g_handleattr_uint16_t to correctly handle uint16_t without
> hard coding size.
> * Switched back to for() {...} from do {..} while()
Thanks for doing this.
Can you also fix the comment above vdev_queue_length()? Perhaps you meant "concerned" instead of "precious"?
I also don't know if there is a policy about American vs. British English, e.g. "favour" vs. "favor".
> * Added non_rotating_seek_inc option for controlling the seek increment
> for non rotating media.
When you tested on an all-SSD complement, did you see any impact from this change? I would only expect it to do anything if the record size for the fs is small or compression is enabled, because of the 128k aggregation limit in vdev_queue.c. Even in this case, having a penalty of only 1 means that we'll still break up a stream of contiguous I/Os that could be aggregated. I think this can be revisited later, though, once vdev_queue.c is improved and we have more experimental results.
This version still has the problem of potentially setting lastoffset on the wrong device. It should be set on any leaf device the mirror code reads. Right now it only sets it on mm_preferred and the DTL checks may exclude that device before any I/O is issued.
I did some dtracing today and found that vdev_queue_io() is always called with vdev_mirror_io_start(). You mentioned having vdev_queue_io() record the offset vs. explicitly setting it in vdev_mirror.c degraded performance. Did you, by chance, have the recording code after vdev_queue_io()'s "if (zio->io_flags & ZIO_FLAG_DONT_QUEUE)" check? That's the only thing that comes to mind to explain this since there should be no appreciable delay. Having vdev_queue_io() do the recording would fix the lastoffset bugs in your current rev.
The "policy split" in both your version and the original has always troubled my software "aesthetic sensibilities". State like "vdev_readable()" gets checked twice, and more loops and code are required. Load calculations are made for children that may be excluded due to their DTL, etc. So I pushed all of the selection stuff into vdev_mirror_child_select(). The result is more correct since the preferred device is taken only from functional candidates. I think the result is simpler too, but I will let the list pass judgement.
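For readers without the diff to hand, the single-pass idea looks roughly like the sketch below. It is not the attached vdev_mirror_child_select(): the structure, field names, and load formula are hypothetical, and the real code has more cases to handle. It only shows eligibility and load being evaluated in one loop, with load computed solely for children that survive the readable and DTL checks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a mirror child; not the real mirror_child_t. */
struct mirror_child_model {
	bool readable;		/* stand-in for a vdev_readable() result */
	bool dtl_missing;	/* true if the DTL excludes this child */
	int pending;		/* I/Os currently queued */
	uint64_t last_offset;	/* where the previous I/O ended */
};

/* Returns the index of the chosen child, or -1 if none is usable. */
static int
model_child_select(const struct mirror_child_model *c, int n,
    uint64_t offset, int seek_inc)
{
	int best = -1;
	int best_load = INT_MAX;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int load;

		/* Each eligibility condition is checked exactly once. */
		if (!c[i].readable || c[i].dtl_missing)
			continue;

		/* Load is computed only for functional candidates. */
		load = c[i].pending;
		if (offset != c[i].last_offset)
			load += seek_inc;

		if (load < best_load) {
			best_load = load;
			best = i;
		}
	}
	return (best);
}
```

Because the preferred child is picked from the filtered set, an idle child that the DTL excludes can never be chosen, which is the correctness point made above.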
The attached diff doesn't have your RPM cleanup yet, but hopefully it still gives a sense of how this might work.
--
Justin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfs_mirror.diffs
Type: application/octet-stream
Size: 10162 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/zfs-devel/attachments/20130819/010be4dd/attachment.obj>