N-way mirror read speedup in zfsonlinux
Steven Hartland
killing at multiplay.co.uk
Tue Aug 20 13:54:12 UTC 2013
----- Original Message -----
From: "Justin T. Gibbs" <gibbs at FreeBSD.org>
On Aug 19, 2013, at 6:04 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
> > Ok so latest version attached with the following changes:
> > * Switched from d_nonrotating -> d_rotation_rate for more flexibility
> > in the future.
> > * Added g_handleattr_uint16_t to correctly handle uint16_t without
> > hard coding size.
> > * Switched back to for() {...} from do {..} while()
>
> Thanks for doing this.
>
> Can you also fix the comment above vdev_queue_length()? Perhaps you meant
> "concerned" instead of "precious"?
>
> I also don't know if there is a policy about American vs. British english.
> e.g. "favour", vs "favor".
Done and done ;-)
> > * Added non_rotating_seek_inc option for controlling the seek increment
> > for non rotating media.
>
> When you tested on an all SSD complement, did you see any impact from this change?
> I would only expect it to do anything if the record size for the fs is small or
> compression is enabled because of the 128k aggregation limit in vdev_queue.c.
> Even in this case, only having a penalty of 1 means that that we'll still break
> up a stream of contiguous I/Os that could be aggregated. I think this can be
> revisited later though once vdev_queue.c is improved and we have more experimental
> results.
Not managed to get this test done yet, will need to wait till I'm back in
the office now I'm afraid.
> This version still has the problem of potentially setting lastoffset on the wrong
> device. It should be set on any leaf device the mirror code reads. Right now it
> only sets it on mm_preferred and the DTL checks may exclude that device before any
> I/O is issued.
In theory with the move of the vdev_queue_register_lastoffset calls to
vdev_mirror_child_select, this should be mittergated some what if not
totally.
> I did some dtracing today and found that vdev_queue_io() is always called with
> vdev_mirror_io_start().
>
> You mentioned having vdev_queue_io() record the offset vs. explicitly setting it
> in vdev_mirror.c degraded performance. Did you, by chance, have the recording
> code after vdev_queue_io()'s "if (zio->io_flags & ZIO_FLAG_DONT_QUEUE)" check?
> That's the only thing that comes to mind to explain this since there should be
> no appreciable delay. Having vdev_queue_io() do the recording would fix the
> lastoffset bugs in your current rev.
Confirmed I just re-tested with :-
zio_t *
vdev_queue_io(zio_t *zio)
{
vdev_queue_t *vq = &zio->io_vd->vdev_queue;
zio_t *nio;
ASSERT(zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE);
if (zio->io_type == ZIO_TYPE_READ)
vq->vq_lastoffset = zio->io_offset + zio->io_size;
if (zio->io_flags & ZIO_FLAG_DONT_QUEUE)
return (zio);
...
With this the HDD transfer rates dropped from ~40MB/s to ~12MB/s
Also trued without zio->io_type == ZIO_TYPE_READ, as the offset should be
updated not matter what the type of operation, still no change.
> The "policy split" in both your version and the original has always troubled my
> software "aesthetic sensibilities". State like "vdev_readable()" gets checked
> twice, and more loops and code are required. Load calculations are made for
> children that may be excluded due to their DTL, etc. So I pushed all of the
> selection stuff into vdev_mirror_child_select(). The result is more correct
> since the preferred device is taken only from functional candidates.
Like it, I must say it bugged me too the vdev_mirror_map_alloc did part of
vdev_mirror_child_select.
> I think the result is simpler too, but I will let the list pass judgement.
>
> The attached diff doesn't have your RPM clean up yet, but hopefully still gives
> a sense of how this might work.
I've updated my version with your changes along with extracting our
the mm structure to a simple inline used in both routes.
It would be good to understand why we cant set the offset in vdev_queue_io as
having call vdev_queue_register_lastoffset multiple times is a bit messy.
FYI: I'm away for a week now so may not be able to read emails as much as usual
so if there's a delay replying thats why.
Regards
Steve
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfs-mirror-load.patch
Type: application/octet-stream
Size: 16485 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/zfs-devel/attachments/20130820/b93a72ba/attachment.obj>
More information about the zfs-devel
mailing list