Pluggable Disk Scheduler Project

Thu Oct 11 02:31:51 PDT 2007

On Thu, Oct 11, 2007 at 04:20:01 +0200, Fabio Checconi wrote:
> Hi,
>     is anybody working on the `Pluggable Disk Scheduler Project' from
> the ideas page?
Hi,
I did look into it :) But then other projects came along.
> 
> To better understand how GEOM works, and how a (non work conserving)
> disk scheduler can fit into it, I've written a very simple, yet
> working, prototype:
> 
>     http://feanor.sssup.it/~fabio/freebsd/g_sched/geom-sched-class.patch
> 
> 
> I'd like to take a better look at the problem and work on it, and
> I'd like to know what you think about it.
> 
> After reading [1], [2] and its follow-ups the main problems that
> need to be addressed seem to be:
> 
>     o is working on disk scheduling worth it at all?
It is hard to say, but I'd like to run some benchmarks with this to find out.
Also, as noted in [2], newer hardware does more magic on its own, and
solid state drives are coming along.

>     o Where is the right place (in GEOM) for a disk scheduler?
As discussed in [2], some suggested that disk scheduling should be done in a
lower part of the kernel, which knows more about the hardware's capabilities.

As discussed in [1], ATA for instance does its own scheduling, so this might
ruin performance (even the hardware might do some magic of its own). I
think I tried disabling it though, so it shouldn't be a big deal for testing.

>     o How can anticipation be introduced into the GEOM framework?
This is actually perhaps one of the most interesting points, since the
anticipation principle in itself fits here, but some other scheduling
features might not be useful.
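
To make the anticipation principle a bit more concrete, here is a minimal
userspace sketch of the decision taken when a request completes: dispatch
right away if the next request comes from the same process, otherwise hold
back until a timeout expires. All names here (as_state, as_decide, the tick
based clock) are made up for illustration and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical model of an anticipatory decision; not code from the patch. */
enum as_action {
    AS_DISPATCH,  /* pass the head of the queue down now */
    AS_WAIT,      /* keep the disk idle, anticipating the last process */
    AS_IDLE       /* nothing queued and the window has expired */
};

struct as_state {
    int  last_pid;    /* process whose request completed most recently */
    long done_at;     /* time that request completed, in ticks */
    long wait_ticks;  /* length of the anticipation window, in ticks */
};

/*
 * have_req: non-zero if a request is queued; head_pid: its issuing process.
 * A request from the anticipated process is dispatched at once; any other
 * request has to wait until the anticipation window has expired.
 */
static enum as_action
as_decide(const struct as_state *st, int have_req, int head_pid, long now)
{
    int expired = (now - st->done_at) >= st->wait_ticks;

    if (have_req && (head_pid == st->last_pid || expired))
        return AS_DISPATCH;
    if (!expired)
        return AS_WAIT;
    return AS_IDLE;
}
```

In a real GEOM class the timeout would presumably be driven by a callout
rather than by polling a clock value like this.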

>     o What can be an interface for disk schedulers?
I think the interface developed in [1] is a pretty good one actually. I think
the disksort routines looked like a good place to do this, though even there
the scheduler might not know enough about the hardware.

>     o How to deal with devices that handle multiple requests at a time?
This is an example of the problems you get doing this in GEOM. You don't have
very good knowledge of the hardware.

>     o How to deal with metadata requests and other VFS issues?
> 
> I think that some answers need a little bit of experimenting with
> real code and real hardware, so here is this attempt.  The
> interface used in this toy prototype for the scheduler is something
> like this:
> 
>     typedef void *gs_init_t (struct g_geom *geom);
>     typedef void gs_fini_t (void *data);
>     typedef void gs_start_t (void *data, struct bio *bio);
>     typedef void gs_done_t (void *data, struct bio *bio);
> 
>     struct g_gsched {
> 	    const char	*gs_name;	/* Scheduler name. */
> 	    int		gs_refs;	/* Refcount, internal use. */
> 
> 	    gs_init_t	*gs_init;	/* Called on geom creation. */
> 	    gs_fini_t	*gs_fini;	/* Called on geom destruction. */
> 	    gs_start_t	*gs_start;	/* Called on geom start. */
> 	    gs_done_t	*gs_done;	/* Called on geom done. */
> 
> 	    LIST_ENTRY(g_gsched) glist;	/* List of schedulers, internal use. */
>     };
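
To see how the four hooks fit together, here is a small userspace sketch of
a scheduler filling in this structure. It assumes stand-in definitions of
struct g_geom and struct bio, leaves out the glist linkage, and the "count"
scheduler itself is hypothetical: it only counts calls and does no real
scheduling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the kernel types, just enough to compile in userspace. */
struct g_geom { const char *name; };
struct bio { long long bio_offset; };

typedef void *gs_init_t(struct g_geom *geom);
typedef void gs_fini_t(void *data);
typedef void gs_start_t(void *data, struct bio *bio);
typedef void gs_done_t(void *data, struct bio *bio);

struct g_gsched {
    const char *gs_name;   /* Scheduler name. */
    int         gs_refs;   /* Refcount, internal use. */
    gs_init_t  *gs_init;   /* Called on geom creation. */
    gs_fini_t  *gs_fini;   /* Called on geom destruction. */
    gs_start_t *gs_start;  /* Called on geom start. */
    gs_done_t  *gs_done;   /* Called on geom done. */
    /* glist omitted in this model */
};

/* Hypothetical per-geom private data: just two counters. */
struct count_data { int started; int done; };

static void *count_init(struct g_geom *geom)
{
    (void)geom;
    return calloc(1, sizeof(struct count_data));
}

static void count_fini(void *data)
{
    free(data);
}

static void count_start(void *data, struct bio *bio)
{
    (void)bio;
    ((struct count_data *)data)->started++;
}

static void count_done(void *data, struct bio *bio)
{
    (void)bio;
    ((struct count_data *)data)->done++;
}

static struct g_gsched count_sched = {
    .gs_name  = "count",
    .gs_init  = count_init,
    .gs_fini  = count_fini,
    .gs_start = count_start,
    .gs_done  = count_done,
};
```

The value returned by gs_init is handed back as the data argument of the
other hooks, so each geom gets its own private scheduler state.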
> 
> The main idea is to allow the scheduler to enqueue the requests having only
> one (other small fixed numbers can be better on some hardware) outstanding
> request and to pass new requests to its provider only after the service of
> the previous one ended.
> 
> The example scheduler in the draft takes the following approach:
> 
>     o a scheduling GEOM class is introduced.  It can be stacked on
>       top of disk geoms, and schedules all the requests coming
>       from its consumers.  I'm not absolutely sure that a new class
>       is really needed but I think that it can simplify testing and
>       experimenting with various solutions on the scheduler placement.
>     o  Requests coming from consumers are passed down immediately
>       if there is no other request under service, otherwise they
>       are queued in a bioq.
>     o  When a request is served the scheduler is notified, then it
>       can pass down a new request, or, as in this toy anticipatory[3]
>       scheduler, wait for a new request from the same process, or
>       for a timeout to expire, and only after one of those events
>       make the next scheduling decision.
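
The pass-down/queue behaviour described in the list above can be modelled in
userspace roughly like this. It is only a sketch: struct req stands in for
struct bio, a small ring buffer stands in for the bioq, and "passing down"
just records the request as being in service. None of the names come from
the patch, and the anticipation step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QLEN 16

struct req {
    int id;                  /* hypothetical request identifier */
};

struct sched {
    struct req *queue[QLEN]; /* FIFO ring of waiting requests */
    size_t head, count;
    struct req *in_service;  /* at most one outstanding request */
    int dispatched;          /* requests passed down so far */
    int last_id;             /* id of the most recently dispatched request */
};

static void sched_init(struct sched *s)
{
    memset(s, 0, sizeof(*s));
}

/* Model of handing a request to the provider below. */
static void dispatch(struct sched *s, struct req *r)
{
    s->in_service = r;
    s->dispatched++;
    s->last_id = r->id;
}

/* Analogue of gs_start: pass down if idle, queue otherwise. */
static int sched_start(struct sched *s, struct req *r)
{
    if (s->in_service == NULL) {
        dispatch(s, r);
        return 0;
    }
    if (s->count == QLEN)
        return -1;           /* queue full in this toy model */
    s->queue[(s->head + s->count) % QLEN] = r;
    s->count++;
    return 0;
}

/* Analogue of gs_done: the previous request ended, pick the next one. */
static void sched_done(struct sched *s, struct req *r)
{
    assert(s->in_service == r);
    s->in_service = NULL;
    if (s->count > 0) {
        struct req *next = s->queue[s->head];
        s->head = (s->head + 1) % QLEN;
        s->count--;
        dispatch(s, next);
    }
}
```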
> 
> So, as I've said, I'd like to know what you think about the subject,
> if I'm missing something, if there is some kind of interest in this
> and if/how this work can proceed.

Also, it would be interesting to implement I/O priorities for processes, so
that I/O time can be shared more fairly (or at least set according to
preference) between processes. This was done in the Hybrid project, but it is
something that definitely could be done in GEOM. (I see you have some
fairness in the g_as_dispatch routine, though.)
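
Just to illustrate what I mean by priorities, one possible approach (not
necessarily what Hybrid or g_as_dispatch does) is to give each process a
weight and always serve the pending process with the least service received
per unit of weight. The struct and function names below are invented for
this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process accounting; not taken from Hybrid or the patch. */
struct io_proc {
    int       pid;
    int       weight;   /* I/O priority, must be > 0; higher gets more */
    long long served;   /* amount of service received so far */
    int       pending;  /* non-zero if the process has a queued request */
};

/*
 * Return the index of the pending process with the smallest
 * served/weight ratio, or -1 if nothing is pending.  Ratios are
 * compared by cross-multiplication to stay in integer arithmetic.
 */
static int
pick_next(const struct io_proc *p, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!p[i].pending)
            continue;
        if (best < 0 ||
            p[i].served * p[best].weight < p[best].served * p[i].weight)
            best = (int)i;
    }
    return best;
}
```

With weights 1 and 4, for example, the second process may receive four
times as much service before the first one is preferred again.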

However, I'll try testing the work you've got. I'll see if I can get some
numbers with this when I get some disks up.

Btw, I did run some benchmarks when I tried changing bioq_disksort into a FIFO
queue, which didn't seem to lower performance (on SCSI and UMASS, but I need to
test again with ATA). That was a long time ago, so it should be tried again
though.
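
For reference, the difference between the two queueing policies can be
sketched in userspace as below. This is not the actual bioq_disksort code,
only a model in the spirit of a one-way elevator: requests at or beyond the
current head position are served first in ascending order, then the ones
behind it. The struct rq type and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define QMAX 32

/* Toy request queue holding only disk offsets. */
struct rq {
    long long offs[QMAX];
    size_t n;
};

/* One-way elevator order: ahead of the head first, then ascending offset. */
static int before(long long a, long long b, long long head)
{
    int a_ahead = a >= head, b_ahead = b >= head;

    if (a_ahead != b_ahead)
        return a_ahead;
    return a < b;
}

/* FIFO policy: always append at the tail. */
static int fifo_insert(struct rq *q, long long off)
{
    if (q->n == QMAX)
        return -1;
    q->offs[q->n++] = off;
    return 0;
}

/* Sorted policy: insertion sort by elevator order, stable for equal keys. */
static int elevator_insert(struct rq *q, long long off, long long head)
{
    size_t i;

    if (q->n == QMAX)
        return -1;
    i = q->n;
    while (i > 0 && before(off, q->offs[i - 1], head)) {
        q->offs[i] = q->offs[i - 1];
        i--;
    }
    q->offs[i] = off;
    q->n++;
    return 0;
}
```

With the head at offset 50 and arrivals 70, 10, 60, 30, the FIFO queue keeps
that order while the elevator queue ends up as 60, 70, 10, 30.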
> 
> [1]  http://wiki.freebsd.org/Hybrid
> 
> [2]  http://lists.freebsd.org/pipermail/freebsd-geom/2007-January/001854.html
> 
> [3]  The details of the anticipation are really not interesting as it
>     is extremely simplified on purpose.
> 
> [4]  http://feanor.sssup.it/~fabio/freebsd/g_sched/ also contains a userspace
>     client to experiment with the GEOM class.
> 
-- 
Ulf Lilleengen