svn commit: r206497 - in head: sbin/geom/class sbin/geom/class/sched sys/geom/sched sys/modules/geom sys/modules/geom/geom_sched sys/modules/geom/geom_sched/gs_sched sys/modules/geom/geom_sched/gsc...

Fabio Checconi fabio at freebsd.org
Wed Apr 14 16:52:02 UTC 2010


> From: Luigi Rizzo <rizzo at iet.unipi.it>
> Date: Wed, Apr 14, 2010 10:16:30AM +0200
>
> [Cc-ing Fabio as he may have more details]
> 
...
> > BTW. So you decided to implement insert/remove functionality after all.
> > I have some questions:
> > 
> > - It is implemented as internal gsched hack, which is a pity, because
> >   this might be very useful functionality for other classes in the future.
> >   Is there a plan to make it more general and move it to the GEOM itself?
> 
> yes there is such a plan -- of course if nobody has objections.
> In principle it is only a library extensions with no modifications
> to geom internals.
> 
> However, when we developed that last year, we hit some corner case
> where removal of an active node causes either a race or (if you try
> to protect from the race) a livelock. Fixing this may require some
> small cleanup to geom internals (we discusses the issue with phk
> at last EuroBSDCon. Fabio may give you more details, as far as i
> remember the problem was that some geom code takes shortcuts instead
> of following a chain of pointers, and this can end up in the wrong
> place in case of a removal.)
> 
> For this reason, at this time i am not recommending to remove a
> node from a chain with outstanding transactions until the issue is solved.
> 

I'm not sure I remember all the details, the major issues were:

  - g_disk_done() dereferences bio->parent->bio_to->geom->softc, thus
    changing bio_to->geom on the fly was not possible with pending
    request (they would find a wrong softc when bubbling up).  For
    this reason we added a loop in g_insert_proxy() to wait for all
    the pending requests to be completed prior to inserting the proxy;
    but:

  - g_slice_finish_hot() completes requests in the event handling path,
    thus said loop (executed from an event handler) could result in a
    deadlock.  To avoid this (it should be far from being frequent,
    considering the usages of hotspots in the slice code) g_insert_proxy()
    fails if it takes too long to complete the old requests.

  - Some classes (from a quick look I've seen only g_mirror, but I'm
    pretty sure there were some other ones) cache pointers to their
    providers.  With these classes this implementation does not work.

    For the scheduler this is not a big issue, because its natural
    position is as close as possible to the disk device, but makes
    the mechanism quite hard to use in a more generic context.


> > - Why g_sched_flush_pending() operates on global structure? I think it
> >   will break if you try to insert and remove at the same time.
> 

If I'm not wrong, this should be safe because the global list is used only
under critical sections protected by the topology lock.


More information about the svn-src-all mailing list