svn commit: r206497 - in head: sbin/geom/class
sbin/geom/class/sched sys/geom/sched sys/modules/geom
sys/modules/geom/geom_sched sys/modules/geom/geom_sched/gs_sched
sys/modules/geom/geom_sched/gsc...
Fabio Checconi
fabio at freebsd.org
Wed Apr 14 16:52:02 UTC 2010
> From: Luigi Rizzo <rizzo at iet.unipi.it>
> Date: Wed, Apr 14, 2010 10:16:30AM +0200
>
> [Cc-ing Fabio as he may have more details]
>
...
> > BTW. So you decided to implement insert/remove functionality after all.
> > I have some questions:
> >
> > - It is implemented as internal gsched hack, which is a pity, because
> > this might be very useful functionality for other classes in the future.
> > Is there a plan to make it more general and move it to the GEOM itself?
>
> yes there is such a plan -- of course if nobody has objections.
> In principle it is only a library extensions with no modifications
> to geom internals.
>
> However, when we developed that last year, we hit some corner case
> where removal of an active node causes either a race or (if you try
> to protect from the race) a livelock. Fixing this may require some
> small cleanup to geom internals (we discusses the issue with phk
> at last EuroBSDCon. Fabio may give you more details, as far as i
> remember the problem was that some geom code takes shortcuts instead
> of following a chain of pointers, and this can end up in the wrong
> place in case of a removal.)
>
> For this reason, at this time i am not recommending to remove a
> node from a chain with outstanding transactions until the issue is solved.
>
I'm not sure I remember all the details, the major issues were:
- g_disk_done() dereferences bio->parent->bio_to->geom->softc, thus
changing bio_to->geom on the fly was not possible with pending
request (they would find a wrong softc when bubbling up). For
this reason we added a loop in g_insert_proxy() to wait for all
the pending requests to be completed prior to inserting the proxy;
but:
- g_slice_finish_hot() completes requests in the event handling path,
thus said loop (executed from an event handler) could result in a
deadlock. To avoid this (it should be far from being frequent,
considering the usages of hotspots in the slice code) g_insert_proxy()
fails if it takes too long to complete the old requests.
- Some classes (from a quick look I've seen only g_mirror, but I'm
pretty sure there were some other ones) cache pointers to their
providers. With these classes this implementation does not work.
For the scheduler this is not a big issue, because its natural
position is as close as possible to the disk device, but makes
the mechanism quite hard to use in a more generic context.
> > - Why g_sched_flush_pending() operates on global structure? I think it
> > will break if you try to insert and remove at the same time.
>
If I'm not wrong, this should be safe because the global list is used only
under critical sections protected by the topology lock.
More information about the svn-src-all
mailing list