svn commit: r206497 - in head: sbin/geom/class
sbin/geom/class/sched sys/geom/sched sys/modules/geom
fabio at freebsd.org
Wed Apr 14 16:52:02 UTC 2010
> From: Luigi Rizzo <rizzo at iet.unipi.it>
> Date: Wed, Apr 14, 2010 10:16:30AM +0200
> [Cc-ing Fabio as he may have more details]
> > BTW. So you decided to implement insert/remove functionality after all.
> > I have some questions:
> > - It is implemented as internal gsched hack, which is a pity, because
> > this might be very useful functionality for other classes in the future.
> > Is there a plan to make it more general and move it to the GEOM itself?
> yes there is such a plan -- of course if nobody has objections.
> In principle it is only a library extension with no modifications
> to geom internals.
> However, when we developed that last year, we hit some corner case
> where removal of an active node causes either a race or (if you try
> to protect from the race) a livelock. Fixing this may require some
> small cleanup to geom internals (we discussed the issue with phk
> at last EuroBSDCon. Fabio may give you more details, as far as i
> remember the problem was that some geom code takes shortcuts instead
> of following a chain of pointers, and this can end up in the wrong
> place in case of a removal.)
> For this reason, at this time i am not recommending to remove a
> node from a chain with outstanding transactions until the issue is solved.
I'm not sure I remember all the details, but the major issues were:
- g_disk_done() dereferences bio->parent->bio_to->geom->softc, thus
changing bio_to->geom on the fly was not possible with pending
requests (they would find the wrong softc when bubbling up). For
this reason we added a loop in g_insert_proxy() to wait for all
the pending requests to be completed prior to inserting the proxy;
- g_slice_finish_hot() completes requests in the event handling path,
thus said loop (executed from an event handler) could result in a
deadlock. To avoid this (it should be far from being frequent,
considering the usages of hotspots in the slice code) g_insert_proxy()
fails if it takes too long to complete the old requests.
- Some classes (from a quick look I've seen only g_mirror, but I'm
pretty sure there were some other ones) cache pointers to their
providers. With these classes this implementation does not work.
For the scheduler this is not a big issue, because its natural
position is as close as possible to the disk device, but it makes
the mechanism quite hard to use in a more generic context.
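
To make the first two points concrete, here is a rough user-space sketch, not the actual kernel code. The structures are simplified stand-ins (struct geom, struct provider, struct bio here are hypothetical and much smaller than the real ones), and insert_proxy(), drain_one() and stuck() are illustrative names only. It shows the completion path resolving the softc through the parent's provider at completion time, and a drain loop that gives up after a bounded number of polls instead of waiting forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct geom     { void *softc; };
struct provider { struct geom *geom; int inflight; };
struct bio      { struct bio *parent; struct provider *to; };

/*
 * Completion in the style described for g_disk_done(): the softc is
 * looked up through the parent's destination provider when the request
 * completes, not when it was issued.
 */
static void *
completion_softc(const struct bio *bp)
{
	assert(bp->parent != NULL && bp->parent->to != NULL);
	return (bp->parent->to->geom->softc);
}

/*
 * Bounded drain: poll until nothing is in flight or max_polls is used
 * up. complete_one is a hypothetical callback modelling whatever makes
 * progress on pending requests between polls.
 */
static int
insert_proxy(struct provider *pp, struct geom *proxy, int max_polls,
    void (*complete_one)(struct provider *))
{
	for (int i = 0; pp->inflight > 0 && i < max_polls; i++)
		complete_one(pp);
	if (pp->inflight > 0)
		return (-1);	/* took too long: fail rather than deadlock */
	pp->geom = proxy;	/* no pending request will see the old softc */
	return (0);
}

/* Models one request completing per poll. */
static void
drain_one(struct provider *pp)
{
	if (pp->inflight > 0)
		pp->inflight--;
}

/* Models completions that cannot run while we are waiting. */
static void
stuck(struct provider *pp)
{
	(void)pp;
}
```

With stuck() as the callback the insertion fails and the provider is left untouched, which is the behaviour described above for the hotspot case; with drain_one() the swap happens only after the in-flight count reaches zero.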
> > - Why does g_sched_flush_pending() operate on a global structure? I think it
> > will break if you try to insert and remove at the same time.
If I'm not wrong, this should be safe because the global list is used only
under critical sections protected by the topology lock.
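
As a rough illustration of that argument (a user-space sketch, not the gsched code): the lock is modelled by a plain flag, and topology_lock(), pending_push() and pending_flush() are hypothetical names. The point is only that every access to the global list asserts that the lock is held, and that the list is empty again before the lock is dropped, so two operations can never see each other's entries.

```c
#include <assert.h>
#include <stddef.h>

struct req { struct req *next; };

static int topology_locked;		/* models the lock state (assumption) */
static struct req *pending_head;	/* global scratch list */

static void
topology_lock(void)
{
	assert(!topology_locked);
	topology_locked = 1;
}

static void
topology_unlock(void)
{
	assert(topology_locked);
	assert(pending_head == NULL);	/* list never outlives the section */
	topology_locked = 0;
}

/* Queue a request on the global list; only legal with the lock held. */
static void
pending_push(struct req *r)
{
	assert(topology_locked);
	r->next = pending_head;
	pending_head = r;
}

/* Empty the global list and return how many requests were on it. */
static int
pending_flush(void)
{
	int n = 0;

	assert(topology_locked);
	while (pending_head != NULL) {
		pending_head = pending_head->next;
		n++;
	}
	return (n);
}
```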