cvs commit: src/sys/dev/fxp if_fxp.c if_fxpvar.h

Tue Apr 29 13:01:45 PDT 2003

John Baldwin writes:
 > 
 > On 29-Apr-2003 Andrew Gallatin wrote:
 > > 
 > > John Baldwin writes:
 > > 
 > >  > 
 > >  > First off, it's a gross API violation.  ithread's are supposed to be
 > >  > transparent (except that you can use locks in your interrupt handlers).
 > > 
 > > The API seems designed for inefficiency, so it needs violating ;)
 > > 
 > >  > Secondly, you still have the same race as if you just removed the gone
 > >  > condition.  We don't hold the ithread lock while executing handlers,
 > >  > so there is nothing preventing the handler from being executed on another
 > >  > CPU concurrently with your detach function.  In fact, it could easily
 > >  > be blocked on the FXP lock.  You do your magic pointer foo, then unlock
 > >  > the fxp.  The unlock releases the interrupt handler on another CPU
 > >  > which happily executes _after_ the completion of bus_teardown_intr()
 > > 
 > > I assumed dropping the fxp lock would be enough to encourage any
 > > pending handlers to finish.  Would simply tsleep()'ing for a second
 > > work?
 > 
 > It would encourage them, but they can easily be executed in parallel
 > on another CPU with the rest of the detach function.  Throwing in random
 > sleeps isn't going to address that since the other CPU could always get
 > another interrupt while this one sleeps.
 > 
 > > If needed we should violate the API even more to check to see if any
 > > handlers are pending.  Anything is better than adding instructions to
 > > the critical path.
 > 
 > Where would you do this check for more pending handlers?  In detach?
 > What does detach do when it's true?  The problem is that detach may

It sleeps.

 > be too late.  Take this sequence:
 > 
 > 
 >         CPU 0           CPU 1
 >         --------        --------
 >         foo_detach()    ...
 >         FOO_LOCK()      interrupt for foo
 >         ...             foo_intr()
 >         ...             FOO_LOCK() and blocks

You forgot:
	   prevent_device_from_generating_interrupts();
	   FOO_UNLOCK();

 >         bus_teardown_intr()
 > 
 > Hmm.  Now we have a nice deadlock unless we take very drastic measures
 > like trying to kill threads out from under ourselves and leaving held
 > locks, etc. in their wake.  Also, warner's check for gone outside of
 > the fxp lock is not safe, since it can still lose a race where the gone
 > bit gets set after the check but before foo_intr() gets the lock.
 > 
 > Now, we have to synchronize between foo_intr() and foo_detach() somehow.
 > We need some way in foo_intr() to know to do that.  That's going to
 > involve some kind of check.  Kernel panics are not an acceptable

The whole idea is to avoid any check in the critical path.   Detach
can be as slow as it needs to be, it doesn't happen often.

 > alternative.  So, how do you propose to solve the above sequence w/o
 > using a flag check in foo_intr() for synchronization?  I'm not really

In foo_detach you disable interrupts on the board, and you replace its
handler with an orphan handler for the benefit of shared irqs.  At the
point there can be exactly one interrupt pending which may be blocked
anywere, possibly on the foo lock.  You then drop the foo lock.

You then grab the required locks, and see if foo's ithread is running,
or runnable.  If it is, you drop the locks, sleep for hz and repeat
the check.   If its not, you do the guts of ithread remove handler.

 > prepared to try and kill a thread executing on another CPU as that is
 > 1) hard and 2) error prone (we just dinked with registers in detach to
 > turn things like interrupts off, allowing the interrupt routine to
 > execute can result in things like interrupts getting re-enabled, which
 > would be quite bad, also, if we wanted to do that we would have to
 > unwind all the locks held by the ithread and just _hope_ that it doesn't
 > leave anything in an inconsistent state to do so, along with adding
 > WITNESS-like pessimizations to the _default_ case for locking to maintain
 > enough state to allow for such an unwind).
 > 
 > > We really need to think about efficiency.  Our 5.x performance sucks.
 > > Really sucks.  We're being nickled and dimed to death by extra
 > > instructions here, there, and everywhere.
 > 
 > Unfortunately 5.x attempts to run with a thread-safe kernel, and that
 > involves extra overhead to work around races that 4.x didn't even have
 > to dream about.  In theory the increased performance should come from
 > increased parallelism at the cost of increased overhead.  If FreeBSD

No, in theory increased performance should come from increased
parallelism with no increased overhead.  Any increased overhead is a
bug.  Linux 2.4 runs a thread safe kernel with less overhead than we
have in 4.x.  Its possible.

 > decides that a thread-safe kernel isn't worth it then I guess we can
 > pitch 5.x and start over.

As we get closer to a stable branchpoing, and continue to suck, I'm
starting to think we should start over.

Drew