RFC: Fixing USB ethernet for FreeBSD 7.0.

Fri Dec 1 11:17:32 PST 2006

[I am not subscribed to these lists, please do not trim me off the
 cc list.]

This email has a short war story and a request for comments.

I recently had the displeasure of trying to use a USB etherdongle
under FreeBSD.  Result: panic when the interface was started.

I fixed it using a stopgap:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/usb/if_aue.c?rev=1.101&content-type=text/x-cvsweb-markup

There are still some major issues:
1) requires Giant.
2) several error paths will still panic the kernel.

I would like to fix them; however, that does not seem easy given the
existing infrastructure.

I'm going to walk you through my thought process on the whole system
and I would like feedback please.

  Basically, consider this a one-way conversation that you may
  interrupt at any time to correct me or to make suggestions.  If
  there is a major flaw in anything I've said, then feel free to
  discard the following points and we will start up from the flaw.

  NOTE:  I am interested in minimizing the impact of handling a
  "slow bus" on the rest of the kernel, but at the same time I am
  not interested in saving nanoseconds off IO time at the expense
  of having a programmatically dismal API.  Meaning, I will be happy
  to sacrifice a modicum of performance in order to provide a
  programming model that allows us to have a stable device driver
  base.

Onto the RFC:

**********************************************************************

RFC: Fixing USB ethernet API:

Statement #1:
USB is a slow bus.  While doing IO to the USB bus one should be
able to yield the CPU to another thread.  Hence an interrupt handler
cannot use DELAY() to wait for data; instead it should arrange for
a callback upon completion of IO, and userspace should use tsleep
to wait for data.
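
To make the callback-plus-sleep pattern concrete, here is a rough
userspace sketch using POSIX threads, with a condition variable standing
in for tsleep/wakeup.  It is not kernel code: struct xfer,
xfer_complete(), xfer_wait() and example_bus_thread() are names made up
for this illustration and do not exist in the USB stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transfer record; not a FreeBSD structure. */
struct xfer {
	pthread_mutex_t mtx;
	pthread_cond_t done_cv;
	bool done;
	int status;
};

static void
xfer_init(struct xfer *x)
{
	pthread_mutex_init(&x->mtx, NULL);
	pthread_cond_init(&x->done_cv, NULL);
	x->done = false;
	x->status = -1;
}

/* Completion callback: invoked by the bus layer when the IO finishes. */
static void
xfer_complete(struct xfer *x, int status)
{
	pthread_mutex_lock(&x->mtx);
	assert(!x->done);		/* a transfer completes only once */
	x->status = status;
	x->done = true;
	pthread_cond_signal(&x->done_cv);
	pthread_mutex_unlock(&x->mtx);
}

/* Sleep until the transfer completes; the CPU is free in the meantime. */
static int
xfer_wait(struct xfer *x)
{
	int status;

	pthread_mutex_lock(&x->mtx);
	while (!x->done)
		pthread_cond_wait(&x->done_cv, &x->mtx);
	status = x->status;
	pthread_mutex_unlock(&x->mtx);
	return (status);
}

/* Stand-in for the slow bus: completes the transfer from another thread. */
static void *
example_bus_thread(void *arg)
{
	xfer_complete(arg, 0);
	return (NULL);
}
```

The waiter never spins: it blocks in xfer_wait() until the completion
callback fires, which is the behaviour DELAY() cannot give us.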

Statement #2:
Using callbacks to do all IO during an interrupt is programmatically
complex and painful.  For instance, take the case of the following
code pulled from aue_stop() (which can be called from interrupt
context):

	AUE_LOCK(sc);
...
	aue_csr_write_1(sc, AUE_CTL0, 0);
	aue_csr_write_1(sc, AUE_CTL1, 0);
	aue_reset(sc);
...
	/* Stop transfers. */
	if (sc->aue_ep[AUE_ENDPT_RX] != NULL) {
		err = usbd_abort_pipe(sc->aue_ep[AUE_ENDPT_RX]);
		if (err) {
			printf("aue%d: abort rx pipe failed: %s\n",
		    	sc->aue_unit, usbd_errstr(err));
		}
		err = usbd_close_pipe(sc->aue_ep[AUE_ENDPT_RX]);

There are probably about a dozen IOs here; splitting this into
callbacks would be terribly inconvenient.

Note the aue_reset() call, which does several synchronous IOs
that should not be interrupted!
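
To give a feel for the inconvenience, here is a small sketch of what
just two of those register writes look like once they are chained
through completion callbacks.  Everything in it is hypothetical:
ex_write_async() is a stand-in that completes immediately, whereas a
real bus would call back later from another context, and the EX_*
registers are not the real AUE ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registers and callback type, for illustration only. */
enum { EX_CTL0 = 0, EX_CTL1 = 1, EX_NREGS = 2 };
typedef void (*ex_done_t)(void *arg, int err);

static unsigned char ex_regs[EX_NREGS];

/* Stand-in for an asynchronous register write; completes at once here. */
static void
ex_write_async(int reg, unsigned char val, ex_done_t done, void *arg)
{
	assert(reg >= 0 && reg < EX_NREGS);
	ex_regs[reg] = val;
	done(arg, 0);
}

/* State that must survive between callbacks. */
struct ex_stop_state {
	int step;
	int err;
	bool finished;
};

/* One callback drives the whole sequence as a state machine. */
static void
ex_stop_step(void *arg, int err)
{
	struct ex_stop_state *st = arg;

	if (err != 0) {
		st->err = err;
		st->finished = true;
		return;
	}
	switch (st->step++) {
	case 0:
		ex_write_async(EX_CTL0, 0, ex_stop_step, st);
		break;
	case 1:
		ex_write_async(EX_CTL1, 0, ex_stop_step, st);
		break;
	default:
		st->finished = true;
		break;
	}
}

static void
ex_stop_start(struct ex_stop_state *st)
{
	st->step = 0;
	st->err = 0;
	st->finished = false;
	ex_stop_step(st, 0);
}
```

Two straight-line writes have turned into a state structure, a switch
and an entry point, and every further IO adds another case.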

Statement #3:
We need to provide the same atomic guarantees that we give other
device drivers, specifically the ability to do FOO_LOCK/FOO_UNLOCK
and not have to worry about user contexts sneaking into interrupt
contexts.  (We do not have these problems on "fast bus" devices
because of per-driver locks such as FXP_LOCK/FXP_UNLOCK.)

Statement #4:
To do all of this in a programmatically safe way, we need to run
some drivers under a full kthread process context, even under
interrupts.

Statement #5:
We have only a minor mechanism to do so at the current time: only
the usb_taskqs exist.  Using just the usb_taskqs would serialize
IO too much and slow down USB IO; additionally, if any device or
driver wedges, the whole stack will stop working.

Proposal:

Each USB device that needs this (I envision most devices moving to this
model) will require the following:

A process context.
A process style recursive lock (lockmgr).

I am leaning towards 1 thread per device instance, the reason being
that if a device (or its driver) goes out to lunch, it should not bring
down the whole stack.

If anyone has the "thousands of USB devices" problem, then they can
invent some sort of "usb taskq pool" to make life easier.

What I will provide is:

API for creating a per-device kthread.
API for deleting a per-device kthread.
API for recursive process locks.  (simple layer over lockmgr)
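
To show the rough shape of these three pieces, here is a userspace
sketch using POSIX threads.  It is only a model of the idea, not the
proposed kernel code: a pthread stands in for the kthread, a recursive
pthread mutex stands in for the lockmgr layer, and every name
(dev_thread_create(), dev_thread_delete(), dev_lock(), example_stop()
and so on) is invented for this illustration.

```c
#define _XOPEN_SOURCE 700	/* for PTHREAD_MUTEX_RECURSIVE */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Every name here is hypothetical; none is an existing FreeBSD KPI. */
struct dev_task {
	void (*fn)(void *);
	void *arg;
	struct dev_task *next;
};

struct dev_thread {
	pthread_t td;
	pthread_mutex_t q_mtx;		/* protects head, tail, exiting */
	pthread_cond_t q_cv;
	struct dev_task *head, *tail;
	bool exiting;
	pthread_mutex_t dev_lock;	/* recursive per-device lock */
};

static void dev_lock(struct dev_thread *dt) { pthread_mutex_lock(&dt->dev_lock); }
static void dev_unlock(struct dev_thread *dt) { pthread_mutex_unlock(&dt->dev_lock); }

/* Worker: run queued tasks one at a time with the device lock held. */
static void *
dev_thread_main(void *arg)
{
	struct dev_thread *dt = arg;
	struct dev_task *t;

	for (;;) {
		pthread_mutex_lock(&dt->q_mtx);
		while (dt->head == NULL && !dt->exiting)
			pthread_cond_wait(&dt->q_cv, &dt->q_mtx);
		t = dt->head;
		if (t != NULL && (dt->head = t->next) == NULL)
			dt->tail = NULL;
		pthread_mutex_unlock(&dt->q_mtx);
		if (t == NULL)
			break;	/* exit requested and queue drained */
		dev_lock(dt);
		t->fn(t->arg);
		dev_unlock(dt);
	}
	return (NULL);
}

static int
dev_thread_create(struct dev_thread *dt)
{
	pthread_mutexattr_t attr;

	dt->head = dt->tail = NULL;
	dt->exiting = false;
	pthread_mutex_init(&dt->q_mtx, NULL);
	pthread_cond_init(&dt->q_cv, NULL);
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	pthread_mutex_init(&dt->dev_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return (pthread_create(&dt->td, NULL, dev_thread_main, dt));
}

/* Queue a caller-owned task; it must stay valid until it has run. */
static void
dev_thread_enqueue(struct dev_thread *dt, struct dev_task *t)
{
	assert(t->fn != NULL);
	t->next = NULL;
	pthread_mutex_lock(&dt->q_mtx);
	if (dt->tail != NULL)
		dt->tail->next = t;
	else
		dt->head = t;
	dt->tail = t;
	pthread_cond_signal(&dt->q_cv);
	pthread_mutex_unlock(&dt->q_mtx);
}

/* Ask the worker to exit once the queue is drained, then wait for it. */
static void
dev_thread_delete(struct dev_thread *dt)
{
	pthread_mutex_lock(&dt->q_mtx);
	dt->exiting = true;
	pthread_cond_signal(&dt->q_cv);
	pthread_mutex_unlock(&dt->q_mtx);
	pthread_join(dt->td, NULL);
}

struct example_softc { struct dev_thread dt; int stops; };

/* Example task: retakes the device lock the worker already holds. */
static void
example_stop(void *arg)
{
	struct example_softc *sc = arg;
	dev_lock(&sc->dt);
	sc->stops++;
	dev_unlock(&sc->dt);
}
```

Because each device instance owns its thread and queue, a task that
never returns stalls only that device.  The recursive lock lets code
such as example_stop() take the device lock without caring whether its
caller already holds it.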

What I would like from FreeBSD is a discussion about whether there
could be a "better way" that _IS PROGRAMMATICALLY FEASIBLE_.

Sounds good?  Let me know, now is the time to discuss this with me
before I waste a lot of time writing something that someone may have
to rewrite in a few months because they didn't speak up now.

I'm available for phone calls if that will help.

thank you,
-- 
- Alfred Perlstein, RED Incorporated Consulting.
- coder / sysadmin / FreeBSD Hacker / All that jazz -