panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127

Fri Jan 23 13:45:35 PST 2009

Hans Petter,

[...]

>> > First of all I've made some patches which try to tidy up the USB blutooth
>> > driver. Please see:
>> >
>> > http://perforce.freebsd.org/chv.cgi?CH=156577
>> > http://perforce.freebsd.org/chv.cgi?CH=156579
>>
>> thanks! unfortunately, i have few problems with those patches. please
>> read below.
>>
>> >> that should not be needed (in theory) but i will add it.
>> >
>> > Yes! We are multithreaded!
>>
>> please tell me that you read the blob about locking strategy at the
>> top of ng_ubt2.c file. when ubt_task() is pending, it holds a
>> reference to the netgraph node. that means the pointer is still
>> pointing to a valid structure, but the node is marked as "dead" and
>> NG_NODE[_NOT]_VALID() macros can be use to check for that. so, even if
>> task is still pending while the rest of ng_ubt2 stuff is gone, it does
>> not matter because task holds netgraph node. so the next time task is
>> executed it will see that the node is "dead", release the node and
>> simply return doing nothing.
>
> This kind of complexity is unneccesary. We should be able to drain a node, or
> sleep until a node is released.
>
> 0) stop all USB activity
> 1) free node
> 2) wait until the node is actually freed
> 3) free USB resources which might access node fields
>
> Conclusion:
>
> No checks are required inside any of the USB callbacks.

i've added taskqueue_drain() to detach() in my "stall" diff that i
posted few hours back. is that addresses some of your concerns? also
keep in mind that while netgraph node is created and destroyed
primarily by the driver in response to hardware arrival and departure,
nothing protects it from the netgraph side. in other words there is
another way to manipulate netgraph node using netgraph api. for
example user can request shutdown of a netgraph node, etc. so all of
this has to be taking into consideration.

in fact, if i understand you correctly, with addition of
taskqueue_drain() to detach(), the later becomes completely
synchronous. that is

1) NG_NODE_REF()
2) ng_rmnode_self (i.e. mark node as possibly dead, detach all the
hooks, etc. etc.)
3) taskqueue_drain() (i.e. sleep until task has finished)
4) usb2_transfer_unsetup() (which does "usb2_transfer_drain() and sleeps)
5) NG_NODE_UNREF()
6) free everything up

in theory, items (1) and (5) above just an additional protection
because i was not sure about inner workings of usb2. i think, when we
sort everything out they can be removed.

>> >> > 2) You drain all USB transfers: "usb2_transfer_drain()" called
>> >> > unlocked.
>> >>
>> >> yes, usb2_transfer_unsetup() does that, does it not?
>> >
>> > That is correct, but sometimes you need to stop the transfers at an
>> > earlier point. It depends on your design.

right

>> when detach() is called its already out of driver's hands. all it can
>> do now is to call unsetup(), or am i missing something here?
>
> The correct thing is to do a usb2_transfer_drain() which will wait until the
> callback has been called back, if a transfer is pending.

i do not understand. if usb2_transfer_unsetup() calls
usb2_transfer_drain() and sleeps, then what difference does it make in
detach()? i just followed original code that did it this way, i.e.
drain and unsetup all transfers in one call.

> In USB2 a "usb2_start_hardware()" call is always paired with a USB transfer
> callback, no matter what you do! This way of design has been chosen so that
> you can do refcounts inside the callback!

ok, and that is exactly what code does. the only thing is that
refcounts are on netgraph node and not on softc structure. again the
reason for that is because netgraph node can be accessed from both
usb2 driver and netgraph subsystem directly.

> > I see in your code that you try to do things Asynchronously. This
>> > complicates stuff quite a lot. In my patches I've tried to do some things
>> > synchronously.
>>
>> no, No, NO (sorry for shouting). we _CAN_NOT_ sleep in netgraph.
>> period. you _CAN_NOT_ grab usb interface mutexes unless you can
>> absolutely guarantee that they will not sleep for any significant
>> amount of time.
>
> usb2_transfer_start() and usb2_transfer_stop() are non-blocking and will never
> sleep. The exception is "usb2_transfer_drain()" which can sleep for a few
> milliseconds and is only called when the hook is disconnected. I do not see
> that as a problem versus having "X" more checks in the code.

well, when i looked at the code, i saw that both functions do
USB_BUS_LOCK() and USB_BUS_UNLOCK(). i do not know enough about those
locks and usb2 in general, so i decided to err on the side of caution
and move all the operations that could potentially sleep into the
taskqueue context. this actually completely decouples netgraph from
usb2, and that is a "good thing (tm)" imo :) a little bit of
complexity is totally justified here imo.

also, now that you mentioned it, i should call usb2_transfer_drain()
in ubt_task() instead of usb2_transfer_stop() to make transfer stop
completely synchronous as well.

please let me reiterate, sleeping in netgraph is NOT allowed. even if
its for a few milliseconds. please accept it as a fact. think of all
netgraph methods as fast interrupt handlers. it is very typical for
fast interrupt handlers to do minimal amount of work and schedule swi
to do more heavy processing. that is exactly what the code does here.

>> netgraph is essentially single threaded and by
>> sleeping you will stall entire netgraph subsystem and not just current
>> thread.
>
> If we sleep 10 milliseconds on a UBT hook disconnect, is that a big problem?
> How often will UBT hook disconnect be called?

it does not matter how often it will or will not be. please read my
comments about. once again, no sleeping in netgraph :)

>> so, objection #1 (the biggest one) is: please leave sc_mbufq_mtx and
>> ubt_task() glue in place. it is needed as far as i can tell.
>
> What I see is that USB already has a separate process handling the callback so
> that when you call "usb2_transfer_start()" this other process is woken up.
> Are you sure that there is a measurable gain using "ubt_task()" versus the
> way I'm doing it?

yes, i'm sure. i did not invent anything new here. it all follows
typical fast interrupt handler/swi (upper/bottom half for linux foks
:) model. there is really nothing complicated about it, except "scary"
calls to NG_NODE_REF/UNREF and NG_NODE[_NOT]_VALID() :) i'm quite
happy to make an additional cleanups that you would require, but
asynchronous design is something that need to stay in place, imo.

>> objection #2 (somewhat major). please do NOT remove NG_NODE_REF() call
>> in detach() before calling ng_rmnode_self() (unless you remove
>> NG_NODE_UNREF() below as well). again the reason here is to tell
>> netgraph node that we are dying, but make sure we keep node_p pointer
>> pointing to a valid structure as long as possible. to summarize,
>
> We are already holding a reference to the node from ubt_attach(), right? Why
> do we need another node reference?

there is no NG_NODE_REF in attach(). when we create node is comes with
one reference and that reference is removed by ng_rmnode_self().
please read my comments about detach() above. hopefully it will make
sense now.

[...]

>> more of a question than objection. why you insist on having single
>> mutex for both interfaces? isoc transfers callbacks will be called
>> much more often that control, bulk and interrupt callbacks, so
>> wouldn't it make sense to use different lock for isoc transfers?
>
> Because a single mutex is much easier to maintain and I see no gain fine
> graining more for a bluetooth device :-)

i do not really have strong opinion about it. i just thought there
would be less contention between isoc and bulk/intr/ctrl transfers
when they use different locks. probably does not matter, since
everything is going over the same bus. i'm fine with this change utill
someone has credible evidence otherwise.

>> changes in attach() and usb2_config structures are fine (i guess). but
>> please keep asynchronous design in place. it is required.
>
> Is it possible we can make a compromise here? From my experience synchronous
> code is smaller and easier to understand than asynchronous code! The only
> disadvantage about synchronous code is that more threads might be required.
> And hence Netgraph is single threaded other signalling might be delayed, but
> is that a real problem if we are talking about delays around 1x10ms ever time
> you disconnect the hook of an UBT node?

like i said, i'm quite happy to compromise here. i will make any
change that would help to understand/maintain code better.
unfortunately, asynchronous design would have to stay in one form or
another. netgraph has to be completely decoupled from any other
subsystem, imo. it is just what it is.

not really related to this conversation, but think about how many
changes went into freebsd between version 6 and 8, and, yet, netgraph
(including bluetooth) code changed very little in comparison. so
keeping things "separate" is a "good thing(tm)" :)

thanks,
max