panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127

Sat Jan 24 10:43:13 PST 2009

On Sat, Jan 24, 2009 at 1:55 AM, Hans Petter Selasky <hselasky at c2i.net> wrote:
> On Saturday 24 January 2009, Maksim Yevmenkin wrote:
>> Hans Petter,
>>
>> i'm sorry, did i mention there is no sleeping in netgraph? :-)
>
> Can you elaborate this? Is netgraph run from fast interrupt context? So that
> only spin locks are possible? Or is it run from a thread?

i already told you that netgraph is essentially single-threaded. for
all intents and purposes think that only _ONE_ thread services _ALL_
netgraph nodes. if something can not be done immediately,  netgraph
queues it and returns back to it later. _ANY_ delay/sleep would stall
_ENTIRE_ netgraph. think poll/select/callback driven programming
model.

>> from what i can see you are _NOT_ using _SPIN_ mutexes (aka spin
>> locks). you are using regular mutexes. let me quote locking(9) man
>> page
>>
>> "
>> Mutexes
>>      Basically (regular) mutexes will deschedule the thread if the mutex
>> can not be acquired.  A non-spin mutex can be considered to be equivalent
>> to getting a write lock on an rw_lock (see below),
>> "
>>
>> so, if thread can not get mutex it will be descheduled. this
>> absolutely can not happen in netgraph!
>
> There are mutexes inside the taskqueue aswell. The problem will be the same
> there if you don't use a so-called fast tasqueue.

well, yes. technically, its not 100% correct, but it has to be like it
because there is simply no way around this. if you read the blob at
the top of the ng_ubt2.c you know that it uses regular sc_mbufq_mtx
(which can sleep). let me quote the blob here again

" Access to the outgoing queues and task flags is controlled by the
  sc_mbufq_mtx lock. It is an unavoidable evil. Again, sc_mbufq_mtx should
  really be a spin lock."

the sc_mbufq_mtx is _NOT_ marked as SPIN mutex only because WITNESS
has to know about spin locks. it is generally NOT a good idea to add a
spin lock when someone feels like it.

now back to taskqueue_enqueue(), yes, taskqueue_enqueue_fast() is more
appropriate here. however, taskqueue_enqueue() was chosen because
taskqueue_enqueue() does not do anything that could cause sleep in
TQ_LOCK()/TQ_UNLOCK() (its just basically a lookup).

from taskqueue man page

"Enqueueing a task does not perform any memory allocation which makes
it suitable for calling from an interrupt handler."

USB_BUS_LOCK()/USB_BUS_UNLOCK() are different is this regard because
those locks used when usb2 does things with hardware. also, like i
said, there could be few usb devices sharing the same bus and thus the
same USB_BUS_LOCK.

once again the rule is: NO SLEEPING/STALLING IN NETGRAPH. it DOES NOT
mean that you can not use mutexes. it only means that you have to be
very careful of how the mutex is used.

>> first of all, i do not think crashes are caused by detach(). in fact,
>> detach() is clean. i've tested it and it worked for me. i tried to
>> start/stop device while doing flood l2ping. i also tried to yank the
>> device while doing flood l2ping. it works correctly. i think, the
>> issue is related to stalled transfers. there is still something wrong
>> with the code path that deals with stalled transfers. stalls do not
>> happen on my test box, so i can not test it. also there is NO code
>> duplication. asynchronous path is required to decouple netgraph from
>> usb2.
>
> Only if netgraph is run from fast interrupt context.

no, no, no. we are going back in circles. i would very much like to do
things synchronously, but i can not. imo, i've given you plenty
reasoning behind async design. please understand, asyn design has to
stay. even if "sync" code appears to work, its still wrong.

>> regular mutexes can sleep. we are not allowed to sleep in netgraph.
>> therefor we must transition out of netgraph context before calling
>> into any code that tries to grab regular mutex. the async design is
>> there not because i want to make things complicated. its there because
>> it is needed.
>
> I think there are two definitions of sleeping.
>
> 1) When a thread is waiting for a mutex it is not sleeping in the same way
> like if it was to call "tsleep()".
>
> 2) When a thread is waiting for a wakeup it is surely sleeping, which can
> happen inside sx_lock() and tsleep().

when i say "sleeping" i mean that the thread that is trying to get the
lock is de-scheduled. that is, it is not running anymore. i'm not
talking about [tm]sleep() etc. just the fact that thread is not
running.

oh, and to possibly answer your next question, freebsd's mutexes are
ADAPTIVE, meaning that thread will will spin on a mutex for a little
bit if it can not grab it right away BEFORE going to sleep. this means
that the sc_mbufq_mtx mutex is very likely to be an equivalent of a
spin lock because it is mosty used to protect queue while en/dequeuing
mbuf etc. the same COULD apply for TQ_LOCK in taskqueue_enqeue().

thanks,
max