skipping locks, mutex_owned, usb

Tue Aug 23 16:42:21 UTC 2011

On Tue, Aug 23, 2011 at 6:33 AM, Andriy Gapon <avg at freebsd.org> wrote:
> on 23/08/2011 15:58 mdf at FreeBSD.org said the following:
>> On Tue, Aug 23, 2011 at 5:09 AM, Andriy Gapon <avg at freebsd.org> wrote:
>>> III.  With my stop_scheduler_on_panic patch ukbd_poll() produces infinite chains
>>> of 'infinite' recursion because mtx_owned() always returns false.  This is because
>>> I skip all lock/unlock/etc operations in the post-panic context.  I think that
>>> it's a good philosophical question: what operations like mtx_owned(),
>>> mtx_recursed(), mtx_trylock() 'should' return when we actually act as if no locks
>>> exist at all?
>>>
>>> My first knee-jerk reaction was to change mtx_owned() to return true in this
>>> "lock-less" context, but _hypothetically_ there could exist some symmetric code
>>> that does something like:
>>> func()
>>> {
>>>        if (mtx_owned(&Giant)) {
>>>                mtx_unlock(&Giant);
>>>                func();
>>>                mtx_lock(&Giant);
>>>                return;
>>>        }
>>>
>>>        // etc ...
>>>
>>> What do you think about this problem?
>>
>> Given that checking for a lock being held is a lot more common than
>> checking if it's not held (in the context of mtx_assert(9) and
>> friends), it seems reasonable to report that a lock is held in the
>> special context of after panic.
>
> But, OTOH, there are snippets like this (found with grep, haven't looked at the code):
> /usr/src/sys/kern/kern_sx.c:            while (mtx_owned(&Giant)) {
>
>>> That question III actually brings another thought: perhaps the whole of idea of
>>> skipping locks in a certain context is not a correct direction?
>>> Perhaps, instead we should identify all the code that could be legitimately
>>> executed in the after-panic and/or kdb contexts and make that could explicitly
>>> aware of its execution context.  That is, instead of trying to make _lock,
>>> _unlock, _owned, _trylock, etc do the right thing auto-magically, we should try to
>>> make the calling code check panicstr and kdb_active and do the right thing on that
>>> level (which would include not invoking those lock-related operations or other
>>> inappropriate operations).
>>
>> I don't think it's possible to identify all the code, since what runs
>> after panic isn't tested very much. :-)  I don't disagree that one
>> should think about the issue, but providing reasonable defaults like
>> skipping locks reduces the burden on the programmer.
>
> Yes, I agree with your and John's practical approach to this.
> Maybe print a warning about each skipped locking operation?  But not sure if that
> would be useful, if there would be no intention of changing the code.

Skipped locking seems like it should be left silent.  I think this is
a reasonable behaviour used on many operating systems -- at least I
know it is used on AIX.

I like the idea of a printf() in mtx_owned() since there is no
completely correct answer there.  Then at least on e.g. an infinite
loop like you point out, it would be clear what's happening (and
presumably this could use the __FILE__ and __LINE__ of the caller in
the print).

Cheers,
matthew