svn commit: r238907 - projects/calloutng/sys/kern

Mon Oct 29 18:15:41 UTC 2012

On Mon, 29 Oct 2012, Attilio Rao wrote:

> On 10/29/12, Bruce Evans <brde at optusnet.com.au> wrote:
>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>
>>> On 10/29/12, Bruce Evans <brde at optusnet.com.au> wrote:
>>>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>>>
>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>>>> this new patch, making critical_enter()/critical_exit() inline:
>>>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>>>
>>>>> The concept is pretty simple: simple add/dec for critical_enter, exit
>>>>> are inlined, the rest is in an "hard path". Debugging enables the hard
>>>>> paths by default (really I think that only KTR may be due here, but I
>>>>> thought that in case of INVARIANTS this was also wanted, so I added
>>>>> the check also for that case).
>>>> ...
>>>> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
>>>> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1
>>>> function
>>>> call. but critical_enter() does a function call anyway.  critical_exit*(
>>>> also has a branch in branch in it that might cost more than the function
>>>> call just for mispredicting it.
>>>
>>> Correct, that is a further argument for having inlined
>>> critical_enter(),
>>
>> And for inlining neither, ot the opposite one like I do.
>>
>>> even if the actual calling cames from
>>> spinlock_enter(), not critical_enter(), which must be MD (that's on
>>> FreeBSD, not sure what happens in your OS).
>>
>> I forgot that I don't have the slow functions spinlock_enter() and
>> spinlock_exit() in mtx_[un]lock_spin().  (My mutexes don't block
>> interrupts, as required for fast interrupt handling that is actually
>> fast (really low-latency).  My spinlocks just use critical*(), and
>> critical*() doesn't block fast interrupt handling.)
>>
>> The spinlock_enter() calls mean that inlining mutex calls is even more
>> bogus.  Instead of just 1 function call which does not much more than
>> increment or decrement a counter, there is a nested call to the critical*()
>> one and another call to spinlock_enter().  spinlock_enter() is MD and
>> might need to do lots of slow hardware things.  critical_enter() does
>> the following on i386:
>>
>> % void
>> % spinlock_enter(void)
>> % {
>> % 	struct thread *td;
>> % 	register_t flags;
>> %
>> % 	td = curthread;
>> % 	if (td->td_md.md_spinlock_count == 0) {
>> % 		flags = intr_disable();
>>
>> This is a CPU control instruction and thus tends to be slow.  It was very
>> slow on Pentium4.  It might involve some serialization although it is
>> not a full serialization instruction.
>>
>> % 		td->td_md.md_spinlock_count = 1;
>> % 		flags &= ~PSL_T;
>>
>> The previous line is from my version.  It fixes spurious trace traps when

That is, it is from my version of -current, which is just -current with
a few small fixes like this.

>> the flags are popped in critical_exit().  Similar fixes are needed for
>> the pushfl/popfl sequences in swtch.s.  The spurious trace traps were
>> and might still be more harmful than they should be since they exercise
>> deadlock bugs in syscons and/or printf.  Simply trace through a large
>> amount of code in ddb, going through here a few times to set up spurious
>> trace traps for several td's.   It may also be necessary to have syscons
>> and/or printf doing non-ddb i/o.  Eventually the trace traps bite and
>> demonstrate the deadlock.
>>
>> % 		td->td_md.md_saved_flags = flags;
>> % 	} else
>> % 		td->td_md.md_spinlock_count++;
>> % 	critical_enter();
>> % }
>>
>> Everything else uses simple non-control instructions so it is quite fast.
>> However, if this is not serialized, then it can run in parallel with
>> mtx_lock_spin() and vice versa since there are no inter-dependencies.
>> It is unclear whether the parallelism is helped or harmed by not
>> inlining mtx_lock_spin().
>>
>>>> My version goes the other way and uninlines mtx_lock_spin() and
>>>> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
>>>> critical_exit() in them.  This saves a lot of text space and thus
>>>> hopefully saves time too.  I couldn't find any cases where it either
>>>> ...
>>>>
>>>> OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work.
>>>> ..
>>>
>>> I don't think that uninling mtx_lock()/unlock() (btw, on which hw are
>>> you testing them if you are still able to catch performance penalties
>>> by branch misprediction?!) is a good idea, likely what would be a
>>> better one is to both inline critical_enter() and spinlock_enter().
>>
>> Er, it is a good idea, as explained above.  Whether it is better in
>> practice is very MD.  The mtx non-calls are already quite large, and
>> adding critical*() and spinlock*() to them would make them larger.
>> Above a certain MD size, inlining is just slower because it busts caches.
>> spinlock*() is especially hard to inline because it does MD magic that
>> might be even larger than the i386 version.
>
> You are misunderstanding.

No.

> mtx_lock()/unlock() don't call spinlock_enter()/spinlock_exit() thus
> their inlined call results more or less in a single atomic operation.

We're not discussing mtx_lock/unlock(), since they don't call
critical_enter/exit().

We're discussing mtx_lock/unlock_spin(), since they call
critical_enter/exit() (indirectly via spinlock_enter/exit() in -current
and directly in my version).

> That must not be wrapped in a function call IMHO.
> (If your OS does a quite different thing I don't know, I don't have
> sources off-hand, but it is quite difficult to follow you sometimes
> because you mix FreeBSD behaviour with your-OS behaviour).

The above is mostly about -current.  My previous mail discussed why
uninlining mtx_lock/unlock() doesn't work so well.  It is because these
macros are shorter and don't make unconditional function calls, so
inlining of them has better chances of improving their efficiency, and
in fact does.

Bruce