svn commit: r243515 - head/sys/kern

Attilio Rao attilio at freebsd.org
Thu Dec 13 20:17:35 UTC 2012


On Thu, Dec 13, 2012 at 10:59 AM, Andriy Gapon <avg at freebsd.org> wrote:
> on 09/12/2012 19:27 Attilio Rao said the following:
>> On Sun, Nov 25, 2012 at 2:22 PM, Andriy Gapon <avg at freebsd.org> wrote:
>>> Author: avg
>>> Date: Sun Nov 25 14:22:08 2012
>>> New Revision: 243515
>>> URL: http://svnweb.freebsd.org/changeset/base/243515
>>>
>>> Log:
>>>   remove stop_scheduler_on_panic knob
>>>
>>>   There has not been any complaints about the default behavior, so there
>>>   is no need to keep a knob that enables the worse alternative.
>>>
>>>   Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu
>>>   spinlock-like logic can be dropped, because only a single CPU is
>>>   supposed to win stop_cpus_hard(other_cpus) race and proceed past that
>>>   call.
>>
>> While this is true for the sane case, it still breaks in the case
>> reported by Ryan.
>
> Yes.  I haven't gotten around to fixing Ryan's problem yet.
> But this commit should reduce the number of places where changes have to be made.
> In fact, I think that only stop_cpus_X would have to be fixed now.
>
>> In fact, imagine CPU0 (winner) and CPU1 (loser) both panicking. CPU0
>> wins and then sets stopping_cpu. When the deadlock happens in the
>> spinning loop, because of the generic_stop_cpus() logic CPU0 won't
>> deadlock and will correctly continue, but the problem is that it sets
>> stopping_cpu back to NOCPU, letting CPU1 continue too and then
>> deadlock.
>>
>> At a minimum, what I think should happen is to have the check in
>> panic() as it was prior to this change, but with the addition I
>> outlined (thus we need to generalize cpustop_handler()). However, it
>> seems to me that generic_stop_cpus() may still be broken by this and
>> we will eventually need to fix it.
>>
>> I would then revert this part of the patch and fix it appropriately.
>> Later we can discuss the similar race in generic_stop_cpus() in more
>> detail.
>
> I actually see this change and Ryan's problem as orthogonal issues.
> My opinion is: let's just fix generic_stop_cpus().

Right, but as I said, for the time being we can at least have correct
panic() semantics, take the time needed to fix generic_stop_cpus()
properly, and then absorb the panic() fix into it.
Right now the mechanism is still broken in panic(), and it can be fixed
very easily, so we should just do it.
This will also help vendors like Sandvine, which may have hit this very bug too.
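
To illustrate the kind of panic()-side check being discussed, here is a
minimal userland sketch in C11. It is not the kernel code: panic_owner and
panic_try_claim() are hypothetical names, the value of NOCPU is assumed,
and C11 atomics stand in for the kernel's own atomic primitives. The point
it shows is that ownership, once claimed, is never reset, so a losing CPU
cannot proceed even if stopping_cpu later goes back to NOCPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU (-1)	/* sentinel for "no owner"; the value is assumed */

/* Hypothetical stand-in for a panic_cpu-style ownership variable. */
static atomic_int panic_owner = NOCPU;

/*
 * Try to become the single CPU allowed to run the panic path.
 * Returns true for the winner (including a recursive panic on the
 * winning CPU) and false for any other CPU.  The owner is never
 * reset, so a loser stays a loser for the rest of the panic.
 */
static bool
panic_try_claim(int cpu)
{
	int expected = NOCPU;

	assert(cpu >= 0);
	if (atomic_compare_exchange_strong(&panic_owner, &expected, cpu))
		return (true);
	/* On failure, expected holds the current owner. */
	return (expected == cpu);
}
```

In this sketch a caller that gets false would be expected to park itself
(for example by spinning until it is hard-stopped) instead of continuing
into the panic path.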

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


More information about the svn-src-head mailing list