Re: A panic a day

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Thu, 22 Sep 2022 19:07:08 UTC

On 9/22/22, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
>> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
>> > All,
>> >
>> > I updated my kernel/world/all ports on Sept 19 2022.
>> > Since then, I have had daily panics and hard lock-up
>> > (no panic, keyboard, mouse, network, ...).  The one
>> > panic I did witness sent text scrolling off the screen.
>> > There is no dump, or at least, I haven't figured out
>> > a way to get a dump.
>> >
>> > Using ports/graphics/tesseract and then hand editing
>> > the OCR result, the last visible portion is
>> >
>> >
>
> (panic messages removed).
>
>> It looks like you use the 4BSD scheduler?  I think there's a bug in
>> kick_other_cpu() in that it doesn't make sure that the remote CPU's
>> curthread lock is held when modifying thread state.  Because 4BSD has a
>> global scheduler lock, this is often true in practice, but doesn't have
>> to be.
>
> Yes, I use 4BSD.  ULE has very poor performance for HPC type work with
> OpenMPI.
>

Is there an easy way to set up such an OpenMPI workload for testing purposes?

>> I think this untested patch will address the panics.  The bug was there
>> for a long time but some recent restructuring added an assertion which
>> caught it.
>
> I'll give it a try, and report back.  Thanks!
>
> --
> steve
>
>> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
>> index 9d48aa746f6d..484864b66c1c 100644
>> --- a/sys/kern/sched_4bsd.c
>> +++ b/sys/kern/sched_4bsd.c
>> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
>>  	}
>>  #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>>
>> -	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> -	ipi_cpu(cpuid, IPI_AST);
>> -	return;
>> +	if (pcpu->pc_curthread->td_lock == &sched_lock) {
>> +		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> +		ipi_cpu(cpuid, IPI_AST);
>> +	}
>>  }
>>  #endif /* SMP */
>>
>> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>>
>>  	cpuid = PCPU_GET(cpuid);
>>  	if (single_cpu && cpu != cpuid) {
>> -	        kick_other_cpu(td->td_priority, cpu);
>> +		kick_other_cpu(td->td_priority, cpu);
>>  	} else {
>>  		if (!single_cpu) {
>>  			tidlemsk = idle_cpus_mask;
>
> --
> Steve
>
>
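
For anyone reading along, the check the quoted patch adds can be modelled
outside the kernel. This is only a rough userspace sketch, not the kernel
code: every name ending in _model is a hypothetical stand-in, and it assumes
the caller already holds the global scheduler lock, as the quoted description
of 4BSD suggests. It shows the idea that the remote thread's state is only
touched when the lock protecting that thread is the one the caller holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mtx_model {
	int unused;
};

struct thread_model {
	struct mtx_model *td_lock;	/* lock currently protecting this thread */
	bool ast_sched_pending;		/* models the TDA_SCHED request */
};

static struct mtx_model sched_lock_model;	/* models the global sched_lock */
static struct mtx_model other_lock_model;	/* models any other thread lock */

/*
 * Models the patched tail of kick_other_cpu().  The caller is assumed to
 * hold sched_lock_model, so the remote thread may only be modified when
 * that is also the lock protecting it.  Returns true if a kick was issued.
 */
static bool
kick_model(struct thread_model *remote)
{
	assert(remote != NULL);
	if (remote->td_lock != &sched_lock_model)
		return (false);		/* not protected by our lock: leave it alone */
	remote->ast_sched_pending = true;	/* stands in for ast_sched_locked() */
	return (true);			/* the real code would send IPI_AST here */
}
```

In this model a thread whose td_lock points at other_lock_model is skipped
and its state stays untouched, which is the case the unpatched code did not
guard against.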


-- 
Mateusz Guzik <mjguzik gmail.com>