Re: A panic a day

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Thu, 22 Sep 2022 19:05:46 UTC
On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> > All,
> > 
> > I updated my kernel/world/all ports on Sept 19 2022.
> > Since then, I have had daily panics and hard lock-ups
> > (no panic, keyboard, mouse, network, ...).  The one
> > panic I did witness sent text scrolling off the screen.
> > There is no dump, or at least I haven't figured out
> > a way to get one.
> > 
> > Using ports/graphics/tesseract and then hand-editing
> > the OCR result, the last visible portion is
> > 
> > 

(panic messages removed).

> It looks like you use the 4BSD scheduler?  I think there's a bug in
> kick_other_cpu() in that it doesn't make sure that the remote CPU's
> curthread lock is held when modifying thread state.  Because 4BSD has a
> global scheduler lock, this is often true in practice, but doesn't have
> to be.

Yes, I use 4BSD.  ULE has very poor performance for HPC-type work with
OpenMPI.

> I think this untested patch will address the panics.  The bug was there
> for a long time but some recent restructuring added an assertion which
> caught it.

I'll give it a try, and report back.  Thanks!

-- 
steve

> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> index 9d48aa746f6d..484864b66c1c 100644
> --- a/sys/kern/sched_4bsd.c
> +++ b/sys/kern/sched_4bsd.c
> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
>  	}
>  #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>  
> -	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
> -	ipi_cpu(cpuid, IPI_AST);
> -	return;
> +	if (pcpu->pc_curthread->td_lock == &sched_lock) {
> +		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
> +		ipi_cpu(cpuid, IPI_AST);
> +	}
>  }
>  #endif /* SMP */
>  
> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>  
>  	cpuid = PCPU_GET(cpuid);
>  	if (single_cpu && cpu != cpuid) {
> -	        kick_other_cpu(td->td_priority, cpu);
> +		kick_other_cpu(td->td_priority, cpu);
>  	} else {
>  		if (!single_cpu) {
>  			tidlemsk = idle_cpus_mask;
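
For readers following the thread, the locking rule behind the first hunk
can be sketched in userland C.  This is only a toy model under stated
assumptions, not the kernel code: every model_* name is hypothetical, the
"held" flag stands in for real mutex ownership, and the assert() calls play
the role of the kernel's lock assertions.  The caller holds the global
scheduler lock, so it may touch the remote CPU's current thread only when
that same lock is the one protecting the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; none of this is the real FreeBSD code. */
struct model_lock {
	bool held;			/* true while the lock is owned */
};

struct model_thread {
	struct model_lock *td_lock;	/* lock currently protecting this thread */
	bool ast_pending;		/* models the TDA_SCHED AST request */
};

/* Models the global sched_lock used by the 4BSD scheduler. */
static struct model_lock model_sched_lock;

/* Models ast_sched_locked(): only legal while td's own lock is held. */
static void
model_ast_sched_locked(struct model_thread *td)
{
	assert(td->td_lock->held);
	td->ast_pending = true;
}

/*
 * Models the patched tail of kick_other_cpu().  The caller holds
 * model_sched_lock, so the remote thread is modified only if that is
 * also the lock protecting it.  Returns true if a kick was requested.
 */
static bool
model_kick(struct model_thread *remote_curthread)
{
	assert(model_sched_lock.held);
	if (remote_curthread->td_lock != &model_sched_lock)
		return (false);
	model_ast_sched_locked(remote_curthread);
	return (true);
}
```

In this model, a thread whose td_lock points at some other, unheld lock is
simply skipped, which mirrors what the patch does; without the pointer
check, the assertion in model_ast_sched_locked() would fire for such a
thread.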

-- 
Steve