Re: A panic a day
Date: Thu, 22 Sep 2022 19:05:46 UTC
On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> > All,
> >
> > I updated my kernel/world/all ports on Sept 19 2022.
> > Since then, I have had daily panics and hard lock-ups
> > (no panic, keyboard, mouse, network, ...). The one
> > panic I did witness sent text scrolling off the screen.
> > There is no dump, or at least, I haven't figured out
> > a way to get a dump.
> >
> > Using ports/graphics/tesseract and then hand editing
> > the OCR result, the last visible portion is
> >
> >
(panic messages removed).
> It looks like you use the 4BSD scheduler? I think there's a bug in
> kick_other_cpu() in that it doesn't make sure that the remote CPU's
> curthread lock is held when modifying thread state. Because 4BSD has a
> global scheduler lock, this is often true in practice, but doesn't have
> to be.
Yes, I use 4BSD. ULE has very poor performance for HPC-type work with
OpenMPI.
> I think this untested patch will address the panics. The bug was there
> for a long time but some recent restructuring added an assertion which
> caught it.
I'll give it a try, and report back. Thanks!
--
steve
> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> index 9d48aa746f6d..484864b66c1c 100644
> --- a/sys/kern/sched_4bsd.c
> +++ b/sys/kern/sched_4bsd.c
> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
> }
> #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>
> -	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
> -	ipi_cpu(cpuid, IPI_AST);
> -	return;
> +	if (pcpu->pc_curthread->td_lock == &sched_lock) {
> +		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
> +		ipi_cpu(cpuid, IPI_AST);
> +	}
> }
> #endif /* SMP */
>
> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>
> cpuid = PCPU_GET(cpuid);
> if (single_cpu && cpu != cpuid) {
> - kick_other_cpu(td->td_priority, cpu);
> + kick_other_cpu(td->td_priority, cpu);
> } else {
> if (!single_cpu) {
> tidlemsk = idle_cpus_mask;
--
Steve
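
For readers following the thread, here is a minimal userspace sketch of the
check the patch adds to kick_other_cpu(). It is only a model under stated
assumptions: `struct mtx` and `struct thread` are cut-down stand-ins, the
`held` flag means "held by the caller" in a single-threaded setting, and
`ast_model`, `kick_model`, and `other_lock` are hypothetical names that do not
exist in the kernel. The idea it illustrates is the one described above: the
caller holds sched_lock, so the remote curthread's state may only be modified
when that thread's td_lock is in fact sched_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for a kernel mutex; "held" means held by the caller. */
struct mtx {
	bool held;
};

/* Cut-down stand-in for struct thread. */
struct thread {
	struct mtx *td_lock;		/* lock currently protecting this thread */
	bool ast_sched_pending;		/* models the TDA_SCHED AST request */
};

static struct mtx sched_lock;	/* models 4BSD's global scheduler lock */
static struct mtx other_lock;	/* hypothetical: any other lock td_lock may point at */

/*
 * Models ast_sched_locked(): it requires the thread's lock to be held.
 * The assert stands in for the kind of assertion mentioned in the thread.
 */
static void
ast_model(struct thread *td)
{
	assert(td->td_lock->held);
	td->ast_sched_pending = true;
}

/*
 * Models the patched tail of kick_other_cpu(). The caller holds sched_lock.
 * Returns true if the remote CPU's current thread was marked for reschedule.
 */
static bool
kick_model(struct thread *remote_curthread)
{
	assert(remote_curthread != NULL);
	assert(sched_lock.held);
	/* Holding sched_lock only covers threads whose td_lock is sched_lock. */
	if (remote_curthread->td_lock != &sched_lock)
		return (false);
	ast_model(remote_curthread);
	return (true);
}
```

Without the pointer comparison in `kick_model`, a thread whose td_lock points
at `other_lock` would reach `ast_model` with its lock not held and trip the
assertion, which is the model's analogue of the reported panic.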