Re: A panic a day
Date: Thu, 22 Sep 2022 19:07:08 UTC
On 9/22/22, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
>> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
>> > All,
>> >
>> > I updated my kernel/world/all ports on Sept 19 2022.
>> > Since then, I have had daily panics and hard lock-ups
>> > (no panic, keyboard, mouse, network, ...). The one
>> > panic I did witness sent text scrolling off the screen.
>> > There is no dump, or at least, I haven't figured out
>> > a way to get a dump.
>> >
>> > Using ports/graphics/tesseract and then hand editing
>> > the OCR result, the last visible portion is
>> >
>> >
>
> (panic messages removed).
>
>> It looks like you use the 4BSD scheduler? I think there's a bug in
>> kick_other_cpu() in that it doesn't make sure that the remote CPU's
>> curthread lock is held when modifying thread state. Because 4BSD has a
>> global scheduler lock, this is often true in practice, but doesn't have
>> to be.
>
> Yes, I use 4BSD. ULE has very poor performance for HPC type work with
> OpenMPI.
>
Is there an easy way to set it up for testing purposes?
>> I think this untested patch will address the panics. The bug was there
>> for a long time but some recent restructuring added an assertion which
>> caught it.
>
> I'll give it a try, and report back. Thanks!
>
> --
> steve
>
>> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
>> index 9d48aa746f6d..484864b66c1c 100644
>> --- a/sys/kern/sched_4bsd.c
>> +++ b/sys/kern/sched_4bsd.c
>> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
>> }
>> #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>>
>> - ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> - ipi_cpu(cpuid, IPI_AST);
>> - return;
>> + if (pcpu->pc_curthread->td_lock == &sched_lock) {
>> + ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> + ipi_cpu(cpuid, IPI_AST);
>> + }
>> }
>> #endif /* SMP */
>>
>> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>>
>> cpuid = PCPU_GET(cpuid);
>> if (single_cpu && cpu != cpuid) {
>> - kick_other_cpu(td->td_priority, cpu);
>> + kick_other_cpu(td->td_priority, cpu);
>> } else {
>> if (!single_cpu) {
>> tidlemsk = idle_cpus_mask;
>
> --
> Steve
>
>
--
Mateusz Guzik <mjguzik gmail.com>
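
For readers following the patch, here is a rough user-space model of the guard it adds to kick_other_cpu(): only touch the remote CPU's current thread if that thread's lock pointer is the scheduler lock the caller already holds. This is a sketch, not kernel code. The struct layouts, the need_resched flag (standing in for the TDA_SCHED AST), other_lock, and kick_if_locked() are all hypothetical names made up for illustration; only td_lock and sched_lock mirror identifiers from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the FreeBSD kernel types. */
struct lock {
	const char *name;
};

struct thread {
	struct lock *td_lock;	/* lock currently protecting this thread's state */
	bool need_resched;	/* stand-in for setting the TDA_SCHED AST */
};

/* Models the global 4BSD scheduler lock that the caller is assumed to hold. */
static struct lock sched_lock = { "sched lock" };
/* Models any other lock a thread may point at while changing state. */
static struct lock other_lock = { "some other lock" };

/*
 * Flag the remote CPU's current thread for rescheduling, but only if
 * its state is protected by the lock we hold. Returns true when the
 * thread was flagged and false when it was skipped.
 */
static bool
kick_if_locked(struct thread *remote_curthread)
{
	assert(remote_curthread != NULL);

	if (remote_curthread->td_lock != &sched_lock)
		return (false);	/* not ours to modify; skip the kick */

	remote_curthread->need_resched = true;
	/* The real patch sends ipi_cpu(cpuid, IPI_AST) at this point. */
	return (true);
}
```

With a thread whose td_lock points at sched_lock, kick_if_locked() sets the flag and returns true. With one pointing at other_lock it leaves the thread untouched and returns false, which corresponds to the case the original code got wrong by modifying the thread unconditionally.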