cvs commit: src/sys/i386/i386 vm_machdep.c

Bruce Evans bde at zeta.org.au
Thu Dec 16 23:06:38 PST 2004


On Thu, 16 Dec 2004, John Baldwin wrote:

> On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote:
> > On Wed, 15 Dec 2004, Kris Kennaway wrote:
> > ...
> > > I often
> > > get overlapping panics from the other CPUs on this machine, and it
> > > often locks up when trying to enter DDB, or while printing the panic
> > > string (the other day it only got as far as 'p' before hanging).
> >
> > panic() needs much the same locking as ddb to prevent concurrent entry.
> > It must be fairly likely for all CPUs to panic on the same asertion.
> > This is like all CPUs entering ddb on the same breakpoint.
>
> The thing is, panic does have locking, but it appears to be ineffective:
>
> #ifdef SMP
>         /*
>          * We don't want multiple CPU's to panic at the same time, so we
>          * use panic_cpu as a simple spinlock.  We have to keep checking
>          * panic_cpu if we are spinning in case the panic on the first
>          * CPU is canceled.
>          */
>         if (panic_cpu != PCPU_GET(cpuid))
>                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
>                     PCPU_GET(cpuid)) == 0)
>                         while (panic_cpu != NOCPU)
>                                 ; /* nothing */
> #endif

Oops, I forgot bout that (and never really noticed that it was the very
first thing in panic() as it needs to be).

> In the smpng branch in p4, I have the lock changed to be based on the thread
> rather than the CPU to account for problems coming from migration due to
> preemption while in a panic, but I haven't observed any noticeable
> improvement from the change:

I can't see any serious problems with the above.  Perhaps it should try
to stop the other CPUs like ddb, since panic() is going to do drastic
things.  I think it needs to use atomic_load_acq_int() in the inner loop,
but that only affects the RESTARTABLE_PANICS case.

> --- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c	2004/11/05 19:00:32
> +++ //depot/projects/smpng/sys/kern/kern_shutdown.c	2004/11/05 19:22:55
> @@ -473,7 +473,7 @@
>  }
>
>  #ifdef SMP
> -static u_int panic_cpu = NOCPU;
> +static struct thread *panic_thread = NULL;
>  #endif
>
>  /*
> @@ -494,15 +494,14 @@
>  #ifdef SMP
>  	/*
>  	 * We don't want multiple CPU's to panic at the same time, so we
> -	 * use panic_cpu as a simple spinlock.  We have to keep checking
> -	 * panic_cpu if we are spinning in case the panic on the first
> +	 * use panic_thread as a simple spinlock.  We have to keep checking
> +	 * panic_thread if we are spinning in case the panic on the first
>  	 * CPU is canceled.
>  	 */
> -	if (panic_cpu != PCPU_GET(cpuid))
> -		while (atomic_cmpset_int(&panic_cpu, NOCPU,
> -		    PCPU_GET(cpuid)) == 0)
> -			while (panic_cpu != NOCPU)
> -				; /* nothing */
> +	if (panic_thread != curthread)
> +		while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0)
> +			while (panic_thread != NULL)
> +				cpu_spinwait();
>  #endif
>
>  	bootopt = RB_AUTOBOOT | RB_DUMP;
> @@ -538,7 +537,7 @@
>  	/* See if the user aborted the panic, in which case we continue. */
>  	if (panicstr == NULL) {
>  #ifdef SMP
> -		atomic_store_rel_int(&panic_cpu, NOCPU);
> +		atomic_store_rel_ptr(&panic_thread, NULL);
>  #endif
>  		return;
>  	}

Hmm, this has a long history.  It was committed more than 2 years ago
in rev.1.132, then immediately backed out since it wasn't ready.  I
slightly prefer to use the cpuid, since panic() can occur in non-thread
context and during context switching.  PCPU_GET(cpuid) is also better
than curthread->td_oncpu, since changing the latter is not atomic with
changing curthread.

Bruce


More information about the cvs-src mailing list