Crashes with 'reboot -d'

Eric Badger badger at FreeBSD.org
Mon Oct 31 22:10:12 UTC 2016


I've run into crashes when using 'reboot -d' (or a slightly tweaked
version of it in our FreeBSD spin at work). The problem is that the dump
code is written to run in a panic/crash scenario, where all other CPUs
are stopped. In the case of 'reboot -d', the other CPUs are still running.
The code in xpt_polled_action runs what would normally be done by the
interrupt handler, polling start_ccb->ccb_h.status to see when the
operation has completed. If the real interrupt handler is still
running, however, polling start_ccb->ccb_h.status is not sufficient: the
ccb may be placed in the cam kproc's doneq after start_ccb->ccb_h.status
has been updated. The dumper will then reuse the ccb's memory, and when
the cam kproc later processes that item in its doneq, it will twiddle
bits and corrupt the now-reused ccb memory.
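
The ordering problem above can be replayed deterministically. This is a
minimal userspace sketch, not CAM code: model_ccb, model_new_request(),
model_replay() and the MODEL_* constants are hypothetical stand-ins, and
the real ccb_h fields, doneq and kproc are reduced to one flags word and a
one-slot queue. With other CPUs stopped, completion processing finishes
inline before the ccb is reused; otherwise the late doneq processing
clears a bit in what is already the next request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, single-threaded model of the race. None of these names
 * exist in CAM; the fixed ordering below replays one unlucky interleaving.
 */
#define MODEL_STATUS_INPROG	0x0u
#define MODEL_STATUS_DONE	0x1u
#define MODEL_FLAG_PENDING	0x80u	/* a bit the done-queue code clears */

struct model_ccb {
	uint32_t	status;		/* what the dumper polls */
	uint32_t	flags;		/* what the kproc touches later */
	uint8_t		payload[16];	/* request contents */
};

/* The dumper (re)initializes the ccb memory for a new request. */
static void
model_new_request(struct model_ccb *ccb, uint8_t fill)
{
	ccb->status = MODEL_STATUS_INPROG;
	ccb->flags = MODEL_FLAG_PENDING;
	memset(ccb->payload, fill, sizeof(ccb->payload));
}

/*
 * Replays one request/reuse cycle. Returns true if the second request's
 * ccb was corrupted by stale completion processing of the first.
 */
static bool
model_replay(bool other_cpus_stopped)
{
	struct model_ccb ccb;

	model_new_request(&ccb, 0xAA);

	/* The handler (polled or real) updates the status first. */
	ccb.status = MODEL_STATUS_DONE;
	if (other_cpus_stopped) {
		/* Polled path: completion processing finishes inline. */
		ccb.flags &= ~MODEL_FLAG_PENDING;
	}

	/* The dumper's poll loop sees DONE and reuses the memory. */
	assert(ccb.status == MODEL_STATUS_DONE);
	model_new_request(&ccb, 0xBB);

	if (!other_cpus_stopped) {
		/* Real handler finishes late: the ccb lands on the doneq now... */
		struct model_ccb *doneq = &ccb;

		/* ...and the kproc clears a bit in what is now a new request. */
		doneq->flags &= ~MODEL_FLAG_PENDING;
	}
	return (ccb.flags != MODEL_FLAG_PENDING);
}
```

Tracing it by hand, model_replay(false) reports corruption and
model_replay(true) does not, which is the distinction the patch below
enforces in the real kernel.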

I fixed this by shutting off the other CPUs when doing a dump during
reboot (patch below). This seems fine, but perhaps heavy-handed. I also
experimented with letting the normal interrupt handler and cam kproc do
the work when we're not in a SCHEDULER_STOPPED() scenario. This seemed
to reduce dump performance and make it less consistent, but otherwise
worked OK.

I'd appreciate any comments on things I may have failed to consider. If
no objections are raised, I will proceed with the patch here.

Thanks,
Eric

diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c
index 79c4c30..bdc0182 100644
--- a/sys/kern/kern_shutdown.c
+++ b/sys/kern/kern_shutdown.c
@@ -319,8 +319,9 @@ void
 kern_reboot(int howto)
 {
        static int once = 0;
+#ifdef SMP
+       cpuset_t other_cpus;

-#if defined(SMP)
        /*
         * Bind us to CPU 0 so that all shutdown code runs there.  Some
         * systems don't shutdown properly (i.e., ACPI power off) if we
@@ -362,8 +363,28 @@ kern_reboot(int howto)
         */
        EVENTHANDLER_INVOKE(shutdown_post_sync, howto);

-       if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
+       if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) {
+#ifdef SMP
+               /*
+                * Dump code assumes that all other CPUs have stopped, and thus
+                * handles disk interrupts manually. This assumption must be enforced,
+                * as otherwise the real interrupt handler may race with the dumper.
+                */
+               if (!SCHEDULER_STOPPED()) {
+                       spinlock_enter();
+
+                       other_cpus = all_cpus;
+                       CPU_CLR(PCPU_GET(cpuid), &other_cpus);
+                       stop_cpus_hard(other_cpus);
+
+                       curthread->td_stopsched = 1;
+
+                       /* Module shutdown is no longer safe. */
+                       howto |= RB_NOSYNC;
+               }
+#endif
                doadump(TRUE);
+       }

        /* Now that we're going to really halt the system... */
        EVENTHANDLER_INVOKE(shutdown_final, howto);

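The control flow the patch adds to kern_reboot() can be summarized as a
small decision function. This is a hypothetical sketch: struct
model_plan, model_plan_reboot() and the MODEL_RB_* values are
illustrative stand-ins (the real RB_* constants live in <sys/reboot.h>),
and the function only computes what kern_reboot() would do rather than
doing it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder flag values for illustration only; the real RB_* constants
 * are defined in <sys/reboot.h>.
 */
#define MODEL_RB_NOSYNC	0x004
#define MODEL_RB_HALT	0x008
#define MODEL_RB_DUMP	0x100

struct model_plan {
	bool	do_dump;		/* would call doadump() */
	bool	stop_other_cpus;	/* spinlock_enter() + stop_cpus_hard() */
	int	howto;			/* howto handed on to shutdown_final */
};

static struct model_plan
model_plan_reboot(int howto, bool cold, bool dumping, bool sched_stopped)
{
	struct model_plan p = { false, false, howto };

	/* Dump only if RB_DUMP is set and RB_HALT is not. */
	if ((howto & (MODEL_RB_HALT | MODEL_RB_DUMP)) != MODEL_RB_DUMP ||
	    cold || dumping)
		return (p);

	p.do_dump = true;
	if (!sched_stopped) {
		/* 'reboot -d': other CPUs still run, so stop them first. */
		p.stop_other_cpus = true;
		p.howto |= MODEL_RB_NOSYNC;
	}
	/* Otherwise (e.g. a panic) the CPUs are already stopped. */
	return (p);
}
```

For example, RB_DUMP alone with the scheduler running yields a dump with
the other CPUs stopped and RB_NOSYNC added, while the same flags with the
scheduler already stopped dump without the extra steps.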



More information about the freebsd-hackers mailing list