Re: Handling panics inside vt(4) callbacks

From: Kyle Evans <kevans_at_freebsd.org>
Date: Wed, 12 Apr 2023 21:33:22 UTC
On Wed, Apr 12, 2023 at 3:45 PM Jean-Sébastien Pédron
<dumbbell@freebsd.org> wrote:
>
> Hi!
>
> While working on the DRM drivers, I don't always get a kernel core dump
> in case of a panic.
>
> My hypothesis is that if the DRM driver code called by vt(4) panics,
> then the panic code might not go through successfully. The reason is
> because panic(9) prints the reason, a stacktrace and possibly some
> progress to the console, which calls vt(4) and the DRM driver code again.
>
> I played with the following patch:
> https://gist.github.com/dumbbell/88d77789bfeb38869268c84c40575f49
>
> The idea is that before calling "vt_flush()" in "vtterm_done()", I set a
> global flag to true to indicate that vt(4) is called as part of kdb or a
> panic. If another panic occurs inside vt_flush(), typically the
> underlying DRM driver code, "vtterm_done()" is called recursively and
> "vt_flush()" might trigger the same panic again. If the flag is set, the
> entire function is skipped instead.
>
> I test the patch by adding a panic(9) just before "vt_flush()" and I
> trigger the initial panic with debug.kdb.panic=1. I don't even load a
> DRM driver. My problem is that in this case, the laptop reboots
> immediately. However, if I replace panic(9) with a simple printf(9), it
> works as expected and I get a kernel dump.
>
> I could not find something in panic(9) code that would reboot the
> computer in case of a nested panic.
>
> Previous versions of the patch called doadump() and rebooted the
> computer explicitly if the flag was set, but it didn't work either and I
> thought I could simplify that patch and let panic(9) handle recursion.
> In other words, I just want to skip most of vt(4) code if vt(4) or DRM
> crash.
>
> Does someone spot something wrong in my hypothesis or methodology?
>

FWIW, I have a related patch that I maintain in my tree that I simply
haven't found time to try and upstream.  When the system panics, it
tries to switch back to ttyv0, but it calls into vd_postswitch() with
the vt lock still held. In my case, with i915kms, the vd_postswitch
implementation attempts to sleep with the lock still held and
everything goes off the rails. See below.

Thanks,

Kyle Evans

diff --git a/sys/dev/vt/vt_core.c b/sys/dev/vt/vt_core.c
index 267dd7bf2489..b57370beae7b 100644
--- a/sys/dev/vt/vt_core.c
+++ b/sys/dev/vt/vt_core.c
@@ -592,10 +592,10 @@ vt_window_switch(struct vt_window *vw)
                 * switch to console mode when panicking, making sure the panic
                 * is readable (even when a GUI was using ttyv0).
                 */
+               VT_UNLOCK(vd);
                if ((kdb_active || KERNEL_PANICKED()) &&
                    vd->vd_driver->vd_postswitch)
                        vd->vd_driver->vd_postswitch(vd);
-               VT_UNLOCK(vd);
                return (0);
        }
        if (!(vw->vw_flags & (VWF_OPENED|VWF_CONSOLE))) {