svn commit: r326314 - in head/sys: ddb kern sys

Tue Nov 28 17:04:57 UTC 2017

On Tue, 28 Nov 2017, Edward Tomasz Napierala wrote:

> Log:
>  Make kdb_reenter() silent when explicitly called from db_error().
>  This removes the useless backtrace on various ddb(4) user errors.
>
>  Reviewed by:	jhb@
>  Obtained from:	CheriBSD
>  MFC after:	2 weeks
>  Sponsored by:	DARPA, AFRL
>  Differential Revision:	https://reviews.freebsd.org/D13212

This doesn't fix the spam for user errors that cause a trap, which are
very common (for mistyping memory addresses).  ddb sets up trap handlers
using setjmp, and actually does this correctly in many cases, but when
a trap occurs the error handling is:

trap -> kdb_reenter (print spam here) -> ddb (print error message here)

The second error message is useful iff the trap handler was set up
correctly (longjmp then goes back to a place that knows the precise
context so it is easy to print a relevant error message).  I forget if
this error message is printed using db_error() which would repeat the
spam, but it probably is in correct cases -- ddb should return to the
main loop using longjmp, but it never calls longjmp directly.  Instead
it goes by db_error() to kdb which does the longjmp.  So the correct
error handling (still badly implemented) is now:

     setjmp in main loop
 	setjmp in command, perhaps for memory access that might trap
 	    trap
 	    call kdb
 		print spam (remove this for correctness)
 		longjmp back to command
 	back in command after longmp
 	call db_error
 	    print good error message
 	    call kdb_reenter
 		print spam (remove this for correctness)
 		longjmp back to main loop
     back in main loop after longjmp

IIRC. the jmp_buf for kdn_reenter() is global, so this must be switched to
set up context-sensitive error handling.

> Modified: head/sys/kern/subr_kdb.c
> ==============================================================================
> --- head/sys/kern/subr_kdb.c	Tue Nov 28 11:06:17 2017	(r326313)
> +++ head/sys/kern/subr_kdb.c	Tue Nov 28 12:53:55 2017	(r326314)
> @@ -509,6 +509,17 @@ kdb_reenter(void)
> 	/* NOTREACHED */
> }
>
> +void
> +kdb_reenter_silent(void)
> +{
> +
> +	if (!kdb_active || kdb_jmpbufp == NULL)
> +		return;
> +
> +	longjmp(kdb_jmpbufp, 1);
> +	/* NOTREACHED */

This duplicates style bugs (extra blank line and lint support that even lint
doesn't need) from kdb_reenter().

I use the following fix of disabling the spam in all cases (unless reenabled
in all cases using ddb):

X Index: subr_kdb.c
X ===================================================================
X --- subr_kdb.c	(revision 325403)
X +++ subr_kdb.c	(working copy)
X @@ -1,3 +1,7 @@
X +volatile int bde_ipi = 0;
X +
X +static volatile int bde_kdb_wantspam = 0;
X +
X  /*-
X   * Copyright (c) 2004 The FreeBSD Project
X   * All rights reserved.
X @@ -501,7 +508,9 @@
X  	if (!kdb_active || kdb_jmpbufp == NULL)
X  		return;
X 
X +if (bde_kdb_wantspam & 1)
X  	printf("KDB: reentering\n");
X +if (bde_kdb_wantspam & 2)
X  	kdb_backtrace();
X  	longjmp(kdb_jmpbufp, 1);
X  	/* NOTREACHED */

kib is the author of the verbose messages, and he didn't like just removing
them.  He agreed to putting them under a sysctl (or perhaps just a variable).

Later, I found a case where the verbose messages are actually useful.  I
forget what this was.  Probably some bug that I put in ddb.  ddb is broken
if it doesn't set up a special trap handler using setjmp() before doing
anything that might trap.  So we can detect most unexpected traps by
checking the jmp_buf:
- remove above complications and just longjmp() to the global jmp_buf in
   db_error().  This global jmp_buf is owned by ddb.  Only the pointer to
   it is owned by kdb.  So it is not a layering violationg to use it directly.
- in kdb_reenter(), somehow determine if the jmp_buf is the top-level one.
   Traps are not expected in this case, so print the spam.  Otherwise,
   assume that ddb set up jmp_buf for handling the trap and just longjmp().
   to it.

The spam also affects gdb in a worse way, and the above doesn't fix
it.  gdb also never calls longjmp() directly.  It only calls
kdb_reenter() directly after receiving ^C when switching back to ddb
fails.  Then it prints "using longjmp, hope it works!", but actually
uses kdb_reenter() which prints spam and actually uses longjmp().  The
spam goes to the console which is a logically different device to the
gdb device, so might be invisible if the device is physical difference
or mess up the gdb device if the devices are physically the same.  The
error messages have the same problem.  It is the case where switching
back to ddb doesn't work that we prefer to see the messages about this
failure on the gdb device.

Bruce