[Bug 238837] init: Remove P_SYSTEM flag from PID 1 to allow easier debugging of init(8)

Sat Jul 13 10:10:53 UTC 2019

On Sat, 13 Jul 2019 a bug that doesn't want replies at freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238837
>
> --- Comment #3 from WHR <msl0000023508 at gmail.com> ---
> (In reply to Conrad Meyer from comment #2)
>
> I knwon that ptrace(2) can't be used to debug any kernel processes, allowing
> that would hang or panic the kernel; but process 1 (init(8)) isn't a kernel,
> but an user process, right?
>
>> Have you tried using gdb on it with this patch?
> Yes, I has been used a 10.3-RELEASE-p29 kernel with this patch applied, for a
> long time; I can debug init(8) in gdb(1) any time, without any problem.

I removed P_SYSTEM for init 15-20 years ago in my version.  IIRC, I wanted
this more for ktracing init than for ptrace on it.

This patch also fixes some hard-coding of init's pid.  This is incomplete.
It keeps too many restrictions on sending signals to init.  p_cansignal()
doesn't understand the special restrictions on sending signals to init or
even to P_SYSTEM processes.

XX Index: init_main.c
XX ===================================================================
XX RCS file: /home/ncvs/src/sys/kern/init_main.c,v
XX retrieving revision 1.243
XX diff -u -2 -r1.243 init_main.c
XX --- init_main.c	16 Jun 2004 00:26:29 -0000	1.243
XX +++ init_main.c	22 Jul 2010 15:28:09 -0000
XX @@ -681,5 +671,5 @@
XX 
XX  /*
XX - * Like kthread_create(), but runs in it's own address space.
XX + * Like kthread_create(), but runs in its own address space.
XX   * We do this early to reserve pid 1.
XX   *

Unrelated style fix.  Init's pid also shouldn't be hard-coded in comments...

XX @@ -697,8 +687,8 @@
XX  		panic("cannot fork init: %d\n", error);
XX  	KASSERT(initproc->p_pid == 1, ("create_init: initproc->p_pid != 1"));

... but since there is this KASSERT(), the comment is not wrong.

XX -	/* divorce init's credentials from the kernel's */
XX +
XX +	/* Divorce init's credentials from the kernel's. */
XX  	newcred = crget();
XX  	PROC_LOCK(initproc);
XX -	initproc->p_flag |= P_SYSTEM;

The only part that actully removes the flag.  I'm not sure if this is
complete enough -- see the notes on the pid tests.

XX  	oldcred = initproc->p_ucred;
XX  	crcopy(newcred, oldcred);
XX Index: kern_sig.c
XX ===================================================================
XX RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
XX retrieving revision 1.281
XX diff -u -2 -r1.281 kern_sig.c
XX --- kern_sig.c	11 Jun 2004 11:16:23 -0000	1.281
XX +++ kern_sig.c	15 Feb 2005 08:40:46 -0000
XX @@ -1312,5 +1307,5 @@
XX  		LIST_FOREACH(p, &allproc, p_list) {
XX  			PROC_LOCK(p);
XX -			if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
XX +			if (p == initproc || p->p_flag & P_SYSTEM ||
XX  			    p == td->td_proc) {
XX  				PROC_UNLOCK(p);

All the pid tests for signals are bogus, but I kept them for init.  This
one prevents broadcasting signals to pids <= 1 and to P_SYSTEM processes.
When P_SYSTEM is set for init, the pid test has no effect for init.
P_SYSTEM is still set for proc0, so the pid test has no effect when the
pid is 0.  There are now many kernel processes with pid > 1, and the
pid check much more obviously has no effect for these.  So P_SYSTEM should
be set for these processes, and I think it is (kthread_create() sets it
and I think it is never cleared).

XX @@ -1343,5 +1338,5 @@
XX  		LIST_FOREACH(p, &pgrp->pg_members, p_pglist) {
XX  			PROC_LOCK(p); 
XX -			if (p->p_pid <= 1 || p->p_flag & P_SYSTEM) {
XX +			if (p == initproc || p->p_flag & P_SYSTEM) {
XX  				PROC_UNLOCK(p);
XX  				continue;
XX @@ -2127,5 +2170,5 @@
XX  			 * Don't take default actions on system processes.
XX  			 */
XX -			if (p->p_pid <= 1) {
XX +			if (p == initproc || p->p_flag & P_SYSTEM) {
XX  #ifdef DIAGNOSTIC
XX  				/*

More bogus pid tests.  The last one is most interesting.  It prevents
taking default actions for P_SYSTEM processes.  I think this prevents
the foot shooting of killing init with -9 and the more useful behaviour
of killing init with a signal that makes it dump core for debugging
later.

This shows the danger of simply removing the flag.  I like allowing foot
shooting, but don't want to change the policy for it here.

P_SYSTEM was and is checked in kern/*.c without a pid test in:
- kern_switch.c:choosethread: to allow system processes to run during
   panic().  Without this change, init was allowed to run then.  This seems
   wrong.
- sys_process.c:kern_ptrace(): to disallow debugging system processes.
   This is wrong for init, and the change fixes it.  But debugging also
   requires normal signal handling, perhaps including core dumps, so
   my replacements for the bogus pid tests may be too strong.

P_SYSTEM is now also checked in kern/*.c without a pid test in:
- kern_proc.c:sysctl_kern_proc_args(): to disallow seeing args for
   system processes.  This is wrong for init.
- kern_proc.c:sysctl_kern_proc_env(): similarly for the environment.
- kern_proc.c:sysctl_kern_proc_auxv(): similarly for the ELF aux vector.
- kern_proc.c:stop_all_proc(): to disallow stopping system processes.
   This seems right for init, and the change breaks it.
- kern_procctl.c:protect_setchild(): seems to be to disallow procctl on
   init.  This seems to need tests like the ones for signals, if any.
- kern_racct.c:racct_proc_throttle(): to disallow throttling of system
   processes.  The comment on this conflates system processes with kernel
   processes, and the code is misformatted using a blank line to displace
   half of the code covered by the comment.

I said that I wanted this more for ktrace than for debugging, but ktrace
doesn't seem to check P_SYSTEM.  It checks p_candebug().  p_candebug()
seems to be correct for ktracing but not for debugging, since debugging
needs ptrace which needs P_SYSTEM, but p_candebug() doesn't know that
it needs P_SYSTEM.  Thus gdb on init is currently permitted, but fails
in ptace().

Bruce