git: 175db7b58270 - main - mi_switch(9): update to current day

From: Mitchell Horne <mhorne_at_FreeBSD.org>
Date: Thu, 09 Feb 2023 16:02:47 UTC
The branch main has been updated by mhorne:

URL: https://cgit.FreeBSD.org/src/commit/?id=175db7b58270fb5ac98d874b106dc7b9afe7d9f6

commit 175db7b58270fb5ac98d874b106dc7b9afe7d9f6
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2023-02-09 15:41:14 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2023-02-09 16:01:32 +0000

    mi_switch(9): update to current day
    
    The function itself and much of the information in this page remain
    relevant, but many details need to be fixed.
     - Update function signatures
     - Update the list of major uses of mi_switch() (it is not exhaustive)
     - Document 'flags' argument and its possible values
     - Document thread lock requirement for callers
     - Thread runtime limits are out of scope now, no need to describe them
     - Remove outdated information w.r.t. KSE, runqueue, non-preemptible
       kernel, etc
     - Update the description of cpu_switch() and its responsibilities
    
    PR:             149574
    Reviewed by:    kib
    Discussed with: markj
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D38185
---
 share/man/man9/mi_switch.9 | 216 ++++++++++++++++++++++++++++++---------------
 1 file changed, 147 insertions(+), 69 deletions(-)

diff --git a/share/man/man9/mi_switch.9 b/share/man/man9/mi_switch.9
index 835356744647..199569845380 100644
--- a/share/man/man9/mi_switch.9
+++ b/share/man/man9/mi_switch.9
@@ -2,10 +2,14 @@
 .\"
 .\" Copyright (c) 1996 The NetBSD Foundation, Inc.
 .\" All rights reserved.
+.\" Copyright (c) 2023 The FreeBSD Foundation
 .\"
 .\" This code is derived from software contributed to The NetBSD Foundation
 .\" by Paul Kranenburg.
 .\"
+.\" Portions of this documentation were written by Mitchell Horne
+.\" under sponsorship from the FreeBSD Foundation.
+.\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
@@ -29,7 +33,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 24, 1996
+.Dd January 9, 2023
 .Dt MI_SWITCH 9
 .Os
 .Sh NAME
@@ -41,96 +45,171 @@
 .In sys/param.h
 .In sys/proc.h
 .Ft void
-.Fn mi_switch "void"
+.Fn mi_switch "int flags"
 .Ft void
-.Fn cpu_switch "void"
+.Fn cpu_switch "struct thread *oldtd" "struct thread *newtd" "struct mtx *lock"
 .Ft void
-.Fn cpu_throw "void"
+.Fn cpu_throw "struct thread *oldtd" "struct thread *newtd"
 .Sh DESCRIPTION
 The
 .Fn mi_switch
-function implements the machine independent prelude to a thread context
+function implements the machine-independent prelude to a thread context
 switch.
-It is called from only a few distinguished places in the kernel
-code as a result of the principle of non-preemptable kernel mode execution.
+It is the single entry point for every context switch and is called from only
+a few distinguished places in the kernel.
+The context switch is, by necessity, always performed by the switched thread,
+even when the switch is initiated from elsewhere; e.g. preemption requested via
+Inter-Processor Interrupt (IPI).
+.Pp
 The various major uses of
-.Nm
+.Fn mi_switch
 can be enumerated as follows:
 .Bl -enum -offset indent
 .It
 From within a function such as
-.Xr cv_wait 9 ,
-.Xr mtx_lock 9 ,
+.Xr sleepq_wait 9
 or
-.Xr tsleep 9
+.Fn turnstile_wait
 when the current thread
 voluntarily relinquishes the CPU to wait for some resource or lock to become
 available.
 .It
-After handling a trap
-(e.g.\& a system call, device interrupt)
-when the kernel prepares a return to user-mode execution.
-This case is
-typically handled by machine dependent trap-handling code after detection
-of a change in the signal disposition of the current process, or when a
-higher priority thread might be available to run.
-The latter event is
-communicated by the machine independent scheduling routines by calling
-the machine defined
-.Fn need_resched .
+Involuntary preemption due to arrival of a higher-priority thread.
+.It
+At the tail end of
+.Xr critical_exit 9 ,
+if preemption was deferred due to the critical section.
+.It
+Within the TDA_SCHED AST handler, when rescheduling before the return to
+usermode was requested.
+There are several reasons for this, a notable one coming from
+.Fn sched_clock
+when the running thread has exceeded its time slice.
 .It
 In the signal handling code
 (see
 .Xr issignal 9 )
 if a signal is delivered that causes a process to stop.
 .It
-When a thread dies in
-.Xr thread_exit 9
-and control of the processor can be passed to the next runnable thread.
-.It
 In
-.Xr thread_suspend_check 9
+.Fn thread_suspend_check
 where a thread needs to stop execution due to the suspension state of
 the process as a whole.
+.It
+In
+.Xr kern_yield 9
+when a thread wants to voluntarily relinquish the processor.
 .El
 .Pp
+The
+.Va flags
+argument to
 .Fn mi_switch
-records the amount of time the current thread has been running in the
-process structures and checks this value against the CPU time limits
-allocated to the process
-(see
-.Xr getrlimit 2 ) .
-Exceeding the soft limit results in a
-.Dv SIGXCPU
-signal to be posted to the process, while exceeding the hard limit will
-cause a
-.Dv SIGKILL .
+indicates the context switch type.
+One of the following must be passed:
+.Bl -tag -offset indent -width "SWT_REMOTEWAKEIDLE"
+.It Dv SWT_OWEPREEMPT
+Switch due to delayed preemption after exiting a critical section.
+.It Dv SWT_TURNSTILE
+Switch after propagating scheduling priority to the owner of a resource.
+.It Dv SWT_SLEEPQ
+Begin waiting on a
+.Xr sleepqueue 9 .
+.It Dv SWT_RELINQUISH
+Yield call.
+.It Dv SWT_NEEDRESCHED
+Rescheduling was requested.
+.It Dv SWT_IDLE
+Switch from the idle thread.
+.It Dv SWT_IWAIT
+A kernel thread which handles interrupts has finished work and must wait for
+interrupts to schedule additional work.
+.It Dv SWT_SUSPEND
+Thread suspended.
+.It Dv SWT_REMOTEPREEMPT
+Preemption by a higher-priority thread, initiated by a remote processor.
+.It Dv SWT_REMOTEWAKEIDLE
+Idle thread preempted, initiated by a remote processor.
+.It Dv SWT_BIND
+The running thread has been bound to another processor and must be switched
+out.
+.El
 .Pp
-If the thread is still in the
-.Dv TDS_RUNNING
-state,
+In addition to the switch type, callers must specify the nature of the
+switch by performing a bitwise OR with one of the
+.Dv SW_VOL
+or
+.Dv SW_INVOL
+flags, but not both.
+Respectively, these flags denote whether the context switch is voluntary or
+involuntary on the part of the current thread.
+For an involuntary context switch in which the running thread is
+being preempted, the caller should also pass the
+.Dv SW_PREEMPT
+flag.
+.Pp
+Upon entry to
+.Fn mi_switch ,
+the current thread must be holding its assigned thread lock.
+It may be unlocked as part of the context switch.
+After they have been rescheduled and execution resumes, threads will exit
 .Fn mi_switch
-will put it back onto the run queue, assuming that
-it will want to run again soon.
-If it is in one of the other
-states and KSE threading is enabled, the associated
-.Em KSE
-will be made available to any higher priority threads from the same
-group, to allow them to be scheduled next.
+with their thread lock unlocked.
 .Pp
-After these administrative tasks are done,
 .Fn mi_switch
-hands over control to the machine dependent routine
-.Fn cpu_switch ,
-which will perform the actual thread context switch.
+records the amount of time the current thread has been running before handing
+control over to the scheduler, via
+.Fn sched_switch .
+After selecting a new thread to run, the scheduler will call
+.Fn cpu_switch
+to perform the low-level context switch.
 .Pp
 .Fn cpu_switch
-first saves the context of the current thread.
-Next, it calls
-.Fn choosethread
-to determine which thread to run next.
-Finally, it reads in the saved context of the new thread and starts to
-execute the new thread.
+is the machine-dependent function that performs the actual switch from the
+running thread
+.Fa oldtd
+to the chosen thread
+.Fa newtd .
+First, it saves the context of
+.Fa oldtd
+to its Process Control Block,
+.Po
+PCB
+.Vt struct pcb
+.Pc ,
+pointed at by
+.Va oldtd->td_pcb .
+The function then updates important per-CPU state such as the
+.Dv curthread
+variable, and activates
+.Fa newtd\&'s
+virtual address space using its associated
+.Xr pmap 9
+structure.
+Finally, it reads in the saved context from
+.Fa newtd\&'s
+PCB.
+CPU instruction flow continues in the new thread context, on
+.Fa newtd\&'s
+kernel stack.
+The return from
+.Fn cpu_switch
+can be understood as a completion of the function call initiated by
+.Fa newtd
+when it was previously switched out, at some point in the distant (relative to
+CPU time) past.
+.Pp
+The
+.Fa lock
+argument to
+.Fn cpu_switch
+is used to pass the mutex which will be stored as
+.Fa oldtd\&'s
+thread lock at the moment that
+.Fa oldtd
+is completely switched out.
+This is an implementation detail of
+.Fn sched_switch .
 .Pp
 .Fn cpu_throw
 is similar to
@@ -140,19 +219,18 @@ This function is useful when the kernel does not have an old thread
 context to save, such as when CPUs other than the boot CPU perform their
 first task switch, or when the kernel does not care about the state of the
 old thread, such as in
-.Fn thread_exit
+.Xr thread_exit 9
 when the kernel terminates the current thread and switches into a new
-thread.
-.Pp
-To protect the
-.Xr runqueue 9 ,
-all of these functions must be called with the
-.Va sched_lock
-mutex held.
+thread,
+.Fa newtd .
+The
+.Fa oldtd
+argument is unused.
 .Sh SEE ALSO
-.Xr cv_wait 9 ,
+.Xr critical_exit 9 ,
 .Xr issignal 9 ,
+.Xr kern_yield 9 ,
 .Xr mutex 9 ,
-.Xr runqueue 9 ,
-.Xr tsleep 9 ,
-.Xr wakeup 9
+.Xr pmap 9 ,
+.Xr sleepqueue 9 ,
+.Xr thread_exit 9
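
The flag rules in the updated page (one switch type, exactly one of SW_VOL
or SW_INVOL, and SW_PREEMPT for an involuntary preemption) can be sketched as
a small userland check. This is only an illustration and not kernel code:
the SK_* constants and sk_* functions are hypothetical names, the numeric
values are placeholders rather than the definitions from sys/proc.h, and
rejecting SW_PREEMPT together with SW_VOL is this sketch's own assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder values for this sketch only.  Kernel code takes the real
 * SW_* and SWT_* definitions from <sys/proc.h>.
 */
#define SK_SWT_RELINQUISH	6	/* stands in for SWT_RELINQUISH */
#define SK_SWT_REMOTEPREEMPT	11	/* stands in for SWT_REMOTEPREEMPT */
#define SK_SW_VOL		0x0100	/* stands in for SW_VOL */
#define SK_SW_INVOL		0x0200	/* stands in for SW_INVOL */
#define SK_SW_PREEMPT		0x0400	/* stands in for SW_PREEMPT */

/*
 * Return true if 'flags' carries exactly one of the voluntary and
 * involuntary bits, and the preempt bit only with the involuntary bit.
 */
bool
sk_switch_flags_ok(int flags)
{
	bool vol = (flags & SK_SW_VOL) != 0;
	bool invol = (flags & SK_SW_INVOL) != 0;

	if (vol == invol)	/* neither bit set, or both set */
		return (false);
	if ((flags & SK_SW_PREEMPT) != 0 && !invol)
		return (false);	/* assumption: preemption is involuntary */
	return (true);
}

/* Flags a voluntary yield would pass in this sketch. */
int
sk_yield_flags(void)
{
	int flags = SK_SW_VOL | SK_SWT_RELINQUISH;

	assert(sk_switch_flags_ok(flags));
	return (flags);
}

/* Flags a preemption initiated by a remote processor would pass. */
int
sk_remote_preempt_flags(void)
{
	int flags = SK_SW_INVOL | SK_SW_PREEMPT | SK_SWT_REMOTEPREEMPT;

	assert(sk_switch_flags_ok(flags));
	return (flags);
}
```

In the kernel itself, a voluntary yield would presumably take the current
thread's lock first, as the page requires, and then call something like
mi_switch(SW_VOL | SWT_RELINQUISH); the thread lock is released by the time
mi_switch() returns.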