suspending threads before devices

Sat Nov 15 18:00:21 UTC 2014

On Sat, Nov 15, 2014 at 05:05:10PM +0200, Andriy Gapon wrote:
> On 15/11/2014 12:58, Konstantin Belousov wrote:
> > On Fri, Nov 14, 2014 at 11:10:45PM +0200, Andriy Gapon wrote:
> >> On 22/03/2012 16:14, Konstantin Belousov wrote:
> > >>> I already noted this to Jung-uk: I think that the current suspend
> > >>> handling is (somewhat) wrong. We should not stop other CPUs for
> > >>> suspension while they are executing some random kernel code. Rather,
> > >>> CPUs should be safely stopped at the kernel->user boundary, at a sleep
> > >>> point, or at a designated suspend point like the idle loop.
> >>>
> > >>> We are already engaged in somewhat doubtful actions like restoring %cr2,
> > >>> since we might, for instance, preempt the page fault handler with the
> > >>> suspend IPI.
> >>
> > >> I recently revisited this issue in the context of some suspend+resume problems
> > >> that I am having with the radeonkms driver.  What surprised me is that the
> > >> driver's suspend code has no synchronization whatsoever with its other code
> > >> paths.  So I looked first at the Linux code and then at the illumos code to
> > >> see how suspend is implemented there.
> > >> As far as I can see, those kernels do exactly what you suggest we do.
> > >> Before suspending devices, they first suspend all threads except the one that
> > >> initiates the suspend.  For userland threads, a signal-like mechanism is used
> > >> to put them in a state similar to a SIGSTOP-ed one.  For kernel threads, the
> > >> mechanisms differ between the kernels.  Also, illumos freezes kernel threads
> > >> after suspending the devices, not before.
> >>
> > >> I think that we could start with userland threads only.  Do you think the
> > >> SIGSTOP-like approach would be hard for us to implement?
> > We already have most, if not all, of the stopping code implemented:
> > the single-threading code, see thread_single(SINGLE_BOUNDARY). It
> > ensures that the other threads in the current process are stopped
> > either at the kernel->user boundary or at a safe kernel sleep point.
> > 
> > This is not immediately applicable, since the caller is supposed to be
> > a thread in the suspended process, but the modifications needed to let
> > an external process do the same are really small compared with the
> > complexity of the code.  I suspect that all that is needed is to change
> > 	while/if (remaining != 1)
> > to
> > 	while/if ((p == curproc && remaining != 1) ||
> > 	    (p != curproc && remaining != 0))
> > together with explicitly passing struct proc *p to thread_single().
> 
> Thank you for the pointer!
> I think that even more changes may be required for that code to be usable for
> suspending.  E.g., a different p_flag bit should perhaps be used, because I
> think we would like to avoid interaction between the process-level suspend and
> the global suspend.  I.e., the global suspend might encounter a multi-threaded
> process in single-threaded mode and would need to suspend its remaining thread.

A thread which is the p_singlethread is not at a safe point; in other
words, a process that is under single-threading should prevent the
system from entering the sleep state. Single-threading is a transient
state anyway, established during exec() or exit(), so it is fine to
wait for the in-process single-threading to end before the outer
single-threading is done.

Anyway, this requires real coding to experiment with.  I started looking
at it since I have recently made somewhat related changes.