suspending threads before devices

Tue Nov 18 22:22:33 UTC 2014

On Saturday, November 15, 2014 1:00:15 pm Konstantin Belousov wrote:
> On Sat, Nov 15, 2014 at 05:05:10PM +0200, Andriy Gapon wrote:
> > On 15/11/2014 12:58, Konstantin Belousov wrote:
> > > On Fri, Nov 14, 2014 at 11:10:45PM +0200, Andriy Gapon wrote:
> > >> On 22/03/2012 16:14, Konstantin Belousov wrote:
> > >>> I already noted this to Jung-uk, I think that current suspend handling
> > >>> is (somewhat) wrong. We shall not stop other CPUs for suspension when
> > >>> they are executing some random kernel code. Rather, CPUs should be safely
> > >>> stopped at the kernel->user boundary, or at sleep point, or at designated
> > >>> suspend point like idle loop.
> > >>>
> > >>> We are already engaged in somewhat doubtful actions like restoring %cr2,
> > >>> since we might, for instance, preempt a page fault handler with the suspend IPI.
> > >>
> > >> I recently revisited this issue in the context of some suspend+resume problems
> > >> that I am having with radeonkms driver.  What surprised me is that the driver's
> > >> suspend code has no synchronization whatsoever with its other code paths.  So, I
> > >> looked first at the Linux code and then at the illumos code to see how suspend
> > >> is implemented there.
> > >> As far as I can see, those kernels do exactly what you suggest that we do.
> > >> Before suspending devices they first suspend all threads except for one that
> > >> initiates the suspend.  For userland threads a signal-like mechanism is used to
> > >> put them in a state similar to a SIGSTOP-ed one.  For kernel threads, the
> > >> mechanisms differ between the kernels.  Also, illumos freezes kernel
> > >> threads after suspending the devices, not before.
> > >>
> > >> I think that we could start with only the userland threads initially.  Do you
> > >> think the SIGSTOP-like approach would be hard to implement for us?
> > > We have most, if not all, parts of the stopping code
> > > already implemented. I mean the single-threading code, see
> > > thread_single(SINGLE_BOUNDARY). The code ensures that other threads in
> > > the current process are stopped either at the kernel->user boundary, or
> > > at the safe kernel sleep point.
> > > 
> > > This is not immediately applicable, since the caller is supposed to be
> > > > a thread in the suspended process, but the modifications to allow an external
> > > > process to do the same are really small compared with the complexity
> > > > of the code.  I suspect that all that is needed is a change of
> > > 	while/if (remaining != 1)
> > > to
> > > 	while/if ((p == curproc && remaining != 1) ||
> > > 	    (p != curproc && remaining != 0))
> > > together with explicit passing of struct proc *p to thread_single.
> > 
> > Thank you for the pointer!
> > I think that maybe even more changes are required for that code to be usable for
> > suspending.  E.g. maybe a different p_flag bit should be used, because I think
> > that we would like to avoid interaction between the process level suspend and
> > the global suspend.  I.e. the global suspend might encounter a multi-threaded
> > process in a single thread mode and would need to suspend its remaining thread.
> 
> A thread which is the p_singlethread is not at a safe point; in other
> words, a process which is being single-threaded should prevent
> the system from entering the sleep state. Single-threading is a
> temporary state anyway; it is established during exec() or exit(), so
> it is fine to wait for the in-process single-threading to end before the
> outer single-threading is done.
> 
> Anyway, this requires real coding to experiment.  I started looking at
> it since I did somewhat related changes now.

I would certainly like a way to quiesce threads before entering the real suspend
path.  I would also like to cleanly unmount filesystems during suspend, and
the thread issue is a prerequisite for that.  However, reusing "stop at boundary"
may not be quite correct, because you probably don't want to block suspend just
because you have an NFS request that is retrying due to a down NFS server.  For NFS I
think you want any threads asleep to just not get a chance to run again until
after resume completes.

-- 
John Baldwin
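
As a reading aid for the condition change quoted above, here is a minimal
sketch in C that models the two wait predicates as pure functions.  The
function names and the boolean parameter are hypothetical and are not taken
from the FreeBSD source; `remaining` stands for the number of threads in the
target process that have not yet stopped, and `caller_in_target` models the
`p == curproc` test.

```c
#include <stdbool.h>

/*
 * Original condition: the caller is assumed to be a thread of the
 * process being single-threaded, so it waits until it is the only
 * thread that has not stopped (remaining == 1).
 */
static bool
keep_waiting_original(int remaining)
{
	return (remaining != 1);
}

/*
 * Proposed condition (hypothetical model): a caller inside the target
 * process still waits for remaining == 1, while an external caller is
 * not counted among the target's threads and waits for remaining == 0.
 */
static bool
keep_waiting_proposed(bool caller_in_target, int remaining)
{
	return ((caller_in_target && remaining != 1) ||
	    (!caller_in_target && remaining != 0));
}
```

For a caller inside the target process the proposed predicate reduces to the
original one; an external caller keeps waiting until no thread of the target
is left running.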