kostikbel at gmail.com
Thu Mar 22 14:01:59 UTC 2012
On Wed, Mar 21, 2012 at 04:35:13PM -0700, Sushanth Rai wrote:
> Sometimes I have trouble capturing the "correct" state of a
> multithreaded process using gcore. That is, it looks like the target
> process might have done some work between the time the command was issued
> and the time the core file was generated.
> Looking at the code, gcore calls ptrace(PT_ATTACH...), which
> internally issues SIGSTOP, and calls waitpid() to wait until the
> process stops. So, it's quite possible that some threads that are not
> sleeping interruptibly will continue to run until the process notices
> the signal. Signals are only checked when a thread that is tagged to
> handle the signal crosses the user boundary (return from syscall,
> trap). When the thread finally handles SIGSTOP, it needs to stop all
> threads, which is done by setting a flag bit in each thread. This
> bit is checked as each thread crosses the user boundary. So, there
> will always be some state change in the target process from the time
> SIGSTOP is posted to the time all threads are actually stopped.
Yes, this is how things work. There are two factors that make the stop
asynchronous.
First, other CPUs may be executing threads of the process, so suspending
those threads requires an IPI to be generated. The IPI_AST handler just
returns, which causes a kernel->usermode transition and, with it, possible
signal delivery and the suspend check.
Second, the kernel never suspends a thread that is executing or blocked in
the kernel. Doing otherwise would cause deadlocks, because such threads own
resources that are shared with other threads.
So the only safe points to suspend threads are the kernel->user boundary
and those sleep points that are not marked as unsafe with the PBDRY flag.
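The safe-point rule can be illustrated with a rough user-space model, assuming POSIX threads. Worker threads look at a stop flag only between units of work (standing in for the kernel->user boundary), and the requester waits until every thread has parked. The names proc_model, suspend_check(), stop_all() and model_stop_is_complete() are hypothetical and only mirror the p_suspcount/p_numthreads bookkeeping; this is a sketch of the idea, not the kernel's implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_THREADS 8

/* Hypothetical model; field names only mirror p_suspcount/p_numthreads. */
struct proc_model {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    bool stop_requested;        /* the "please suspend" flag */
    bool exiting;
    int numthreads;
    int suspcount;              /* threads currently parked */
    long work[MAX_THREADS];     /* per-thread progress counters */
};

struct worker_arg { struct proc_model *p; int idx; };

/* The only safe point: called between work units, never in the middle of
   one, so a parked thread never holds a half-finished operation. */
static bool suspend_check(struct proc_model *p)
{
    bool keep_running;

    pthread_mutex_lock(&p->lock);
    if (p->stop_requested) {
        p->suspcount++;
        pthread_cond_broadcast(&p->cv);     /* let the waiter recount */
        while (p->stop_requested)
            pthread_cond_wait(&p->cv, &p->lock);
        p->suspcount--;
    }
    keep_running = !p->exiting;
    pthread_mutex_unlock(&p->lock);
    return keep_running;
}

static void *worker(void *arg)
{
    struct worker_arg *a = arg;

    do
        a->p->work[a->idx]++;               /* one unit of "user" work */
    while (suspend_check(a->p));
    return NULL;
}

/* Like SIGSTOP plus wait(2): returns only when every thread is parked. */
static void stop_all(struct proc_model *p)
{
    pthread_mutex_lock(&p->lock);
    p->stop_requested = true;
    while (p->suspcount != p->numthreads)
        pthread_cond_wait(&p->cv, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

static void resume_all(struct proc_model *p)
{
    pthread_mutex_lock(&p->lock);
    p->stop_requested = false;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->lock);
}

/* Returns true if no thread made progress while the model was stopped. */
bool model_stop_is_complete(int n)
{
    struct proc_model p = { .numthreads = n };
    struct worker_arg args[MAX_THREADS];
    pthread_t tids[MAX_THREADS];
    long snap[MAX_THREADS];
    struct timespec delay = { 0, 20 * 1000 * 1000 };   /* 20 ms */
    bool stable = true;

    if (n < 1 || n > MAX_THREADS)
        return false;
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cv, NULL);
    for (int i = 0; i < n; i++) {
        args[i] = (struct worker_arg){ &p, i };
        pthread_create(&tids[i], NULL, worker, &args[i]);
    }

    stop_all(&p);
    for (int i = 0; i < n; i++)
        snap[i] = p.work[i];
    nanosleep(&delay, NULL);                /* a parked thread must not move */
    pthread_mutex_lock(&p.lock);
    for (int i = 0; i < n; i++)
        stable = stable && snap[i] == p.work[i];
    p.exiting = true;                       /* workers quit once resumed */
    pthread_mutex_unlock(&p.lock);
    resume_all(&p);

    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&p.cv);
    pthread_mutex_destroy(&p.lock);
    return stable;
}
```

Note that a thread which was mid-unit when the flag was raised finishes that unit before parking, which is the user-space analogue of threads running on until they reach the boundary.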
On the other hand, since the kernel waits for all threads to suspend before
reporting the wait(2) event, the usermode state should be consistent with
itself, or rather, it should be no worse than if the threads had reached the
stop point while executing asynchronously on different CPUs.
See the check p->p_suspcount == p->p_numthreads in the kern_wait()
function, which is made before it decides that the found process satisfies
the wait request.
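The wait(2) guarantee can be observed from user space. This is a sketch, assuming a POSIX system that, like the kern_wait() check described above, reports a stop only after every thread has stopped. A child runs NSPIN threads (a hypothetical constant) that spin in user mode, incrementing counters in shared memory; the parent sends SIGSTOP, waits with waitpid(..., WUNTRACED), and then checks that none of the counters moves any more. The function name stop_is_reported_when_complete() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NSPIN 4

static void *spin(void *arg)
{
    volatile unsigned long *c = arg;

    for (;;)
        (*c)++;             /* pure user-mode work: no syscalls, no sleeps */
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns true if no thread of the child advanced after waitpid()
   reported the child as stopped. */
bool stop_is_reported_when_complete(void)
{
    size_t len = NSPIN * sizeof(unsigned long);
    volatile unsigned long *cnt;
    unsigned long snap[NSPIN];
    bool stable = true;
    int status;
    pid_t pid;

    /* Shared mapping, so the parent can watch the child's counters. */
    cnt = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cnt == MAP_FAILED)
        return false;
    if ((pid = fork()) == -1) {
        munmap((void *)cnt, len);
        return false;
    }
    if (pid == 0) {                         /* child: NSPIN spinning threads */
        pthread_t t;
        for (int i = 1; i < NSPIN; i++)
            pthread_create(&t, NULL, spin, (void *)&cnt[i]);
        spin((void *)&cnt[0]);              /* never returns */
    }

    sleep_ms(50);                           /* let every thread get going */
    kill(pid, SIGSTOP);
    /* WUNTRACED: also report children that have stopped. */
    if (waitpid(pid, &status, WUNTRACED) == -1 || !WIFSTOPPED(status))
        stable = false;
    for (int i = 0; i < NSPIN; i++)
        snap[i] = cnt[i];
    sleep_ms(50);
    for (int i = 0; i < NSPIN; i++)
        stable = stable && snap[i] == cnt[i];

    kill(pid, SIGKILL);                     /* also ends a stopped process */
    waitpid(pid, &status, 0);
    munmap((void *)cnt, len);
    return stable;
}
```

The experiment only shows that nothing moves once the stop is reported; it says nothing about where each thread stopped, which is whatever safe point each one happened to reach.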
> I was wondering if I could improve this a bit by calling PT_SUSPEND on
> all threads, instead of posting SIGSTOP and waiting for all threads
> to stop. Once the core is generated, unsuspend all threads. As with
> SIGSTOP, individual threads will only notice the suspension as they cross
> the user boundary. But there is no overhead of tagging a thread to handle
> the signal and that thread doing the suspension. The idea is to try
> and generate the core file which reflects the running state of the
> process as closely as possible.
PT_SUSPEND can only be called on a process which you have already attached
to. So suspending all threads of the process you have just attached to is
mostly a no-op for your purposes: the attach, and the wait that follows it,
have already stopped them.
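For reference, a sketch of the attach-then-suspend sequence under discussion, assuming FreeBSD's ptrace(2) requests PT_ATTACH, PT_GETNUMLWPS, PT_GETLWPLIST, PT_SUSPEND, PT_RESUME and PT_DETACH. By the time PT_SUSPEND may legally be issued, waitpid() has already reported the process as stopped, so the per-thread suspensions change nothing observable. dump_stopped_process() and the dump callback are hypothetical; on systems other than FreeBSD the function simply fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#ifdef __FreeBSD__
#include <sys/ptrace.h>
#endif

/* Hypothetical callback that writes the core file while the target is stopped. */
typedef int (*dump_fn)(pid_t pid);

/* Attach, wait for the full stop, (redundantly) suspend each LWP, dump,
   then resume and detach. Returns the dump result, or -1 on failure. */
int dump_stopped_process(pid_t pid, dump_fn dump)
{
#ifdef __FreeBSD__
    int status, n = 0, i, rc = -1;
    lwpid_t *lwps = NULL;

    /* PT_ATTACH sends SIGSTOP to the target. */
    if (ptrace(PT_ATTACH, pid, NULL, 0) == -1)
        return -1;
    /* Reported only once every thread is suspended (see kern_wait()). */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        goto detach;

    /* Legal only now that we are attached, and every thread is already
       stopped, so these PT_SUSPEND calls change nothing observable. */
    n = ptrace(PT_GETNUMLWPS, pid, NULL, 0);
    if (n <= 0 || (lwps = calloc((size_t)n, sizeof(*lwps))) == NULL)
        goto detach;
    n = ptrace(PT_GETLWPLIST, pid, (caddr_t)lwps, n);
    for (i = 0; i < n; i++)
        (void)ptrace(PT_SUSPEND, lwps[i], NULL, 0);

    rc = dump != NULL ? dump(pid) : 0;

    for (i = 0; i < n; i++)
        (void)ptrace(PT_RESUME, lwps[i], NULL, 0);
detach:
    free(lwps);
    /* (caddr_t)1: continue from where the process stopped. */
    (void)ptrace(PT_DETACH, pid, (caddr_t)1, 0);
    return rc;
#else
    (void)pid;
    (void)dump;
    errno = ENOSYS;     /* the PT_* requests used above are FreeBSD-specific */
    return -1;
#endif
}
```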
> Does this sound reasonable ?
I think you need to describe in more detail what you mean by the
inconsistent state of the threads in a gcore-generated core file before
any conclusion can be made.
More information about the freebsd-hackers mailing list