ptrace problem 6.x/7.x - can someone explain this?

Thu Oct 29 07:03:32 UTC 2009

Dorr, good day.

Tue, Oct 27, 2009 at 05:32:34PM -0700, Dorr H. Clark wrote:
> We believe ptrace has a problem in 6.3; we have not tried other
> releases.  The same code, however, exists in 7.1.

And in HEAD too.

> The bug was first encountered in gdb...
> 
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66217
> (gdb) att 66224
> Attaching to program: /usr/local/bin/emacs, process 66224
> Error accessing memory address 0x281ba5a4: Device busy.
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66224
> ptrace: Device busy.
> (gdb) quit	<--- target process 66224 dies here
> 
> To isolate this problem, a simple-minded test program was
> written that just attached and detached. This test program found that
> even the very first detach fails with EBUSY (see test source below):
> 
> $ ./test1 -p 66217 -c 1 -d 10
> pid 66217 count 1 delay 10
> Start of pass 0
> Calling PT_ATTACH pid 66217 addr 0x0 sig 0
> Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
> Call 0 to PT_DETACH returned -1, errno 16
> 
> Once again, the target process died when the ptracing test program
> exited, as would be expected if a detach had failed.
> 
> The failure return was coming from the following test in kern_ptrace()
> in sys_process.c
> 
>                 /* not currently stopped */ 
>                 if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 || 
>                     p->p_suspcount != p->p_numthreads  || 
>                     (p->p_flag & P_WAITED) == 0) { 
>                         error = EBUSY; 
>                         goto fail; 
>                 }

Yes, the traced process has to be waited for, and that includes the
stop caused by the PT_ATTACH call.  This is somewhat documented in
ptrace(2),
-----
-----
but I agree that the wording is a bit sloppy.  I'll try to produce a
slightly modified explanation for the manual page and will post the
patch here and as a PR.

I modified your example to display the result of each wait() call
that is made after a ptrace() invocation.  Here we go:
-----
$ ./test -p 45901
pid 45901 count 2 delay 5
Start of pass 0
Calling PT_ATTACH pid 45901 addr 0x0 sig 0
Attached
wait() yield 0x117f: stopped by signal 17; <-- after PT_ATTACH
wait() yield 0x57f: stopped by signal 5; <-- after PT_STEP
Calling PT_DETACH pid 45901 addr 0xffffffffffffffff sig 0
Detached.
-----

As you see, the process is stopped right after PT_ATTACH by signal 17,
SIGSTOP.  PT_STEP is then followed by the delivery of SIGTRAP (signal
5).  Both of these stops have to be collected by wait() in the tracing
process.
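
For reference, the status words printed above can be decoded with the
standard macros from <sys/wait.h>.  This is only a sketch, and the
helper name stop_signal() is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return the number of the signal that stopped
 * the process, or -1 if the status does not describe a stopped process.
 */
int
stop_signal(int status)
{
	return (WIFSTOPPED(status) ? WSTOPSIG(status) : -1);
}
```

With the traditional status encoding, stop_signal(0x117f) gives 17 and
stop_signal(0x57f) gives 5, which matches the two wait() lines in the
output above.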

And PT_DETACH works, apart from one thing: on my 8.0, PT_DETACH makes
the traced program segfault.  I haven't tried it on other versions yet,
so maybe there is a bug in test.c, or a bug in the ptrace()
implementation; I can't say for sure.  If anyone knows why the program
segfaults, please speak up.  The modified source of test.c is
attached.

> This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and
> some instances of PT_CLEAR_STEP.
> 
> P_WAITED is generally not true. In particular, it's not set
> automatically when a process is PT_ATTACHed.   It is cleared by
> PT_DETACH and again when ptrace sends a signal (PT_CONTINUE,
> PT_DETACH.)  _But_ it's set in only two places, and they aren't in
> ptrace code.
> 
> 2 sys/kern/kern_exit.c      kern_wait         773 p->p_flag |= P_WAITED;
> 3 compat/svr4/svr4_misc.c   svr4_sys_waitsys 1351 q->p_flag |= P_WAITED;
> 
> The relevant one is the first one, primarily. Here's the code:
> 
>                 mtx_lock_spin(&sched_lock); 
>                 if ((p->p_flag & P_STOPPED_SIG) && 
>                     (p->p_suspcount == p->p_numthreads) && 
>                     (p->p_flag & P_WAITED) == 0 && 
>                     (p->p_flag & P_TRACED || options & WUNTRACED)) { 
>                         mtx_unlock_spin(&sched_lock); 
>                         p->p_flag |= P_WAITED; 
>                         sx_xunlock(&proctree_lock); 
>                         td->td_retval[0] = p->p_pid; 
>                         if (status) 
>                                 *status = W_STOPCODE(p->p_xstat); 
>                         PROC_UNLOCK(p); 
>                         return (0); 
>                 } 
>                 mtx_unlock_spin(&sched_lock); 
> 
> So it's only set on processes which are already traced. But it's not
> set until someone calls wait4() on them - or the equivalent sysV
> compatibility routine.
> 
> Gdb doesn't always wait4() for processes immediately upon tracing
> them, and the ptrace man page does not imply this is needed. 

Hmm, there is at least one thread on a similar matter,
  http://sourceware.org/ml/gdb/2008-12/msg00041.html
and people there say that the wait() should still be present.

> Moreover, it's not clear why it should matter. The process
> needs to be stopped in order for it to make sense to do most
> of the things ptrace does. But - why should it need to be waited for?

To see if it was really stopped, I presume.
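
To make the interaction between the two quoted kernel fragments easier
to follow, here is a small user-space model of them.  It is not kernel
code: the struct, the M_* flag values and the function names are all
made up for illustration.  Only the shape of the two conditions is
copied from the quoted kern_ptrace() and kern_wait() snippets.

```c
#include <errno.h>

/* Illustrative flag bits; these are NOT the real P_* values. */
enum {
	M_TRACED	= 1 << 0,
	M_WAITED	= 1 << 1,
	M_STOPPED_SIG	= 1 << 2,
	M_STOPPED_TRACE	= 1 << 3
};

/* Hypothetical stand-in for the few struct proc fields involved. */
struct model_proc {
	int flags;
	int suspcount;
	int numthreads;
};

/* Mirrors the quoted kern_ptrace() test: EBUSY unless stopped and waited. */
int
model_ptrace_check(const struct model_proc *p)
{
	if ((p->flags & (M_STOPPED_SIG | M_STOPPED_TRACE)) == 0 ||
	    p->suspcount != p->numthreads ||
	    (p->flags & M_WAITED) == 0)
		return (EBUSY);
	return (0);
}

/* Mirrors the quoted kern_wait() test: reports the stop, sets "waited". */
int
model_wait(struct model_proc *p, int wuntraced)
{
	if ((p->flags & M_STOPPED_SIG) &&
	    p->suspcount == p->numthreads &&
	    (p->flags & M_WAITED) == 0 &&
	    ((p->flags & M_TRACED) || wuntraced)) {
		p->flags |= M_WAITED;
		return (1);
	}
	return (0);
}
```

In this model a process that is traced and stopped but not yet waited
for gets EBUSY from model_ptrace_check(), and the same check passes
once model_wait() has reported the stop.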

> And what kind of sense does this make to someone writing a debugging
> tool, where the natural logic seems to be:
> - attach to process

- wait for the attach to take effect by calling wait().

> - look at some stuff
> - stick in some kind of breakpoint or similar and start it going again
>   (or 'step' it)
> - wait for it to stop
> - look at and modify stuff
> - detach, or set it moving again
> 
> By way of experiment, the test for P_WAITED was removed. Gdb no longer had
> problems, and no new issues with gdb were encountered (although this
> was just interactive, no "gdb coverage test" was attempted).

By the way, I can't reproduce the gdb failures with the 8.0 sources.
I will try 7.x, but I don't think I have a 6.x system handy.
-- 
Eygene
 _                ___       _.--.   #
 \`.|\..----...-'`   `-._.-'_.-'`   #  Remember that it is hard
 /  ' `         ,       __.--'      #  to read the on-line manual
 )/' _/     \   `-_,   /            #  while single-stepping the kernel.
 `-'" `"\_  ,_.-;_.-\_ ',  fsc/as   #
     _.-'_./   {_.'   ; /           #    -- FreeBSD Developers handbook
    {_.-``-'         {_/            #