ptrace problem 6.x/7.x - can someone explain this?
Eygene Ryabinkin
rea-fbsd at codelabs.ru
Thu Oct 29 07:03:32 UTC 2009
Dorr, good day.
Tue, Oct 27, 2009 at 05:32:34PM -0700, Dorr H. Clark wrote:
> We believe ptrace has a problem in 6.3; we have not tried other
> releases. The same code, however, exists in 7.1.
And in HEAD too.
> The bug was first encountered in gdb...
>
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66217
> (gdb) att 66224
> Attaching to program: /usr/local/bin/emacs, process 66224
> Error accessing memory address 0x281ba5a4: Device busy.
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66224
> ptrace: Device busy.
> (gdb) quit <--- target process 66224 dies here
>
> To isolate this problem, a simple-minded test program was
> written that just attached and detached. This test program found
> even the very first detach fails with EBUSY (see test source below):
>
> $ ./test1 -p 66217 -c 1 -d 10
> pid 66217 count 1 delay 10
> Start of pass 0
> Calling PT_ATTACH pid 66217 addr 0x0 sig 0
> Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
> Call 0 to PT_DETACH returned -1, errno 16
>
> Once again, the target process died when the ptracing test program
> exited, as would be expected if a detach had failed.
>
> The failure return was coming from the following test in kern_ptrace()
> in sys_process.c
>
> /* not currently stopped */
> if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
> p->p_suspcount != p->p_numthreads ||
> (p->p_flag & P_WAITED) == 0) {
> error = EBUSY;
> goto fail;
> }
Yes, the ptraced process should have been waited for, even after
the PT_ATTACH call. This is somewhat documented in ptrace(2),
-----
-----
but I agree that the wording is a bit sloppy. I'll try to produce a
slightly modified explanation for the manual page and will post the patch
here and as a PR.
I have modified your example to display the result of
each wait() call that is made after a ptrace() invocation. Here we go:
-----
$ ./test -p 45901
pid 45901 count 2 delay 5
Start of pass 0
Calling PT_ATTACH pid 45901 addr 0x0 sig 0
Attached
wait() yield 0x117f: stopped by signal 17; <-- after PT_ATTACH
wait() yield 0x57f: stopped by signal 5; <-- after PT_STEP
Calling PT_DETACH pid 45901 addr 0xffffffffffffffff sig 0
Detached.
-----
As you can see, the process is stopped right after PT_ATTACH by
signal 17, SIGSTOP. PT_STEP then results in the delivery of SIGTRAP.
Both of these signals should be processed by the parent's wait().
And PT_DETACH works, apart from one thing: on my 8.0, PT_DETACH leads to
a segfault in the traced program. I haven't yet tried it on other
versions, so maybe there is a bug in the code of test.c, or a bug
in the ptrace() implementation -- I can't say for sure. If anyone knows
why the program segfaults, please speak up. The modified source of
test.c is attached.
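The two wait() statuses shown in the output above can be decoded with the
standard macros from <sys/wait.h>. What follows is only a minimal sketch:
describe_status() is a hypothetical helper written for this illustration and
is not taken from the attached test.c.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Format a wait() status in the style of the output above.
 * describe_status() is a hypothetical helper, not part of test.c.
 * Returns the stop signal, or -1 if the status does not describe a stop.
 */
int
describe_status(int status, char *buf, size_t len)
{
	if (WIFSTOPPED(status)) {
		snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
		return (WSTOPSIG(status));
	}
	if (WIFEXITED(status))
		snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
	else
		snprintf(buf, len, "unknown status 0x%x", status);
	return (-1);
}
```

With the statuses from the run above, 0x117f decodes to a stop by signal 17
and 0x57f to a stop by signal 5.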
> This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and
> some instances of PT_CLEAR_STEP.
>
> P_WAITED is generally not true. In particular, it's not set
> automatically when a process is PT_ATTACHed. It is cleared by
> PT_DETACH and again when ptrace sends a signal (PT_CONTINUE,
> PT_DETACH.) _But_ it's set in only two places, and they aren't in
> ptrace code.
>
> 2 sys/kern/kern_exit.c kern_wait 773 p->p_flag |= P_WAITED;
> 3 compat/svr4/svr4_misc.c svr4_sys_waitsys 1351 q->p_flag |= P_WAITED;
>
> The relevant one is the first one, primarily. Here's the code:
>
> mtx_lock_spin(&sched_lock);
> if ((p->p_flag & P_STOPPED_SIG) &&
> (p->p_suspcount == p->p_numthreads) &&
> (p->p_flag & P_WAITED) == 0 &&
> (p->p_flag & P_TRACED || options & WUNTRACED)) {
> mtx_unlock_spin(&sched_lock);
> p->p_flag |= P_WAITED;
> sx_xunlock(&proctree_lock);
> td->td_retval[0] = p->p_pid;
> if (status)
> *status = W_STOPCODE(p->p_xstat);
> PROC_UNLOCK(p);
> return (0);
> }
> mtx_unlock_spin(&sched_lock);
>
> So it's only set on processes which are already traced. But it's not
> set until someone calls wait4() on them - or the equivalent sysV
> compatibility routine.
>
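The interplay between the two quoted kernel fragments can be mimicked in
user space. This is only a toy model under stated assumptions: struct
proc_model, the M_* bits and both functions are hypothetical names invented
for the illustration, and the bit values are not the kernel's. It shows why
a traced, stopped process still gets EBUSY until someone has waited for it.

```c
#include <errno.h>

/* Illustrative flag bits for the model; NOT the kernel's real values. */
#define M_TRACED        0x01
#define M_STOPPED_SIG   0x02
#define M_STOPPED_TRACE 0x04
#define M_WAITED        0x08

/* Hypothetical stand-in for the few struct proc fields involved. */
struct proc_model {
	unsigned int flags;
	int suspcount;
	int numthreads;
};

/* Mirrors the "not currently stopped" test quoted from kern_ptrace(). */
int
model_ptrace_check(const struct proc_model *p)
{
	if ((p->flags & (M_STOPPED_SIG | M_STOPPED_TRACE)) == 0 ||
	    p->suspcount != p->numthreads ||
	    (p->flags & M_WAITED) == 0)
		return (EBUSY);
	return (0);
}

/*
 * Mirrors the quoted kern_wait() fragment, one of the two places where
 * the "waited" flag is set.  Returns 1 if a stop was reported, else 0.
 */
int
model_wait(struct proc_model *p, int wuntraced)
{
	if ((p->flags & M_STOPPED_SIG) &&
	    p->suspcount == p->numthreads &&
	    (p->flags & M_WAITED) == 0 &&
	    ((p->flags & M_TRACED) || wuntraced)) {
		p->flags |= M_WAITED;
		return (1);
	}
	return (0);
}
```

In this model, a process with M_TRACED | M_STOPPED_SIG and all threads
suspended fails model_ptrace_check() with EBUSY, and passes it only after
model_wait() has reported the stop once.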
> Gdb doesn't always wait4() for processes immediately upon tracing
> them, and the ptrace man page does not imply this is needed.
Hmm, there is at least one thread on a similar matter,
http://sourceware.org/ml/gdb/2008-12/msg00041.html
and people there say that the wait() should still be present.
> Moreover, it's not clear why it should matter. The process
> needs to be stopped in order for it to make sense to do most
> of the things ptrace does. But - why should it need to be waited for?
To see if it was really stopped, I presume.
> And what kind of sense does this make to someone writing a debugging
> tool, where the natural logic seems to be:
> - attach to process
- wait for the process' attachment by doing wait().
> - look at some stuff
> - stick in some kind of breakpoint or similar and start it going again
> (or 'step' it)
> - wait for it to stop
> - look at and modify stuff
> - detach, or set it moving again
>
> By way of experiment, the test for P_WAITED was removed. Gdb no longer had
> problems, and no new issues with gdb were encountered (although this
> was just interactive, no "gdb coverage test" was attempted).
By the way, I can't reproduce the gdb failures with the 8.0 sources. I will
try 7.x, but I don't think I have 6.x handy.
--
Eygene
_ ___ _.--. #
\`.|\..----...-'` `-._.-'_.-'` # Remember that it is hard
/ ' ` , __.--' # to read the on-line manual
)/' _/ \ `-_, / # while single-stepping the kernel.
`-'" `"\_ ,_.-;_.-\_ ', fsc/as #
_.-'_./ {_.' ; / # -- FreeBSD Developers handbook
{_.-``-' {_/ #