[Bug 287050] Change in PTRACE_CONTINUE causing valgrind/vgdb to no longer be able to interrupt debuggee

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 24 May 2025 16:31:13 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287050

            Bug ID: 287050
           Summary: Change in PTRACE_CONTINUE causing valgrind/vgdb to no
                    longer be able to interrupt debuggee
           Product: Base System
           Version: 15.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: pjfloyd@wanadoo.fr

This is with

FreeBSD freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT #0 main-n277145-6ee513f4f26d:
Thu May  8 05:11:14 UTC 2025    
root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Background
~~~~~~~~~~

When running Valgrind with vgdb, gdb talks to vgdb over its remote protocol,
and vgdb relays to Valgrind, as follows:

gdb <-pipe-> vgdb <-fifo-> valgrind

If the user wants to interrupt the inferior with ctrl-C, then the following
should happen:

1. gdb handles SIGINT and passes the relevant packet to vgdb
2. vgdb uses the same technique as gdb to invoke a function in Valgrind
2a. PTRACE_ATTACH to the Valgrind pid
2b. ptrace GETREGS, READ, WRITE and SETREGS requests to set up a stack and RIP
for the function to invoke
2c. PTRACE_CONTINUE to run the invoked function
3. The invoked Valgrind function does loads of hacky Valgrind things then
passes control back to gdb
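
To make step 2b concrete, here is a small Python model of the register and
stack edits. It only simulates the bookkeeping and makes no ptrace calls. The
function name prepare_invoke, the dict-based registers and memory, the 128-byte
red zone skip and the use of RDI for an argument are assumptions for
illustration (based on the System V amd64 calling convention), not a copy of
what vgdb does.

```python
# Model of step 2b: build a call frame in the stopped process so that
# resuming it (step 2c) enters the chosen function.
# This simulates the edits only; it does not call ptrace.

RED_ZONE = 128   # assumption: amd64 red zone below RSP that must not be clobbered
WORD = 8         # size of a return address on amd64
BAD_RETURN = 0   # deliberately invalid return address (the log shows a zero being written)


def prepare_invoke(regs, memory, func_addr, arg):
    """Return a modified copy of regs and record the pushed word in memory.

    regs      -- dict with at least "rsp", "rip", "rdi" (hypothetical layout)
    memory    -- dict standing in for writes into the tracee's address space
    func_addr -- address of the function to invoke
    arg       -- value for the first integer argument
    """
    new_regs = dict(regs)            # keep the original so it can be restored later
    sp = regs["rsp"] - RED_ZONE      # skip the area the interrupted code may be using
    sp -= WORD                       # make room for the return address
    memory[sp] = BAD_RETURN          # what the ptrace WRITE request would store
    new_regs["rsp"] = sp
    new_regs["rip"] = func_addr      # execution resumes here after the continue
    new_regs["rdi"] = arg            # first integer argument in the amd64 ABI
    return new_regs
```

The modified registers would then go back with SETREGS, and the continue
request in step 2c is what should make the tracee start executing at the new
RIP.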

Problem
~~~~~~~
I'm seeing the above process get as far as 2c on the vgdb side, but I'm not
seeing step 3, the invoked function running on the Valgrind side.

Could this be related to https://reviews.freebsd.org/D49678 ?

Small reproducer
~~~~~~~~~~~~~~~~

1st terminal
------------
Run
valgrind --tool=none --vgdb-error=0 -d sleep 100000
(the -d is optional for debug output)

2nd terminal
------------
Run
gdb

Then at the gdb prompt
target remote | vgdb -d

continue

hit ctrl-c

On 15-CURRENT I get the following output:
^C18:27:14.880322 attach to 'main' pid 15004
18:27:14.880434 attach main pid PT_ATTACH pid 15004
18:27:14.880447 waitstopped attach main pid before waitpid signal_expected 17
18:27:14.880455 after waitpid pid 15004 p 15004 status 0x117f WIFSTOPPED 17 
18:27:14.880460 calling getregs
18:27:14.880465 getregs call succeeded
18:27:14.880469 push bad_return return address ptrace_write_memory
18:27:14.880472 Writing 0000000000000000 to 0x100288dd48
18:27:14.880489 calling setregs
18:27:14.880493 setregs succeeded
18:27:14.880495 PT_CONTINUE to invoke
18:27:14.880500 waitstopped waitpid status after PTRACE_CONT to invoke before waitpid signal_expected 17

but no gdb prompt appears.
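
For reading the log above, the status 0x117f reported after the attach can be
decoded with the standard wait-status helpers. This is a minimal sketch; the
helper name describe is made up for illustration.

```python
import os


def describe(status):
    """Classify a waitpid() status word into (kind, number)."""
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))     # stopped by a signal
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))    # killed by a signal
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))   # normal exit
    return ("unknown", None)
```

describe(0x117f) gives ("stopped", 17). On FreeBSD signal 17 is SIGSTOP, which
matches the signal_expected 17 in the log (signal numbers differ on other
systems). So the attach stop itself looks normal, and the hang is in the
second waitpid, after the PT_CONTINUE.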

On 14.2-RELEASE I get similar output, but it continues to the gdb prompt:

Program received signal SIGTRAP, Trace/breakpoint trap.
_nanosleep () at _nanosleep.S:4
warning: 4      _nanosleep.S: No such file or directory
(gdb) 

Additionally, on 14.2 (but not on 15), I get the following on the Valgrind
side in the 1st terminal:

--72512:1:  gdbsrv invoke_gdbserver running_tid 0 vgdb_interrupted_tid 1
--72512:1:  gdbsrv entering call_gdbserver vgdb_reason ... pid 72512 tid 1 status VgTs_WaitSys sched_jmpbuf_valid 1
--72512:1:  gdbsrv enter valgrind_wait pid 72512
--72512:1:  gdbsrv stop pc is 0x498553A
--72512:1:  gdbsrv exit valgrind_wait status T ptid id 107814 stop_pc 0x498553A: _nanosleep (in /lib/libc.so.7) signal 5
--72512:1:  gdbsrv Writing resume reply for 107814
--72512:1:  gdbsrv remove software_breakpoint at addr 0x400FCA0 0x400FCA0: r_debug_state (in /libexec/ld-elf.so.1)
