[patch] fasttrap process scratch space

Mon Feb 24 04:22:51 UTC 2014

On Sun, Feb 23, 2014 at 11:14:54PM -0500, Mark Johnston wrote:
> Hello,
> 
> For those not familiar with the machine-dependent (MD) parts of fasttrap,
> one of the things it has to do is ensure that any userland instruction it replaces with
> a breakpoint gets executed in the traced process' context. For several
> common classes of instructions, fasttrap will emulate the instruction in
> the breakpoint handler; when it can't do that, it copies the instruction
> out to some scratch space in the process' address space and sets the PC
> of the interrupted thread to the address of that copy, which is
> followed by a jump back to the instruction after the breakpoint. There's
> a helpful block comment titled "Generic Instruction Tracing" around line
> 1585 of the x86 fasttrap_isa.c which describes the details of this.
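As a rough illustration of the copy-and-jump-back step, here is a userland C sketch that fills a scratch buffer with a displaced instruction followed by a jump to the next original instruction. It assumes the amd64 form is an indirect `jmp *0(%rip)` (bytes ff 25 00 00 00 00) followed by the 8-byte absolute return address, so the buffer can sit anywhere in the address space. `build_scratch()`, `SCRATCH_SIZE` and `MAX_INSN_LEN` are hypothetical names, and the sketch ignores instructions that are themselves %rip-relative, which would need rewriting before being moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSN_LEN	15	/* architectural limit on x86 instruction length */
#define SCRATCH_SIZE	64	/* hypothetical per-thread scratch size */

/*
 * Hypothetical helper: write the displaced instruction into 'scratch',
 * followed by "jmp *0(%rip)" and the 8-byte absolute address pc + len.
 * The indirect jump reads its target from the 8 bytes right after it,
 * so the result is position-independent.  Returns bytes written, 0 on error.
 */
static size_t
build_scratch(uint8_t *scratch, const uint8_t *insn, size_t len, uint64_t pc)
{
	static const uint8_t jmp_rip[6] = { 0xff, 0x25, 0, 0, 0, 0 };
	uint64_t target = pc + len;	/* instruction after the breakpoint */
	size_t off = 0;
	int i;

	if (len == 0 || len > MAX_INSN_LEN ||
	    len + sizeof(jmp_rip) + sizeof(target) > SCRATCH_SIZE)
		return (0);

	memcpy(scratch + off, insn, len);		/* original instruction */
	off += len;
	memcpy(scratch + off, jmp_rip, sizeof(jmp_rip));	/* jmp *0(%rip) */
	off += sizeof(jmp_rip);
	for (i = 0; i < 8; i++)				/* little-endian target */
		scratch[off + i] = (uint8_t)(target >> (8 * i));
	off += 8;

	assert(off <= SCRATCH_SIZE);
	return (off);
}
```

For the 3-byte `mov %rdi,%rax` (48 89 f8) mentioned below, this layout takes 17 bytes, comfortably inside a block smaller than 64 bytes.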
> 
> This functionality currently doesn't work on FreeBSD, mainly because we
> don't necessarily have any (per-thread) scratch space available for use
> in the process' address space. In illumos/Solaris, a small (< 64 byte)
> block is reserved in each thread's TLS for use by DTrace. It turns out
> that doing the same thing on FreeBSD is quite easy:
> 
> http://people.freebsd.org/~markj/patches/fasttrap_scratch_hacky.diff
> 
> Specifically, we need to ensure that TLS (allocated by the runtime
> linker) is executable and that we properly extract the offset to the
> scratch space from the FS segment register. I think this is somewhat
> hacky though, as it creates a dependency on libthr and rtld internals.
> 
> A second approach is to have fasttrap dynamically allocate scratch space
> within the process' address space using vm_map_insert(9). My
> understanding is that Apple's DTrace implementation does this, and I've
> implemented this approach for FreeBSD here (which was done without
> referencing Apple code):
> 
> http://people.freebsd.org/~markj/patches/fasttrap-scratch-space/fasttrap-scratch-space-1.diff
> 
> The idea is to map pages of executable memory into the user process as
> needed, and carve them into scratch space chunks for use by individual
> threads. If a thread in fasttrap_pid_probe() needs scratch space, it
> calls a new function, fasttrap_scraddr(). If the thread already has
> scratch space allocated to it, it's used. Otherwise, if any free scratch
> space chunks are available in an already-mapped page, one of them is
> allocated to the thread and used. Otherwise, a new page is mapped using
> vm_map_insert(9).
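A userland C simulation of the lookup order described above for fasttrap_scraddr(): reuse the thread's chunk if it has one, otherwise take a free chunk from an already-mapped page, otherwise map a new page and carve it into chunks. `struct scratch_proc`, `scratch_addr()` and `map_exec_page()` are hypothetical; `map_exec_page()` only stands in for vm_map_insert(9) and returns made-up page-aligned addresses, thread IDs are small integers, and locking is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ		4096	/* hypothetical page size */
#define CHUNK_SIZE	64	/* hypothetical per-thread chunk size */
#define CHUNKS_PER_PAGE	(PAGE_SZ / CHUNK_SIZE)
#define MAX_THREADS	256
#define MAX_FREE	(MAX_THREADS + CHUNKS_PER_PAGE)

/* Hypothetical per-process state, standing in for fields of fasttrap_proc_t. */
struct scratch_proc {
	uintptr_t next_va;		/* address for the next simulated mapping */
	int	  pages_mapped;		/* pages "mapped" so far */
	uintptr_t free[MAX_FREE];	/* stack of unassigned chunks */
	int	  nfree;
	uintptr_t thread_chunk[MAX_THREADS];	/* 0 means "no scratch yet" */
};

/* Stand-in for vm_map_insert(9): hands out fresh page-aligned addresses. */
static uintptr_t
map_exec_page(struct scratch_proc *p)
{
	uintptr_t va = p->next_va;

	p->next_va += PAGE_SZ;
	p->pages_mapped++;
	return (va);
}

/* Hypothetical analogue of fasttrap_scraddr(). */
static uintptr_t
scratch_addr(struct scratch_proc *p, int tid)
{
	uintptr_t va;
	int i;

	assert(tid >= 0 && tid < MAX_THREADS);

	/* The thread already owns a chunk: reuse it. */
	if (p->thread_chunk[tid] != 0)
		return (p->thread_chunk[tid]);

	/* No free chunk in any mapped page: map a new page and carve it up. */
	if (p->nfree == 0) {
		va = map_exec_page(p);
		for (i = CHUNKS_PER_PAGE - 1; i >= 0; i--) {
			assert(p->nfree < MAX_FREE);
			p->free[p->nfree++] = va + (uintptr_t)i * CHUNK_SIZE;
		}
	}

	/* Hand out a free chunk from an already-mapped page. */
	p->thread_chunk[tid] = p->free[--p->nfree];
	return (p->thread_chunk[tid]);
}
```

Carving a whole page at once keeps the mapping calls rare: with the 64-byte chunks assumed here, one 4 KB page serves 64 threads before another page is needed.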
> 
> Threads hold onto their scratch space until they exit. That is, scratch
> space is never unmapped from the process, even if the controlling
> dtrace(1) process detaches. I added a handler for the thread_dtor event
> which re-adds any scratch space held by the exiting thread to the free list for
> that process. Per-process scratch space state is held in the fasttrap
> process handle (fasttrap_proc_t), since that turns out to be much easier
> than keeping it in the struct proc.
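The thread-exit side can be sketched the same way. Below, `scratch_thread_exit()` plays the role of the thread_dtor handler: it pushes the exiting thread's chunk back onto the per-process free list and never unmaps anything, so the page count only grows. The structure, function names and sizes are hypothetical, the simulated "mapping" just hands out addresses, and locking is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE	64	/* hypothetical chunk size */
#define CHUNKS_PER_PAGE	64	/* hypothetical: 4096-byte page / 64 */
#define MAX_THREADS	128
#define MAX_FREE	(MAX_THREADS + CHUNKS_PER_PAGE)

/* Hypothetical per-process scratch state (cf. fasttrap_proc_t). */
struct scratch_proc {
	uintptr_t next_va;		/* next simulated mapping address */
	int	  pages_mapped;		/* only ever grows */
	uintptr_t free[MAX_FREE];	/* stack of free chunks */
	int	  nfree;
	uintptr_t owner[MAX_THREADS];	/* chunk held by each thread, 0 = none */
};

/* Minimal allocator: own chunk, else a free one, else "map" a new page. */
static uintptr_t
scratch_get(struct scratch_proc *p, int tid)
{
	if (p->owner[tid] == 0) {
		if (p->nfree == 0) {
			/* stand-in for mapping a page with vm_map_insert(9) */
			for (int i = CHUNKS_PER_PAGE - 1; i >= 0; i--)
				p->free[p->nfree++] =
				    p->next_va + (uintptr_t)i * CHUNK_SIZE;
			p->next_va += (uintptr_t)CHUNKS_PER_PAGE * CHUNK_SIZE;
			p->pages_mapped++;
		}
		p->owner[tid] = p->free[--p->nfree];
	}
	return (p->owner[tid]);
}

/* thread_dtor-style hook: return the chunk to the free list, unmap nothing. */
static void
scratch_thread_exit(struct scratch_proc *p, int tid)
{
	if (p->owner[tid] == 0)
		return;
	assert(p->nfree < MAX_FREE);
	p->free[p->nfree++] = p->owner[tid];
	p->owner[tid] = 0;
}
```

A chunk released by one thread is simply handed to the next thread that asks, without a new page being mapped.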
> 
> Does anyone have any thoughts or comments on the approach or the patch? 
> Any review or testing would be very much appreciated.
> 
> For testing purposes, it's helpful to know that tracing memcpy() on
> amd64 will result in use of this scratch space code, as it starts with a
> "mov %rdi,%rax" on my machine at least. My main test case has been to
> run something like
> 
> # dtrace -n 'pid$target:libc.so.7::entry {@[probefunc] = count()}' -p $(pgrep firefox)
> 
> Attempting to trace all functions still results in firefox dying with
> SIGTRAP, but we're getting there. :)

I should add that the diff described here also needs to be applied when
testing:

http://lists.freebsd.org/pipermail/freebsd-dtrace/2014-February/000175.html

Otherwise it's quite easy to trigger deadlocks.

-Mark