[PATCH] Fix kernel panics when using dtrace fbt return probes on i386

Ryan Stone rysto32 at gmail.com
Sat Nov 5 22:31:07 UTC 2011

I have a patch that fixes crash when using dtrace fbt return probes on
i386.  dtrace implements an fbt probe by overwriting a small part of
the function when the probe is active.  On i386, it writes an invalid
opcode.  dtrace has a hook into the invalid opcode fault handler that
checks whether the fault was due to an active probe; if it was, the
probe fires and then the fault handler emulates the behaviour of the
instruction that was replaced with the invalid opcode and returns
control to the instruction after the invalid opcode.

The bug is in the emulation of the leave instruction(which is replaced
for an fbt return probe).  The is a small window in which the
emulation leaves critical state above the current stack pointer.  If
the CPU takes an interrupt in this window, the trap frame corrupts
this state.

The leave instruction is used to pop off a stack frame immediately
prior to returning from a function.  Because the leave emulation is
running in a fault handler, it must copy the trap frame to the bottom
of the stack frame that is being removed, set its stack pointer to the
start of the new trap frame and then execute the iret instruction.
There are actually two bugs in the current implementation.  The first
is that immediately before restoring the stack pointer to point to the
new trap frame, the emulation must save the new stack pointer on the
stack, restore the values of its scratch registers, and then load the
new stack pointer back from the stack.  The current implementation
saves the new stack pointer at -4(%esp).  If an interrupt is taking
after the new stack pointer is saved at this location the trap frame
will overwrite the new stack pointer.  The fix for this is to instead
save the new stack pointer at 8(%esp), which was part of the old trap
frame that was copied forward.  This is safe as we know that the trap
frame must exist (so 8(%esp) can't possibly point to still-valid data
from the next stack frame, for instances) and at this point we have
copied the stack frame forward so we can safely overwrite the old one
without any issues.

The second bug is that when the new stack pointer is restored, it does
not point at the new trap frame.  Instead it points 8 bytes into the
trap frame.  The emulation code corrects for this by subtracting 8
from %esp before executing the iret.  However if we take an interrupt
after we have restored the new stack pointer, but before subtracting 8
from %esp, the trap frame from the interrupt will overwrite the first
8 bytes of the invalid opcode trap frame, causing us to crash when we
execute the iret.  The fix is to adjust the new stack pointer before
saving onto the stack in the first place, so that we we restore it it
is immediately valid, closing the window in which an interrupt can
corrupt anything.

If there are no objections to this patch I will commit it soonish.  I
would appreciate some review, but IMO this really needs to get in for
9.0 as a central tenet of dtrace is that it will not crash your

Index: sys/cddl/dev/dtrace/i386/dtrace_asm.S
--- sys/cddl/dev/dtrace/i386/dtrace_asm.S       (revision 227094)
+++ sys/cddl/dev/dtrace/i386/dtrace_asm.S       (working copy)
@@ -125,11 +125,11 @@
        movl    8(%esp), %eax           /* load calling EIP */
        incl    %eax                    /* increment over LOCK prefix */
        movl    %eax, -8(%ebx)          /* store calling EIP */
-       movl    %ebx, -4(%esp)          /* temporarily store new %esp */
+       subl    $8, %ebx                /* adjust for three pushes, one pop */
+       movl    %ebx, 8(%esp)           /* temporarily store new %esp */
        popl    %ebx                    /* pop off temp */
        popl    %eax                    /* pop off temp */
-       movl    -12(%esp), %esp         /* set stack pointer */
-       subl    $8, %esp                /* adjust for three pushes, one pop */
+       movl    (%esp), %esp            /* set stack pointer */
        iret                            /* return from interrupt */

More information about the freebsd-current mailing list