misc/177624: Swapcontext can get compiled incorrectly

Thu Apr 4 19:40:03 UTC 2013

The following reply was made to PR misc/177624; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Brian Demsky <bdemsky at uci.edu>
Cc: Bruce Evans <brde at optusnet.com.au>, freebsd-bugs at FreeBSD.org,
        freebsd-gnats-submit at FreeBSD.org
Subject: Re: misc/177624: Swapcontext can get compiled incorrectly
Date: Fri, 5 Apr 2013 06:38:51 +1100 (EST)

   This message is in MIME format.  The first part should be readable text,
   while the remaining parts are likely unreadable without MIME-aware tools.

 --0-627049122-1365104331=:2557
 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
 Content-Transfer-Encoding: QUOTED-PRINTABLE

 On Thu, 4 Apr 2013, Brian Demsky wrote:

 > On Apr 4, 2013, at 8:16 AM, Bruce Evans wrote:
 >
 >> On Fri, 5 Apr 2013, Bruce Evans wrote:
 >>
 >>> On Thu, 4 Apr 2013, Brian Demsky wrote:
 >>>
 >>>>> Description:
 >>>> Here is the code for swap context:
 >>>> int
 >>>> swapcontext(ucontext_t *oucp, const ucontext_t *ucp)
 >>>> {
 >>>>     int ret;
 >>>>     if ((oucp =3D=3D NULL) || (ucp =3D=3D NULL)) {
 >>>>             errno =3D EINVAL;
 >>>>             return (-1);
 >>>>     }
 >>>>     oucp->uc_flags &=3D ~UCF_SWAPPED;
 >>>>     ret =3D getcontext(oucp);
 >>>>     if ((ret =3D=3D 0) && !(oucp->uc_flags & UCF_SWAPPED)) {
 >>>>             oucp->uc_flags |=3D UCF_SWAPPED;
 >>>>             ret =3D setcontext(ucp);
 >>>>     }
 >>>>     return (ret);
 >>>> }
 >>>
 >>>> On the OS X port of libc in Mac OSX 10.7.5, this gets compiled as:
 >>>
 >>>> ...
 >>>> 0x00007fff901e870b <swapcontext+89>:    pop    %rbx
 >>>> 0x00007fff901e870c <swapcontext+90>:    pop    %r14
 >>>> 0x00007fff901e870e <swapcontext+92>:    jmpq   0x7fff90262855 <setcont=
 ext>
 >>>> The problem is that rbx is callee saved by compiled version of swapcon=
 text and then reused before getcontext is called.  Getcontext then stores t=
 he wrong value for rbx and setcontext later restores the wrong value for rb=
 x. If the caller had any value in rbx, it has been trashed at this point.
 >>>
 >>> Later you wrote:
 >>>
 >>>> The analysis is a little wrong about the problem.  Ultimately, the tai=
 l call to set context trashes the copies of bx and r14 on the stack=85.
 >>>
 >>> The bug seems to be in setcontext().  It must preserve the callee-saved
 >>> registers, not restore them.  This would happen automatically if more
 >>> were written in C.  But setcontext() can't be written entirely in C,
 >>> since it must save all callee-saved registers including ones not used
 >>> and therefore not normally saved by any C function that it might be in,
 >>> and possibly also including callee-saved registers for nonstandard or
 >>> non-C ABIs.  In FreeBSD, it is apparently always a syscall.
 >>
 >> This is more than a little wrong.  When setcontext() succeeds, it
 >> doesn't return here.  Then it acts like longjmp() and must restore all
 >> the callee-saved to whatever they were when getcontext() was called.
 >> Otherwise, it must not clobber any callee-saved registers (then it
 >> differs from longjmp().  longjmp() just can't fail).
 >>
 >> Now I don't see any bug here.  If the saved state is returned to, then
 >> it is as if getcontext() returned, and the intermediately-saved %rbx
 >> is correct (we will restore the orginal %rbx if we return).  If
 >> setcontext() fails, then it should preserve all callee-saved registers.
 >> In the tail-call case, we have already restored the orginal %rbx and
 >> the failing setcontext() should preserve that.
 >>
 >> Bruce
 >
 > Take at setcontext:
 >
 > (gdb) disassemble setcontext
 > Dump of assembler code for function setcontext:
 > 0x00007fff90262855 <setcontext+0>:      push   %rbx
 > 0x00007fff90262856 <setcontext+1>:      lea    0x38(%rdi),%rbx
 > 0x00007fff9026285a <setcontext+5>:      cmp    0x30(%rdi),%rbx
 > 0x00007fff9026285e <setcontext+9>:      je     0x7fff90262864 <setcontext=
 +15>
 > 0x00007fff90262860 <setcontext+11>:     mov    %rbx,0x30(%rdi)
 > 0x00007fff90262864 <setcontext+15>:     mov    0x4(%rdi),%edi
 > 0x00007fff90262867 <setcontext+18>:     callq  0x7fff90262998 <sigsetmask=
 >
 > 0x00007fff9026286c <setcontext+23>:     mov    %rbx,%rdi
 > 0x00007fff9026286f <setcontext+26>:     pop    %rbx
 > 0x00007fff90262870 <setcontext+27>:     jmpq   0x7fff90262875 <_setcontex=
 t>
 > End of assembler dump.
 >
 > The stack from swapcontext had rbx and r14 popped after getcontext stored=
  everything.  Now we push rbx and then later call sigsetmask.  Those two ac=
 tions guarantee that the memory locations where rbx and r14 were on the sta=
 ck have been overwritten.  When we later return to the save context, it wil=
 l start up swapcontext and pop the wrong values off of the stack for rbx an=
 d r14.

 Ah, it is not really rbx and r14, but rsp and the whole stack frame of
 swapcontext() that are mishandled.  Even returning from swapcontext()
 leaves the saved rsp pointing to garbage.  The stack frame could have
 had almost anything on it before it became invalid, but here it has mainly
 the saved rbx and r14 (not rbp; however, when compiled by clang on FreeBSD,
 it also has the saved rbp, and when compiled with -O0 it also has the
 local variable).

 Now I think swapcontext() can't be written in C, for the same reason that
 setjmp() can't be written in C -- the stack frame cannot be controlled in
 C, and if it has anything at all on it (even the return address requires
 special handling), then the stack pointer saved in the context becomes
 invalid when the function returns, or even earlier for tail calls and
 other optimizations.  Also, if the C function calls another function like
 the library getcontext(), then there are 2 stack frames below the saved
 stack pointer that are hard to control.  In FreeBSD on at least x86,
 getcontext() is a special non-automatically generated asm function to
 control this.  The automatically generated asm function would have
 only the return address in its stack frame, but even this is too much,
 and there is even more to control.  The comment about this is incomplete/
 partly wrong, but gives a good hint about the problem (*).

 This problem is avoided in setjmp() and longjmp() by not leaving anything
 on stack frames except the return address for setjmp() at the point where
 the stack pointer is saved.  The saved stack pointer is still technically
 invalid, since it points to the return address which goes away when
 setjmp() returns.  This is handled by not returning in the usual way in
 lonjmp().  Instead, setjmp() saves the return address and longjmp()
 restores it; the restored stack pointer becomes valid only at the end
 of setjmp() when the top word on it is restored.  FreeBSD getcontext()
 uses a more hackish method (*).

 The saved stack pointer becomes more than technically invalid if the
 function that called setjmp() or swapcontext() returns.  Then the
 caller's frame becomes invalid.  Such returns are invalid.  This
 restriction is clearly documented for setjmp() but not for swapcontext()
 in FreeBSD man pages and old C99 and POSIX specs.  The C99 restriction
 is only that longjmp() must not be invoked with the saved state after
 the function that saved the state using setjmp() returns.  Compilers
 must know about this and and not do optimizations (like tail calls?)
 that would invalidate the saved stack pointer before the function
 returns.

 (*) Here is the FreeBSD i386 getcontext():

 @ /*
 @  * This has to be magic to handle the multiple returns.

 Multiple =3D just 2.

 @  * Otherwise, the setcontext() syscall will return here and we'll
 @  * pop off the return address and go to the *setcontext* call.
 @  */

 Actually, we would pop off garbage and go to neverland, with a lower
 chance of going to setcontext than most places.

 @ =09.weak=09_getcontext
 @ =09.set=09_getcontext,__sys_getcontext
 @ =09.weak=09getcontext
 @ =09.set=09getcontext,__sys_getcontext
 @ ENTRY(__sys_getcontext)
 @ =09movl=09(%esp),%ecx=09/* save getcontext return address */
 @ =09mov=09$SYS_getcontext,%eax
 @ =09KERNCALL
 @ =09jb=09HIDENAME(cerror)

 When getcontext() fails, the return address on the stack is still valid
 and cerror depends on that.

 @ =09addl=09$4,%esp=09=09/* remove stale (setcontext) return address */

 Actually, remove the non-stale (our getcontext) return address if the
 return is not from setcontext, and remove stack garbage if the
 return is from setcontext.  The stack contents is not related to
 setcontext in either case.  In the garbage case, it started as the
 return address for another getcontext, but became garbage when that
 returned.

 @ =09jmp=09*%ecx=09=09/* restore return address */

 We want our return address in both cases.  We don't know which case
 applies and use same code for both.  The comment is imprecise.  We
 don't restore the return address.  What we do is return.

 The code should be changed to match the comment (don't adjust %esp,
 but actually restore the return address to it).  This method is used
 without comment by FreeBSD x86 longjmp()).  Optimizing this for speed
 is unimportant, but this is probably faster as well as cleaner.  The
 jmp may have been faster 20 years ago, but now it unbalances call/return
 branch prediction.

 The libc swapcontext() can probably be fixed by copying
 setjmp()/longjmp().  Except setjmp()/longjmp() has fundamentally broken
 stack handling too, at least when setjmp() is actually sigsetjmp().
 Then it is necessary to restore the stack pointer atomically with
 restoring the signal mask, and this seems to be impossible with using
 a single syscall that does both.  The 2 different orders of restoring
 them give different problems.  In the above, you seem to have
 setcontext() in userland, with some signal unmasking, so I think it
 has the same races as setjmp()/longjmp().  Perhaps an atomic syscall
 for setcontext() is enough.

 Bruce
 --0-627049122-1365104331=:2557--