A potential fix for arm64's: sh`forkshell child-process path after fork sometimes has a bad stack pointer value

Mark Millard markmi at dsl-only.net
Sun Mar 19 19:40:26 UTC 2017


On 2017-Mar-19, at 7:18 AM, Otacílio <otacilio.neto at bsd.com.br> wrote:

> On 14/02/2017 13:35, Mark Millard wrote:
>> The following change has let my test run for 8.5 hours so far without a
>> fork-failure in sh`forkshell :
>> 
>> # svnlite diff /usr/src/sys/arm64/arm64/swtch.S
>> Index: /usr/src/sys/arm64/arm64/swtch.S
>> ===================================================================
>> --- /usr/src/sys/arm64/arm64/swtch.S    (revision 312982)
>> +++ /usr/src/sys/arm64/arm64/swtch.S    (working copy)
>> @@ -241,6 +241,12 @@
>>         mov     fp, #0  /* Stack traceback stops here. */
>>         bl      _C_LABEL(fork_exit)
>> 
>> +       /*
>> +        * Disable interrupts to avoid
>> +        * overwriting sp_el0 and spsr_el1 by an IRQ exception.
>> +        */
>> +       msr     daifset, #2
>> +
>>         /* Restore sp and lr */
>>         ldp     x0, x1, [sp]
>>         msr     sp_el0, x0
>> @@ -263,12 +269,6 @@
>>         ldp     x28, x29, [sp, #TF_X + 28 * 8]
>>         /* Skip x30 as it was restored above as lr */
>> 
>> -       /*
>> -        * Disable interrupts to avoid
>> -        * overwriting spsr_el1 by an IRQ exception.
>> -        */
>> -       msr     daifset, #2
>> -
>>         /* Restore elr and spsr */
>>         ldp     x0, x1, [sp, #16]
>>         msr     elr_el1, x0
>> 
>> I'm going to switch to attempting a self-hosted buildworld
>> buildkernel again.
> 
> Was this patch, or some other for this bug, committed to HEAD?

Yes, "some other" in -r313772 (2017-Feb-15). See:

https://lists.freebsd.org/pipermail/svn-src-head/2017-February/097004.html

which in part says:

Author: andrew
Date: Wed Feb 15 14:56:47 2017
New Revision: 313772
URL: https://svnweb.freebsd.org/changeset/base/313772


Log:
  Load the new sp_el0 with interrupts disabled in fork_trampoline. If an
  interrupt arrives in fork_trampoline after sp_el0 was written we may then
  switch to a new thread, enter userland so change this stack pointer, then
  return to this code with the wrong value. This fixes this case by moving
  the load of sp_el0 until after interrupts have been disabled.
  
  Reported by:	Mark Millard (markmi at dsl-only.net)
  Sponsored by:	ABT Systems Ltd
  Differential Revision:	https://reviews.freebsd.org/D9593


Modified:
  head/sys/arm64/arm64/swtch.S
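
For anyone who wants the ordering problem from the log in a form that can be
run, here is a rough toy model in Python. It is only a sketch under stated
assumptions: the names Cpu, other_thread_runs and return_to_user are made up
for illustration and are not FreeBSD kernel symbols, and the "interrupt" is
just a flag checked at the one point where the old code still had IRQs
enabled after writing sp_el0.

```python
"""Toy model of the fork_trampoline ordering issue described in the log.

All names here are hypothetical; this is not kernel code.
"""


class Cpu:
    """Models only the two pieces of per-CPU state that matter here."""

    def __init__(self):
        self.sp_el0 = 0           # stands in for the sp_el0 register
        self.irqs_masked = False  # stands in for the DAIF I bit


def other_thread_runs(cpu, other_user_sp):
    """Models an IRQ that switches to another thread which enters userland
    and so leaves its own user stack pointer in sp_el0."""
    cpu.sp_el0 = other_user_sp


def return_to_user(cpu, child_sp, mask_first, irq_pending, other_sp=0xBAD0):
    """Return the sp_el0 value the forked child would reach userland with.

    mask_first=True models disabling interrupts before loading sp_el0;
    mask_first=False models loading sp_el0 first and masking later.
    """
    if mask_first:
        cpu.irqs_masked = True    # models: msr daifset, #2

    cpu.sp_el0 = child_sp         # models: msr sp_el0, x0

    # Window between writing sp_el0 and the later interrupt disable.
    if irq_pending and not cpu.irqs_masked:
        other_thread_runs(cpu, other_sp)

    cpu.irqs_masked = True        # the old ordering masked only here
    return cpu.sp_el0


if __name__ == "__main__":
    for mask_first in (False, True):
        sp = return_to_user(Cpu(), 0x7000, mask_first, irq_pending=True)
        print("mask_first=%s -> sp_el0=%#x" % (mask_first, sp))
```

With mask_first=False and an interrupt pending, the child ends up with the
other thread's value instead of its own; with mask_first=True the value
written for the child survives. The real code path is of course more
involved than this model.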


===
Mark Millard
markmi at dsl-only.net



