pine64 (an A64 Cortex-A53 context, e.g. -r312982): sh`forkshell child-process path after fork sometimes has a bad stack pointer value

Sun Feb 12 11:09:10 UTC 2017

On pine64 (an A64 Cortex-A53 context) multiple people on the lists
including me have reported sh getting occasional core dumps.

I've analyzed a bunch of the sh core dumps and all failed in the
child-process path of forkshell when forkshell tried to return. 
I've since done experiments with code to detect some forms of
odd stack pointer values so that the adjusted code calls abort
for such a detection before such a return would happen. [This
gives a nicer context to look at in core dumps (before things
are very messed up if the sp is bad).]
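
A minimal sketch of what such a detection might look like. The
experimental code itself is not shown in this report, so the names
sp_looks_sane, abort_if_sp_bad and outer_fp, and the exact rule used,
are assumptions for illustration only. The rule relies on the AArch64
stack growing toward lower addresses: the sp should not be above the
current frame pointer, and the current frame pointer should not be
above the outermost frame pointer recorded at process start.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical check (not the code used in the experiments).
 * sp:       stack pointer value to test
 * fp:       frame pointer of the current frame
 * outer_fp: outermost frame pointer seen at process start
 */
bool sp_looks_sane(uint64_t sp, uint64_t fp, uint64_t outer_fp)
{
    /* Stack grows down: sp must be at or below the current frame pointer. */
    if (sp > fp)
        return false;
    /* The current frame must not sit above the outermost frame. */
    if (fp > outer_fp)
        return false;
    return true;
}

/* Abort before returning through a frame whose sp already looks wrong. */
void abort_if_sp_bad(uint64_t sp, uint64_t fp, uint64_t outer_fp)
{
    if (!sp_looks_sane(sp, fp, outer_fp))
        abort();
}
```

A check like this only catches an sp that is too high. It says nothing
about an sp that is unexpectedly far toward lower addresses.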

In sh`forkshell, just after the fork returns, the child-process
path sometimes has a messed-up sp value, judging by which direction
it lies from the prior frame-pointers on the stack. On occasion
the difference is very large, such as
(from lldb "register read" on the frame with the pc in sh`forkshell):

        fp = 0x0000ffffffffce90
        sp = 0x0000ffffffffe980

This puts the sp at a larger address than the frame-pointer
back-link that sh`__start stored when it was put to use via
ld-elf.so.1`.rtld_start (more like 0x0000ffffffffde10 as I
remember): that is, outside the active stack region.
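
To put numbers on the example above: the sp is 0x1af0 (6896) bytes
above the fp, and roughly 0xb70 (2928) bytes above the remembered
back-link value (that value is approximate, as noted). A small helper
for this kind of arithmetic, with addr_delta being a hypothetical name
introduced here:

```c
#include <stdint.h>

/*
 * Signed distance in bytes from address `from` to address `to`.
 * Positive means `to` is at the higher address.
 */
int64_t addr_delta(uint64_t to, uint64_t from)
{
    return (int64_t)(to - from);
}
```

For instance, addr_delta(sp, fp) with the register values shown gives
a positive result, which is the wrong sign for a stack that grows
toward lower addresses.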

[Note: my experiments so far would not establish if the sp
might sometimes have an unexpectedly large distance toward
lower memory addresses, especially if it was still in the
potential stack-region. It may be that both directions
happen.]

The distance when it fails is very variable across examples.
I just picked an example where stack frames would be written
over the top of other material when sh`forkshell makes other
calls on the child-process path, material that would be
outside what should be the active stack region.

# uname -apKU
FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r312982M  arm64 aarch64 1200020 1200020

(I've frozen at that version for this exploration.
It has taken me a while.)

Looking around I see what might be a few possibilities. . .
(I'm no expert so some might be trivially eliminated.)

Possibility #0 (possibilities in no particular order):

sys/arm64/arm64/vm_machdep.c :

In cpu_fork what if the bcopy of td1->td_frame might not
always have access to the latest updated values, needing
some form of memory "fence" to be sure that such values are
accessible? :

        tf = (struct trapframe *)STACKALIGN((struct trapframe *)pcb2 - 1);
        bcopy(td1->td_frame, tf, sizeof(*tf));
        tf->tf_x[0] = 0;
        tf->tf_x[1] = 0;
        tf->tf_spsr = 0;

        td2->td_frame = tf;

        /* Set the return value registers for fork() */
        td2->td_pcb->pcb_x[8] = (uintptr_t)fork_return;
        td2->td_pcb->pcb_x[9] = (uintptr_t)td2;
        td2->td_pcb->pcb_x[PCB_LR] = (uintptr_t)fork_trampoline;
        td2->td_pcb->pcb_sp = (uintptr_t)td2->td_frame;
        td2->td_pcb->pcb_fpusaved = &td2->td_pcb->pcb_fpustate;
        td2->td_pcb->pcb_vfpcpu = UINT_MAX;

        /* Setup to release spin count in fork_exit(). */
        td2->td_md.md_spinlock_count = 1;
        td2->td_md.md_saved_daif = 0;
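
For readers unfamiliar with what a "fence" provides, here is a
generic illustration in portable C11, not kernel code. The names
frame_box, box_init, box_publish and box_consume are hypothetical,
and nothing here claims that cpu_fork needs or lacks such a fence;
that is exactly the open question. The sketch only shows the usual
pairing: a release fence after writing the data and an acquire fence
before reading it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container: one plain value plus a "ready" flag. */
struct frame_box {
    uint64_t    sp;     /* plain payload, stands in for a saved register */
    atomic_bool ready;  /* set once the payload has been written */
};

void box_init(struct frame_box *b)
{
    b->sp = 0;
    atomic_init(&b->ready, false);
}

/* Writer: store the payload, then fence, then raise the flag. */
void box_publish(struct frame_box *b, uint64_t sp)
{
    b->sp = sp;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b->ready, true, memory_order_relaxed);
}

/* Reader: check the flag, then fence, then read the payload. */
bool box_consume(struct frame_box *b, uint64_t *out)
{
    if (!atomic_load_explicit(&b->ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = b->sp;
    return true;
}
```

Without the two fences, a reader on another core that sees the flag
set could in principle still read a stale payload on a weakly ordered
machine such as AArch64.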

Possibility #1:

sys/arm64/arm64/swtch.S :

ENTRY(fork_trampoline)
. . .
        /* Restore sp and lr */
        ldp     x0, x1, [sp]
        msr     sp_el0, x0
        mov     lr, x1

Similar point to #0 but for the ldp memory accesses
shown.

Possibility #2:

sys/arm64/arm64/exception.S :

Both of:

handle_el0_sync
handle_el0_irq

also update sp_el0 and so if any such can happen
during any part of fork_trampoline after its
"msr sp_el0, x0" but before its "msr daifset, #2"
(disabling interrupts), then the wrong sp_el0 value
would be in place at fork_trampoline's eret .

It will be interesting to see what the problem actually
was once it has been fixed.

===
Mark Millard
markmi at dsl-only.net