Cleaned-up evidence about the PowerMac G5 multiprocessor boot hang-ups with the modern VM_MAX_KERNEL_ADDRESS value [pcpup->pc_curpcb->pcb_sp sometimes fails]

Mark Millard marklmi at yahoo.com
Thu Feb 21 21:01:30 UTC 2019


[A possible surprise is that the same pcpup->pc_curpcb value for
CPU 3 is present for hangs and for completing boots:
0xe0000000740cfac0 . Also: I've dropped historical text for this
note.]

Justin Hibbits pointed out that of course I'd not see the 0x20 "label":
I had stupidly placed it after a return statement without noticing.
See below.

void
pmap_cpu_bootstrap(int ap)
{
        /*
         * No KTR here because our console probably doesn't work yet
         */

        return (MMU_CPU_BOOTSTRAP(mmu_obj, ap));
 
        *(volatile unsigned long*)0xc0000000000000f0 = 0x20; // HACK!!!
        powerpc_sync(); // HACK!!!
}

(The original void-return function has that return with an
expression. Not that such should have misdirected my thinking.)

So the expected/desired value to see after 0x25 would not be the
0x20 "label". (So I've now eliminated the lines that I had added.)
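
As an illustration only, here is a minimal user-space sketch of that
mistake and of the corrected placement. It is not the kernel code:
progress_word, bootstrap_step, and the two observed_* helpers are
hypothetical stand-ins for the word at 0xc0000000000000f0 and for
MMU_CPU_BOOTSTRAP.

```c
/* Hypothetical stand-in for the word at 0xc0000000000000f0. */
static volatile unsigned long progress_word;

/* Hypothetical stand-in for MMU_CPU_BOOTSTRAP(); it leaves its own
 * label behind, like the 0x25 from moea64_cpu_bootstrap_native. */
static void
bootstrap_step(void)
{
        progress_word = 0x25;
}

/* Mirrors the mistake: the label store sits after the return,
 * so it is never executed. */
static void
label_after_return(void)
{
        bootstrap_step();
        return;

        progress_word = 0x20;   /* unreachable */
}

/* Corrected placement: the label is stored before returning. */
static void
label_before_return(void)
{
        bootstrap_step();
        progress_word = 0x20;
}

/* Helpers that report which label an observer would see. */
unsigned long
observed_after_return(void)
{
        label_after_return();
        return (progress_word);
}

unsigned long
observed_before_return(void)
{
        label_before_return();
        return (progress_word);
}
```

With the store after the return, an observer still sees 0x25; only the
second form ever shows 0x20.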

Thus the next value after the 0x25 from moea64_cpu_bootstrap_native
for a hang-up would have been 0x30 from cpudep_ap_bootstrap .

I've updated cpudep_ap_bootstrap to record more "labels" for
places reached:

uintptr_t
cpudep_ap_bootstrap(void)
{
        register_t msr, sp;

        *(volatile unsigned long*)0xc0000000000000f0 = 0x3F; // HACK!!!
        powerpc_sync(); // HACK!!!

        msr = psl_kernset & ~PSL_EE;
        mtmsr(msr);

        *(volatile unsigned long*)0xc0000000000000f0 = 0x31; // HACK!!!
        powerpc_sync(); // HACK!!!

        pcpup->pc_curthread = pcpup->pc_idlethread;

        *(volatile unsigned long*)0xc0000000000000f0 = 0x32; // HACK!!!
        powerpc_sync(); // HACK!!!

#ifdef __powerpc64__
        __asm __volatile("mr 13,%0" :: "r"(pcpup->pc_curthread));
#else
        __asm __volatile("mr 2,%0" :: "r"(pcpup->pc_curthread));
#endif

        *(volatile unsigned long*)0xc0000000000000f0 = 0x33; // HACK!!!
        powerpc_sync(); // HACK!!!

        pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb;
        
        *(volatile unsigned long*)0xc0000000000000f0 = 0x34; // HACK!!!
        powerpc_sync(); // HACK!!!
        
        sp = pcpup->pc_curpcb->pcb_sp;

        *(volatile unsigned long*)0xc0000000000000f0 = 0x30; // HACK!!!
        powerpc_sync(); // HACK!!!

        return (sp);
}

The result for hanging boots is that the last "label" recorded, as
reported by CPU 0, is 0x34.
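
One way to read such a result is as a lookup: the last label seen means
every store up to it completed, so the statement that follows it is the
first one not known to have finished. A small sketch of that reading,
using the labels from cpudep_ap_bootstrap above (the table and the
step_after_label name are mine, for illustration only):

```c
#include <stddef.h>

struct label_step {
        unsigned long label;     /* value last seen at 0x...f0 */
        const char *next_step;   /* statement that follows that label */
};

/* Labels in the order cpudep_ap_bootstrap records them above. */
static const struct label_step ap_bootstrap_steps[] = {
        { 0x3F, "mtmsr(psl_kernset & ~PSL_EE)" },
        { 0x31, "pcpup->pc_curthread = pcpup->pc_idlethread" },
        { 0x32, "mr 13 (or 2), pcpup->pc_curthread" },
        { 0x33, "pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb" },
        { 0x34, "sp = pcpup->pc_curpcb->pcb_sp" },
        { 0x30, "return (sp)" },
};

/* Return the statement following the given label, or NULL if the
 * label is not one recorded by cpudep_ap_bootstrap. */
const char *
step_after_label(unsigned long label)
{
        size_t i;
        size_t n = sizeof(ap_bootstrap_steps) / sizeof(ap_bootstrap_steps[0]);

        for (i = 0; i < n; i++) {
                if (ap_bootstrap_steps[i].label == label)
                        return (ap_bootstrap_steps[i].next_step);
        }
        return (NULL);
}
```

For the hanging case, step_after_label(0x34) gives the pcb_sp load.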

Thus it appears that pcpup->pc_curthread->td_pcb, and so pcpup->pc_curpcb,
ends up with a pointer value that sometimes blocks:

pcpup->pc_curpcb->

from being used, although the pointer value need not differ between runs.
(Below it is shown to be the same for hangs and for completed boots.)

Thus I added recording of the address in question:

        pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb;

        *(volatile void**)0xc0000000000000e0 = pcpup->pc_curpcb; // HACK!!!
        powerpc_sync(); // HACK!!!
        *(volatile unsigned long*)0xc0000000000000f0 = 0x34; // HACK!!!
        powerpc_sync(); // HACK!!!

        sp = pcpup->pc_curpcb->pcb_sp;

and added reporting of the value placed at 0xc0000000000000e0 :

        *rstvec = 4;
        powerpc_sync();
        (void)(*rstvec);
        powerpc_sync();
        DELAY(1);
        *rstvec = 0;
        powerpc_sync();
        (void)(*rstvec);
        powerpc_sync();

        if (bootverbose) // HACK!!!
                printf("After reset 4&0 for CPU %d, hwref=%jx, awake=%x, n_slbs=%jd,\n"
                       " *(volatile void**)0xc0000000000000e0=%p,\n"
                       " *(volatile unsigned long*)0xc0000000000000f0=0x%jx\n",
                    pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
                    pc->pc_awake, (uintmax_t)n_slbs,
                    *(volatile void**)0xc0000000000000e0,
                    (uintmax_t)*(volatile unsigned long*)0xc0000000000000f0);

        timeout = 10000;
        while (!pc->pc_awake && timeout--)
                DELAY(100);

        if (bootverbose) // HACK!!!
                printf("After attempted wait for awake CPU %d, hwref=%jx, awake=%x, n_slbs=%jd, delay 100 count = %jd,\n"
                       " *(volatile void**)0xc0000000000000e0=%p,\n"
                       " *(volatile unsigned long*)0xc0000000000000f0=0x%jx\n",
                    pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
                    pc->pc_awake, (uintmax_t)n_slbs, (uintmax_t)(10000-timeout),
                    *(volatile void**)0xc0000000000000e0,
                    (uintmax_t)*(volatile unsigned long*)0xc0000000000000f0);

        return ((pc->pc_awake) ? 0 : EBUSY);
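
To help read the "delay 100 count" figure, here is a user-space sketch
of the same loop shape. The wait_for_awake function, its awake_after
parameter, and the wait_result struct are hypothetical and exist only
to simulate when pc_awake becomes set. DELAY(100) is replaced by a
counter.

```c
#include <stdbool.h>

struct wait_result {
        bool awake;        /* final state of the simulated pc_awake */
        int delay_count;   /* what "10000 - timeout" would print */
};

/*
 * awake_after: number of polls after which the simulated flag becomes
 * set; 0 means already set on entry, negative means never set.
 */
struct wait_result
wait_for_awake(int awake_after)
{
        struct wait_result r;
        int timeout = 10000;
        int polls = 0;
        bool awake = (awake_after == 0);

        /* Same shape as: while (!pc->pc_awake && timeout--) DELAY(100); */
        while (!awake && timeout--) {
                polls++;   /* stands in for DELAY(100) */
                if (awake_after > 0 && polls >= awake_after)
                        awake = true;
        }

        r.awake = awake;
        r.delay_count = 10000 - timeout;
        return (r);
}
```

In this sketch a flag that is already set yields a count of 0, matching
the "delay 100 count = 0" lines in the logs. If the flag is never set,
the final timeout-- leaves timeout at -1, so the reported count would
be 10001 rather than 10000.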

The values of *(volatile void**)0xc0000000000000e0 (i.e., copies of
pcpup->pc_curpcb) after attempting to wait for pc->pc_awake for
CPU 3 are:

boots: 0xe0000000740cfac0
hangs: 0xe0000000740cfac0

So there is no difference in value: sometimes the address can be
dereferenced on CPU 3 and other times the same address cannot.

A successful boot looks like:

Adding CPU 0, hwref=cd38, awake=1
Waking up CPU 3 (dev=c480)
After reset 4&0 for CPU 3, hwref=c480, awake=0, n_slbs=64,
 *(volatile void**)0xc0000000000000e0=0,
 *(volatile unsigned long*)0xc0000000000000f0=0x25
After attempted wait for awake CPU 3, hwref=c480, awake=1, n_slbs=64, delay 100 count = 0,
 *(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 3, hwref=c480, awake=1
Waking up CPU 2 (dev=c768)
After reset 4&0 for CPU 2, hwref=c768, awake=0, n_slbs=64,
 *(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x51
After attempted wait for awake CPU 2, hwref=c768, awake=1, n_slbs=64, delay 100 count = 0,
 *(volatile void**)0xc0000000000000e0=0xe0000000740d8ac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 2, hwref=c768, awake=1
Waking up CPU 1 (dev=ca50)
After reset 4&0 for CPU 1, hwref=ca50, awake=0, n_slbs=64,
 *(volatile void**)0xc0000000000000e0=0xe0000000740d8ac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x51
After attempted wait for awake CPU 1, hwref=ca50, awake=1, n_slbs=64, delay 100 count = 0,
 *(volatile void**)0xc0000000000000e0=0xe0000000740e1ac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 1, hwref=ca50, awake=1
SMP: AP CPU #2 launched
SMP: AP CPU #3 launched
SMP: AP CPU #1 launched

A hanging boot looks like (from a picture):

Adding CPU 0, hwref=cd38, awake=1
Waking up CPU 3 (dev=c480)
After reset 4&0 for CPU 3, hwref=c480, awake=0, n_slbs=64,
 *(volatile void**)0xc0000000000000e0=0,
 *(volatile unsigned long*)0xc0000000000000f0=0x25
After attempted wait for awake CPU 3, hwref=c480, awake=1, n_slbs=64, delay 100 count = 0,
 *(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
 *(volatile unsigned long*)0xc0000000000000f0=0x34
Waking up CPU 2 (dev=c768)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


