Cleaned-up evidence about the PowerMac G5 multiprocessor boot hang-ups with the modern VM_MAX_KERNEL_ADDRESS value [pcpup->pc_curpcb->pcb_sp sometimes fails]
Mark Millard
marklmi at yahoo.com
Thu Feb 21 21:01:30 UTC 2019
[A possible surprise is that the same pcpup->pc_curpcb value for
CPU 3 is present for hangs and for completing boots:
0xe0000000740cfac0 . Also: I've dropped historical text for this
note.]
Justin Hibbits pointed out that of course I'd not see the 0x20 "label":
I had stupidly placed it after a return statement without noticing.
See below.
void
pmap_cpu_bootstrap(int ap)
{
/*
* No KTR here because our console probably doesn't work yet
*/
return (MMU_CPU_BOOTSTRAP(mmu_obj, ap));
*(volatile unsigned long*)0xc0000000000000f0 = 0x20; // HACK!!!
powerpc_sync(); // HACK!!!
}
(The original void-return function has that return with an
expression. Not that such should have misdirected my thinking.)
So the expected/desired value to see after 0x25 would not be the
0x20 "label". (So I've now eliminated the lines that I had added.)
Thus the next value after the 0x25 from moea64_cpu_bootstrap_native
for a hang-up would have been 0x30 from cpudep_ap_bootstrap .
I've updated cpudep_ap_bootstrap to record more "labels" for
places reached:
uintptr_t
cpudep_ap_bootstrap(void)
{
register_t msr, sp;
*(volatile unsigned long*)0xc0000000000000f0 = 0x3F; // HACK!!!
powerpc_sync(); // HACK!!!
msr = psl_kernset & ~PSL_EE;
mtmsr(msr);
*(volatile unsigned long*)0xc0000000000000f0 = 0x31; // HACK!!!
powerpc_sync(); // HACK!!!
pcpup->pc_curthread = pcpup->pc_idlethread;
*(volatile unsigned long*)0xc0000000000000f0 = 0x32; // HACK!!!
powerpc_sync(); // HACK!!!
#ifdef __powerpc64__
__asm __volatile("mr 13,%0" :: "r"(pcpup->pc_curthread));
#else
__asm __volatile("mr 2,%0" :: "r"(pcpup->pc_curthread));
#endif
*(volatile unsigned long*)0xc0000000000000f0 = 0x33; // HACK!!!
powerpc_sync(); // HACK!!!
pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb;
*(volatile unsigned long*)0xc0000000000000f0 = 0x34; // HACK!!!
powerpc_sync(); // HACK!!!
sp = pcpup->pc_curpcb->pcb_sp;
*(volatile unsigned long*)0xc0000000000000f0 = 0x30; // HACK!!!
powerpc_sync(); // HACK!!!
return (sp);
}
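The store-plus-sync pairs repeated above all follow one pattern: write a
distinct value to a fixed word, then force the store out before going on.
Below is a minimal user-space sketch of that pattern, under assumptions:
label_slot stands in for the word at 0xc0000000000000f0,
__sync_synchronize() (a GCC/Clang builtin) stands in for powerpc_sync(),
and HACK_LABEL and demo_bootstrap are hypothetical names, not anything in
the FreeBSD tree.

```c
/* Assumption: user-space stand-in for the word at 0xc0000000000000f0. */
static volatile unsigned long label_slot;

/*
 * Hypothetical helper: record a "label", then force the store to be
 * ordered before whatever follows. __sync_synchronize() is only a
 * stand-in for powerpc_sync() in this sketch.
 */
#define HACK_LABEL(v) do {		\
	label_slot = (v);		\
	__sync_synchronize();		\
} while (0)

/*
 * Mirrors the order of labels in the instrumented cpudep_ap_bootstrap.
 * stall_at_deref != 0 simulates never getting past the pcb_sp load.
 */
static unsigned long
demo_bootstrap(int stall_at_deref)
{
	HACK_LABEL(0x3F);	/* entry */
	HACK_LABEL(0x31);	/* after mtmsr */
	HACK_LABEL(0x32);	/* after pc_curthread assignment */
	HACK_LABEL(0x33);	/* after the mr into r13 (or r2) */
	HACK_LABEL(0x34);	/* after pc_curpcb assignment */
	if (stall_at_deref)
		return (label_slot);
	HACK_LABEL(0x30);	/* after reading pcb_sp */
	return (label_slot);
}
```

In this sketch demo_bootstrap(1) leaves 0x34 in the slot, which is what a
stall at the pcb_sp load would leave behind; demo_bootstrap(0) leaves 0x30.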
For hanging boots, the last "label" that CPU 0 reports is 0x34.
Thus it appears that pcpup->pc_curthread->td_pcb and (so) pcpup->pc_curpcb
end up with pointer value(s) that sometimes block:
pcpup->pc_curpcb->
from being used, although the pointer values need not be different.
(Further below they are shown to be the same for hangs vs. completed boots.)
Thus I added recording of the address in question:
pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb;
*(volatile void**)0xc0000000000000e0 = pcpup->pc_curpcb; // HACK!!!
powerpc_sync(); // HACK!!!
*(volatile unsigned long*)0xc0000000000000f0 = 0x34; // HACK!!!
powerpc_sync(); // HACK!!!
sp = pcpup->pc_curpcb->pcb_sp;
and added reporting of the value placed at 0xc0000000000000e0 :
*rstvec = 4;
powerpc_sync();
(void)(*rstvec);
powerpc_sync();
DELAY(1);
*rstvec = 0;
powerpc_sync();
(void)(*rstvec);
powerpc_sync();
if (bootverbose) // HACK!!!
printf("After reset 4&0 for CPU %d, hwref=%jx, awake=%x, n_slbs=%jd,\n"
" *(volatile void**)0xc0000000000000e0=%p,\n"
" *(volatile unsigned long*)0xc0000000000000f0=0x%jx\n",
pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
pc->pc_awake, (uintmax_t)n_slbs,
*(volatile void**)0xc0000000000000e0,
(uintmax_t)*(volatile unsigned long*)0xc0000000000000f0);
timeout = 10000;
while (!pc->pc_awake && timeout--)
DELAY(100);
if (bootverbose) // HACK!!!
printf("After attempted wait for awake CPU %d, hwref=%jx, awake=%x, n_slbs=%jd, delay 100 count = %jd,\n"
" *(volatile void**)0xc0000000000000e0=%p,\n"
" *(volatile unsigned long*)0xc0000000000000f0=0x%jx\n",
pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
pc->pc_awake, (uintmax_t)n_slbs, (uintmax_t)(10000-timeout),
*(volatile void**)0xc0000000000000e0,
(uintmax_t)*(volatile unsigned long*)0xc0000000000000f0);
return ((pc->pc_awake) ? 0 : EBUSY);
The values of *(volatile void**)0xc0000000000000e0 (i.e., copies of
pcpup->pc_curpcb) after attempting to wait for pc->pc_awake for
CPU 3 are:
boots: 0xe0000000740cfac0
hangs: 0xe0000000740cfac0
So there is no difference in value: sometimes the address appears valid to
dereference on CPU 3 and other times the same address does not.
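Since each "label" is stored just after a statement completes, the last
value seen identifies the statement that was next in line when progress
stopped. A small sketch of that reading of the cpudep_ap_bootstrap labels
follows; the table contents come from the instrumented function above,
while struct label_step and stalled_statement are hypothetical names used
only for illustration. Labels from other functions (0x25, 0x51) are not
covered and return NULL.

```c
#include <stddef.h>

struct label_step {
	unsigned long label;	/* value found in the label word */
	const char *next_stmt;	/* statement that follows that store */
};

/* Order matches the instrumented cpudep_ap_bootstrap. */
static const struct label_step steps[] = {
	{ 0x3F, "mtmsr(psl_kernset & ~PSL_EE)" },
	{ 0x31, "pcpup->pc_curthread = pcpup->pc_idlethread" },
	{ 0x32, "mr 13 (or 2), pcpup->pc_curthread" },
	{ 0x33, "pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb" },
	{ 0x34, "sp = pcpup->pc_curpcb->pcb_sp" },
	{ 0x30, "return (sp)" },
};

/*
 * Return the statement that had not yet been shown to complete when
 * `label` was the last value recorded, or NULL for an unknown label.
 */
static const char *
stalled_statement(unsigned long label)
{
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		if (steps[i].label == label)
			return (steps[i].next_stmt);
	return (NULL);
}
```

With the 0x34 seen for hangs, stalled_statement(0x34) points at the
pcb_sp load, consistent with the conclusion above.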
A successful boot looks like:
Adding CPU 0, hwref=cd38, awake=1
Waking up CPU 3 (dev=c480)
After reset 4&0 for CPU 3, hwref=c480, awake=0, n_slbs=64,
*(volatile void**)0xc0000000000000e0=0,
*(volatile unsigned long*)0xc0000000000000f0=0x25
After attempted wait for awake CPU 3, hwref=c480, awake=1, n_slbs=64, delay 100 count = 0,
*(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
*(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 3, hwref=c480, awake=1
Waking up CPU 2 (dev=c768)
After reset 4&0 for CPU 2, hwref=c768, awake=0, n_slbs=64,
*(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
*(volatile unsigned long*)0xc0000000000000f0=0x51
After attempted wait for awake CPU 2, hwref=c768, awake=1, n_slbs=64, delay 100 count = 0,
*(volatile void**)0xc0000000000000e0=0xe0000000740d8ac0,
*(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 2, hwref=c768, awake=1
Waking up CPU 1 (dev=ca50)
After reset 4&0 for CPU 1, hwref=ca50, awake=0, n_slbs=64,
*(volatile void**)0xc0000000000000e0=0xe0000000740d8ac0,
*(volatile unsigned long*)0xc0000000000000f0=0x51
After attempted wait for awake CPU 1, hwref=ca50, awake=1, n_slbs=64, delay 100 count = 0,
*(volatile void**)0xc0000000000000e0=0xe0000000740e1ac0,
*(volatile unsigned long*)0xc0000000000000f0=0x51
Adding CPU 1, hwref=ca50, awake=1
SMP: AP CPU #2 launched
SMP: AP CPU #3 launched
SMP: AP CPU #1 launched
A hanging boot looks like (from a picture):
Adding CPU 0, hwref=cd38, awake=1
Waking up CPU 3 (dev=c480)
After reset 4&0 for CPU 3, hwref=c480, awake=0, n_slbs=64,
*(volatile void**)0xc0000000000000e0=0,
*(volatile unsigned long*)0xc0000000000000f0=0x25
After attempted wait for awake CPU 3, hwref=c480, awake=1, n_slbs=64, delay 100 count = 0,
*(volatile void**)0xc0000000000000e0=0xe0000000740cfac0,
*(volatile unsigned long*)0xc0000000000000f0=0x34
Waking up CPU 2 (dev=c768)
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)