head -r347003 on 2-socket/2-cores-each G5 PowerMac11,2's: one type of boot-blocking context found
Mark Millard
marklmi at yahoo.com
Tue May 7 19:04:20 UTC 2019
On 2019-May-7, at 11:06, Justin Hibbits <chmeeedalf at gmail.com> wrote:
> On Mon, 6 May 2019 22:43:36 -0700
> Mark Millard <marklmi at yahoo.com> wrote:
>
>> Every example of boot failure during cpu_mp_unleash,
>> where I've had the tracking in place, has had 1 or more
>> examples of srr0<DMAP_BASE_ADDRESS (EXC_ISE) in
>> handle_kernel_slb_spill before cpu_mp_unleash tries to
>> start its first ap.
>>
>> Every example of boot success, where I've had the tracking
>> in place, has had no examples of srr0<DMAP_BASE_ADDRESS
>> (EXC_ISE) in handle_kernel_slb_spill before the
>> cpu_mp_unleash finished. (Successful boots are rare
>> in my current test context, so there are fewer examples
>> of this.)
>>
>> In other words: the original live-G5 information
>> for the segment was still present throughout that
>> time frame, thus avoiding a slbtrap for such a
>> fetch address over the time frame involved.
>>
>>
>>
>> In the code:
>>
>> rstvec = rstvec_virtbase + reset;
>> printf("powermac_smp_start_cpu: about to use *rstvec==4\n");
>> *rstvec = 4;
>> powerpc_sync();
>> (void)(*rstvec);
>> powerpc_sync();
>> DELAY(1);
>> printf("powermac_smp_start_cpu: about to use *rstvec==0\n");
>> *rstvec = 0;
>> powerpc_sync();
>> (void)(*rstvec);
>> powerpc_sync();
>> printf("powermac_smp_start_cpu: done using *rstvec==0\n");
>>
>> Every boot failure has had the last line reported by
>> FireWire dcons use as the first of those 3 printf's,
>> for CPU 2 as the target (of 0-3).
>>
>> The above code appears to me to execute with MSR.IR=1
>> on the bsp.
>>
>> But, then, what would *rstvec do if there is no ESID=0
>> V=1 combination active for the live-G5 information at
>> the time? Does that block the exception code that
>> is in what would be ESID=0's address range, effectively
>> preventing slbtrap from being invoked to enable ESID=0?
>>
>> In other words: when MSR.IR=1, does there always
>> need to be an ESID=0 V=1 entry? Is it appropriate
>> to reserve one for ESID=0 V=1 (after invalidating
>> any arbitrarily placed ESID=0 V=1 entry present
>> before the kernel even started)?
>
> Hi Mark,
>
> Thanks for continuing to look into this. In this case you're
> presenting, an ISE shouldn't really matter, because the SLB miss handler
> is written to run entirely from real mode to handle the miss. Can you
> determine what the addresses were that faulted in the failure cases?
> We shouldn't be touching anything below DMAP_BASE at this time, since
> we're not yet in userspace, and all mappings should be either KVA or
> DMAP.
I'll try to get examples of all of them based on
my current code.
But in an earlier message I reported several examples from
simply sticking a printf in handle_kernel_slb_spill and
later making it controllable to report at selective time
frames. (The printf's being there led to earlier hang-ups.
I was surprised I got anything.)
Remember that the number of handle_kernel_slb_spill
calls for srr0<DMAP_START and dar<DMAP_START varies
from boot to boot, so the places are not unique
overall.
Here is the core of those old reports for reference:
KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x3d99348 srr0=0xa869bc
handle_kernel_slb_spill: type=0x380 dar=0x10000000 srr0=0xa869bc
Both seemed to involve the stbx instruction in:
0000000000a869bc <.memset+0x20> stbx r4,r9,r3
0000000000a869c0 <.memset+0x24> addi r9,r9,1
0000000000a869c4 <.memset+0x28> bdnz 0000000000a869bc <.memset+0x20>
The above was from the unconditional printf addition and, as I
remember, repeated for:
#ifdef __powerpc64__
	i = 0;
	for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2;
	    va += SEGMENT_LENGTH, i++)
		moea64_bootstrap_slb_prefault(va, 0);
#endif
	enable_handle_kernel_slb_spill_reporting= 1;
(Note the (n_slbs-1)/2 that I was experimenting with at
the time.)
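To make the effect of the (n_slbs-1)/2 bound concrete, here is a minimal
stand-alone sketch (not kernel code) that only counts how many segments
such a loop would visit. The function name count_prefaulted_segments is
hypothetical, the 256 MB SEGMENT_LENGTH value is an assumption, and the
real virtual_avail/virtual_end values are not known here, so they are
passed in as parameters:

```c
#include <stdint.h>

/* Assumption: 256 MB segments, matching the stride of the loop above. */
#define SEGMENT_LENGTH 0x10000000ULL

/*
 * Hypothetical helper: count the iterations of a loop with the same
 * bounds as the experimental prefault loop, without touching any SLB.
 */
static unsigned
count_prefaulted_segments(uint64_t va_start, uint64_t va_end, unsigned n_slbs)
{
	unsigned i = 0;
	uint64_t va;

	for (va = va_start; va < va_end && i < (n_slbs - 1) / 2;
	    va += SEGMENT_LENGTH, i++)
		;	/* the kernel loop calls moea64_bootstrap_slb_prefault(va, 0) here */

	return (i);
}
```

For example, if n_slbs were 64 (an assumption about this hardware), the
bound would stop the loop after 31 segments whenever the address range
spans more than that.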
The below was from instead enabling later:
enable_handle_kernel_slb_spill_reporting= 1;
dpcpu_init(dpcpu, curcpu);
got (eliminating an unrelated line that had a
truncated address showing):
KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x22ef8 srr0=0xa86690
handle_kernel_slb_spill: type=0x480 dar=0x22ef8 srr0=0xa86690
Both seemed to involve the stdu instruction in:
0000000000a8668c <.memcpy+0x140> ldu r0,-8(r9)
0000000000a86690 <.memcpy+0x144> stdu r0,-8(r11)
0000000000a86694 <.memcpy+0x148> bdnz 0000000000a8668c <.memcpy+0x140>
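The reported lines can also be read in terms of which segment (ESID)
each address falls in. What follows is a minimal stand-alone sketch,
not kernel code. It assumes 256 MB segments (a shift of 28), a
direct-map base of 0xc000000000000000, and that the faulting address
is srr0 for type 0x480 (EXC_ISE) and dar for type 0x380 (EXC_DSE).
The helper names are hypothetical:

```c
#include <stdint.h>

#define SEGMENT_SHIFT	28			/* assumption: 256 MB segments */
#define DMAP_BASE	0xc000000000000000ULL	/* assumption: direct-map base */
#define EXC_DSE		0x380			/* data segment exception */
#define EXC_ISE		0x480			/* instruction segment exception */

/* Effective segment ID of an effective address. */
static uint64_t
esid_of(uint64_t ea)
{
	return (ea >> SEGMENT_SHIFT);
}

/* Address that missed in the SLB: srr0 for an ISE, dar otherwise. */
static uint64_t
faulting_ea(unsigned type, uint64_t dar, uint64_t srr0)
{
	return (type == EXC_ISE ? srr0 : dar);
}

/* Nonzero when the faulting address lies below the assumed DMAP base. */
static int
below_dmap(unsigned type, uint64_t dar, uint64_t srr0)
{
	return (faulting_ea(type, dar, srr0) < DMAP_BASE);
}
```

Under those assumptions, dar=0x3d99348, dar=0x22ef8 and both srr0
values fall in ESID 0, while dar=0x10000000 falls in ESID 1. All of
them are below the assumed direct-map base.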
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)