A little new before-Copyright-notice/ofwcall crash information... [Still no solution, just more information]
Mark Millard
markmi at dsl-only.net
Fri Oct 10 21:20:30 UTC 2014
While trying to get more information on the "before Copyright notice"/ofwcall PowerMac G5 hangs, I accidentally got better information than I expected. (At least if the "show registers" output is to be believed for SRR0.)
First I'll give the results and what they refer to. Then how I got them.
As part of the experiments I inserted extra isync instructions, from just after the call into Open Firmware (the bctrl) through just after the mtmsrd, to prove that the same instruction position (relative to the mtmsrd) would be reported with or without them:
> Index: /usr/src/sys/powerpc/ofw/ofwcall64.S
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofwcall64.S (revision 272558)
> +++ /usr/src/sys/powerpc/ofw/ofwcall64.S (working copy)
> @@ -128,13 +128,22 @@
> bctrl
>
> /* Reload stack pointer and MSR from the OFW stack */
> + isync
> + isync
> ld %r6,24(%r1)
> + isync
> + isync
> ld %r2,16(%r1)
> + isync
> + isync
> ld %r1,8(%r1)
> + isync
> + isync
>
> /* Now set the real MSR */
> mtmsrd %r6
> isync
> + isync
>
> /* Sign-extend the return value from OF */
> extsw %r3,%r3
The result was that SRR0 is reported as pointing at the last isync above when the trap happens. (No multiple-fault problem showed up, so it did not point into the exception handling code.)
With all the extra isyncs removed (the normal code having only one isync in that area, the one just after the mtmsrd), the extsw instruction is in that position and it is what SRR0 pointed to. So that aspect ended up confirmed.
The version of the code with the extra isyncs should have forced any exceptions from the ld instructions (and from anything before them) to happen before the mtmsrd was executed. As near as I can tell, the implication is that the mtmsrd itself is what takes the exception.
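The "same relative position" claim can be checked mechanically. The following is a minimal sketch, assuming the instruction sequences after the bctrl are exactly those visible in the ofwcall64.S diff above (comments and blank lines ignored) and that every instruction is 4 bytes. The list names and the helper slot_after_mtmsrd are made up for illustration; they are not from the kernel source.

```python
# Instruction mnemonics following the bctrl, transcribed from the diff above.
# PATCHED includes the nine added isyncs; ORIGINAL is the unmodified code.
PATCHED = [
    "isync", "isync", "ld",      # ld %r6,24(%r1)
    "isync", "isync", "ld",      # ld %r2,16(%r1)
    "isync", "isync", "ld",      # ld %r1,8(%r1)
    "isync", "isync", "mtmsrd",  # mtmsrd %r6
    "isync", "isync",            # original isync, then the added one
    "extsw",
]
ORIGINAL = ["ld", "ld", "ld", "mtmsrd", "isync", "extsw"]

INSN_BYTES = 4  # fixed-width PowerPC instructions


def slot_after_mtmsrd(seq, distance=2):
    """Return (index, mnemonic) of the instruction `distance` slots past mtmsrd."""
    idx = seq.index("mtmsrd") + distance
    return idx, seq[idx]


for label, seq in (("patched", PATCHED), ("original", ORIGINAL)):
    idx, mnemonic = slot_after_mtmsrd(seq)
    # Byte offset is measured from the return address (the value in lr).
    print(label, mnemonic, "at lr +", hex(idx * INSN_BYTES))
```

In both variants the reported instruction is the second one after the mtmsrd: the added isync in the patched code and the extsw in the original.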
SRR1: 0x1000000040101120
lr: 0x8a64e8 .ofwcall+0xa8 (i.e., just after the bctrl in both types of code).
From all this I expect that ofwcall returned before the exception happened.
ctr: 0xff846d78
cr: 0x22000022
xer: 0
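For reading the SRR1 value above, a small decoder helps. This is a minimal sketch, assuming the usual 64-bit PowerPC MSR bit masks (named here after the PSL_* style used in FreeBSD's psl.h, written from memory rather than copied from the tree). The table MSR_BITS and the function decode_msr are hypothetical helpers, not kernel code.

```python
# Assumed MSR bit masks for 64-bit AIM PowerPC (names without the PSL_ prefix).
MSR_BITS = [
    ("SF",  0x8000000000000000),  # 64-bit mode
    ("HV",  0x1000000000000000),  # hypervisor state
    ("VEC", 0x0000000002000000),  # AltiVec available
    ("EE",  0x0000000000008000),  # external interrupts enabled
    ("PR",  0x0000000000004000),  # problem (user) state
    ("FP",  0x0000000000002000),  # floating point available
    ("ME",  0x0000000000001000),  # machine check enable
    ("FE0", 0x0000000000000800),
    ("SE",  0x0000000000000400),
    ("BE",  0x0000000000000200),
    ("FE1", 0x0000000000000100),
    ("IR",  0x0000000000000020),  # instruction relocation
    ("DR",  0x0000000000000010),  # data relocation
    ("RI",  0x0000000000000002),  # recoverable interrupt
    ("LE",  0x0000000000000001),
]


def decode_msr(value):
    """Return (names of set MSR bits, remaining bits not in the table)."""
    names = []
    rest = value
    for name, mask in MSR_BITS:
        if value & mask:
            names.append(name)
            rest &= ~mask
    return names, rest


names, rest = decode_msr(0x1000000040101120)  # SRR1 from the dump
print(names, hex(rest))
```

Any bits left over fall outside the table. In SRR1 some bit positions carry exception-specific status rather than saved MSR state, so the leftover value should not be read as MSR bits; the sketch makes no attempt to interpret it.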
I expect that the reported dar and dsisr are garbage (this is probably the wrong kind of trap for them to have been initialized). But they were listed as:
dar: 0x810248fbc10250fb
dsisr: 0xe102587f8802a648
I've no clue whether Open Firmware was well behaved about register values as of when it returned to ofwcall. r6 in the list below does not look good to me: it is a little more than r1's value, suggesting that a stack address is being displayed instead of an MSR value. But by the time mtmsrd %r6 executes, r1 should no longer hold the OFW stack address but the kernel's stack address from that time. (This presumes Open Firmware was well behaved.)
r0: 0
r1: 0xbc0558
r2: 0xe18dd0 MP_ncpus
r3: 0xd24450
r4: 0x8a64e8 .ofwcall+0xa8 (specific address could depend on other variations in builds)
r5: 0
r6: 0xbc0568
r7: 0xe5f63d ofw_real_mode
r8: 0x1
r9: 0xe5f63d ofw_name_history_+0x15 (part of my crash information dumping hacks)
r10: 0x1c35ec0
r11: 0
r12: 0x22000022
r13: 0xddaf29 thread0
r14-r19: 0
r20: 0x10f6000
r21: 0x4
r22: 0x1801bd4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcd4a98
r26: 0xcf6758
r27: 0xcd4a98
r28: 0xe62690 emergency_buffer.7721+0x8
r29: 0x1874d0 ofw_name_history_pos (part of my information dumping hacks)
r30: 0x9000000000001032
r31: 0xc0000000000084a0
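The suspicion about r6 can be put in numbers. This is a minimal sketch using the r1 and r6 values from the dump; the two masks are assumed MSR bit values (SF and ME) and the "plausible MSR" test is only a rough heuristic invented for this illustration, not anything the kernel does.

```python
# Values copied from the register dump above.
R1 = 0xbc0558
R6 = 0xbc0568

# Assumed MSR masks: a 64-bit kernel MSR would normally have both set.
MSR_SF = 0x8000000000000000  # 64-bit mode
MSR_ME = 0x0000000000001000  # machine check enable


def offset_from_sp(value, sp):
    """Distance of `value` above the stack pointer, in bytes."""
    return value - sp


def plausible_kernel_msr(value):
    """Rough heuristic only: require both SF and ME to be set."""
    return bool(value & MSR_SF) and bool(value & MSR_ME)


print("r6 - r1 =", hex(offset_from_sp(R6, R1)))
print("r6 plausible as MSR:", plausible_kernel_msr(R6))
```

The small positive offset from r1 is what makes r6 look like a stack address rather than something loaded from the saved-MSR slot.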
[ofw_name_history is how I earlier found the specific ofwcall that did not return all the way without getting an associated exception. ofw_name_history content is dumped by my DDB script that I forced to exist and runs when the exception happens.]
Now for the odd part of how I got to the above happening.
Given the multiple-fault problem that was involved I decided to try to get some information on which type(s) of exception(s) by making PC values distinct: duplicating the code that contained the address being reported so each use had its own copy.
So I ended up with not just realtrap but realtrap1, realtrap2, and realtrap3, for example, that look like:
> +realtrap1:
> +/* Test whether we already had PR set */
> + mfsrr1 %r1
> + mtcr %r1
> + mfsprg1 %r1 /* restore SP (might have been
> + overwritten) */
> + bf 17,rt1_k_trap /* branch if PSL_PR is false */
> + GET_CPUINFO(%r1)
> + ld %r1,PC_CURPCB(%r1)
> + mr %r27,%r28 /* Save LR, r29 */
> + mtsprg2 %r29
> + bl restore_kernsrs /* enable kernel mapping */
> + mfsprg2 %r29
> + mr %r28,%r27
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +rt1_k_trap:
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
Since the original reports were for an address inside the FRAME_SETUP code, I needed distinct copies of FRAME_SETUP to have unique PCs for the different uses.
(I could have used realtrap instead of having realtrap3 but ended up with realtrap unused.)
The trapagain code was after the reported fault place and so was not duplicated.
generictrap also got its own copy of such code (no label).
That left alitrap as the only use of the original s_trap code. (It is the only bla style use of s_trap in the original code and so I left that alone.)
After these changes I got the Show Registers results that I reported above instead of SRR0 values from one of the exception handler paths. (That is not what I expected.) The detailed changes to trap_subr64.S were:
> Index: /usr/src/sys/powerpc/aim/trap_subr64.S
> ===================================================================
> --- /usr/src/sys/powerpc/aim/trap_subr64.S (revision 272558)
> +++ /usr/src/sys/powerpc/aim/trap_subr64.S (working copy)
> @@ -583,7 +583,7 @@
> /* Try to detect a kernel stack overflow */
> mfsrr1 %r31
> mtcr %r31
> - bt 17,realtrap /* branch is user mode */
> + bt 17,realtrap1 /* branch is user mode */
> mfsprg1 %r31 /* get old SP */
> clrrdi %r31,%r31,12 /* Round SP down to nearest page */
> sub. %r30,%r31,%r30 /* SP - DAR */
> @@ -590,7 +590,7 @@
> bge 1f
> neg %r30,%r30 /* modulo value */
> 1: cmpldi %cr0,%r30,4096 /* is DAR within a page of SP? */
> - bge %cr0,realtrap /* no, too far away. */
> + bge %cr0,realtrap2 /* no, too far away. */
>
> /* Now convert this DSI into a DDB trap. */
> GET_CPUINFO(%r1)
> @@ -628,6 +628,68 @@
> mr %r28,%r27
> ba s_trap
>
> +realtrap1:
> +/* Test whether we already had PR set */
> + mfsrr1 %r1
> + mtcr %r1
> + mfsprg1 %r1 /* restore SP (might have been
> + overwritten) */
> + bf 17,rt1_k_trap /* branch if PSL_PR is false */
> + GET_CPUINFO(%r1)
> + ld %r1,PC_CURPCB(%r1)
> + mr %r27,%r28 /* Save LR, r29 */
> + mtsprg2 %r29
> + bl restore_kernsrs /* enable kernel mapping */
> + mfsprg2 %r29
> + mr %r28,%r27
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +rt1_k_trap:
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +
> +
> +realtrap2:
> +/* Test whether we already had PR set */
> + mfsrr1 %r1
> + mtcr %r1
> + mfsprg1 %r1 /* restore SP (might have been
> + overwritten) */
> + bf 17,rt2_k_trap /* branch if PSL_PR is false */
> + GET_CPUINFO(%r1)
> + ld %r1,PC_CURPCB(%r1)
> + mr %r27,%r28 /* Save LR, r29 */
> + mtsprg2 %r29
> + bl restore_kernsrs /* enable kernel mapping */
> + mfsprg2 %r29
> + mr %r28,%r27
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +rt2_k_trap:
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +
> +realtrap3:
> +/* Test whether we already had PR set */
> + mfsrr1 %r1
> + mtcr %r1
> + mfsprg1 %r1 /* restore SP (might have been
> + overwritten) */
> + bf 17,rt3_k_trap /* branch if PSL_PR is false */
> + GET_CPUINFO(%r1)
> + ld %r1,PC_CURPCB(%r1)
> + mr %r27,%r28 /* Save LR, r29 */
> + mtsprg2 %r29
> + bl restore_kernsrs /* enable kernel mapping */
> + mfsprg2 %r29
> + mr %r28,%r27
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +rt3_k_trap:
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +
> +
> /*
> * generictrap does some standard setup for trap handling to minimize
> * the code that need be installed in the actual vectors. It expects
> @@ -666,6 +728,20 @@
> mfsrr1 %r31
> mtcr %r31
>
> + bf 17,gt_k_trap /* branch if PSL_PR is false */
> + GET_CPUINFO(%r1)
> + ld %r1,PC_CURPCB(%r1)
> + mr %r27,%r28 /* Save LR, r29 */
> + mtsprg2 %r29
> + bl restore_kernsrs /* enable kernel mapping */
> + mfsprg2 %r29
> + mr %r28,%r27
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +gt_k_trap:
> + FRAME_SETUP(PC_TEMPSAVE)
> + ba trapagain
> +
> s_trap:
> bf 17,k_trap /* branch if PSL_PR is false */
> GET_CPUINFO(%r1)
> @@ -785,7 +861,7 @@
> ld %r31,(PC_DBSAVE+CPUSAVE_R31)(%r1)
> mtsprg3 %r31 /* SPRG3 was clobbered by FRAME_LEAVE */
> mfsprg1 %r1
> - b realtrap
> + b realtrap3
> dbleave:
> FRAME_LEAVE(PC_DBSAVE)
> rfid
>
Reverting this one file to the original code brings back the historical exception-in-exception-handler report from DDB's "show registers".
===
Mark Millard
markmi at dsl-only.net