lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's

Mark Millard markmi at dsl-only.net
Wed Sep 24 09:05:02 UTC 2014


Now that I've had a kernel/boot crash with a successful DDB bt and show registers (a different submittal) it makes for a good comparison/contrast with what DDB reports for this "before copyright" crash.

Something unique to the "before copyright" context is...

No registers are reported to have values that point into the range between tmpstk and esym.

In other words: There is no valid stack pointer reported as far as I can tell. r1 has the value 0 instead of being a handling a valid stack address. tmpstk=0xbd7000 and esym=0xbdb000 (example for one of my WITH_DEBUG_FILES= and options DDB and GDB builds of 10.1-BETA2). That at least gives a ball park on the range to expect for pointing into the stack even with some build variation.

It leaves me wondering if the DDB report is for a nested exception handling. That could explain why lr points to u_trap+0x10 and srr0 points to k_trap+0x28 when normally srr0 would point to the the failing instruction (or the instruction after) and lr to where that routine would normally return to.

The register values that are reported for my 10.1-BETA2 builds that crash before the copyright notice are:

r0: 0
r1: 0
r2: 0xc81538 vop_unlock_desc
r3: 0xd18868
r4: 0x894b58
r5: 0
r6: 0xc1dee0 M_AUDITBSM
r7: 0xe3f818 ofw_real_mode
r8: 0x1
r9: 0xe0f580 __pcpu
r10: 0x1c35ec0
r11: 0
r12: 0x10000000
r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as 0xdbb290.)
r14-r19: all 0
r20: 0x10c1000
r21: 0x4
r22: 0x180abd4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcc89b8 smp_no...
r26: 0xcea108 ofw_rend...
r27: 0x894b58 ofwcall+0xa8
r28: 0x894b58 ofwcall+0xa8
r29: 2400022
r30: 9000000000001032
r31: 0xbb7d38

srr0: 0x102720 k_trap+0x28
srr1: 9000000000001032
lr: 0x1026f0 u_trap+0x10
ctr: 0xff846d78
cr: 2000deb0
xer: 0
dar: f...d50 (lots of f's)
dsisr: 42000000






===
Mark Millard
markmi at dsl-only.net

On Sep 20, 2014, at 3:42 PM, Mark Millard <markmi at dsl-only.net> wrote:

[I corrected the SSR0 in the subject to be SRR0.]

I did miss a register in my list (it matched the shown r30 value). And it turns out to probably be very important to interpreting what the "show registers" is reporting:

SRR1: 0x9000000000001032

But bits 43-46 of SRR1 are supposed to indicate which type of Program Exception, using a single binary 1 to so. No such 1's are present.

Illegal instruction would have been bit 44 being 1. (PowerPC has the upper bit numbered zero and increases from there.)

So the ddb "show registers" is apparently not reporting the status as of when the "stopped at 0 illegal instruction 0" happened. Thus other things are also likely not from that exact time frame.



And I misinterpreted the LR value status: The LR value was just left over from the restore_kernsrs returning when it finished. Execution then flowed into k_trap. Nothing unusual involved.





===
Mark Millard
markmi at dsl-only.net

On Sep 18, 2014, at 8:57 PM, Mark Millard <markmi at dsl-only.net> wrote:

I modified DDB to automatically "show registers" even at the early "before Copyright" crash time. The end of this note will show the /usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB bt, the bt does not actually print a back trace for this context. (It might for others.)

The registers give interesting context despite the lack of a back trace. I do not know if it will be sufficient to be of much immediate help if someone used the information to start looking at the problem.

I'll start with register lr: 0x1026f0 u_trap+0x10.

/usr/src/sys/powerpc/aim/trap_subr64.S has:

s_trap:
        bf      17,k_trap               /* branch if PSL_PR is false */
        GET_CPUINFO(%r1)
u_trap:
        ld      %r1,PC_CURPCB(%r1)
        mr      %r27,%r28               /* Save LR, r29 */
        mtsprg2 %r29
        bl      restore_kernsrs         /* enable kernel mapping */
        mfsprg2 %r29
        mr      %r28,%r27

/*
 * Now the common trap catching code.
 */
k_trap:
        FRAME_SETUP(PC_TEMPSAVE)
/* Call C interrupt dispatcher: */
trapagain:

and so this appears to indicate a pending return to execute the "mfsprg2 %r29" after "bl restore_kernsrs", which indicates that restore_kernsrs should be active.

But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in FRAME_SETUP(PC_TEMPSAVE) someplace.)

So it appears to me that the processor got to the k_trap code during the supposed restore_kernsrs time frame. (But I'm no expert at these sorts of things or for the processor.)

I'll list the other register values:

r0:  0
r1:  0
r2:  0xc1be80 M_AUDITBSM
r3:  0xb16138
r4:  0x8926e8 .ofwcall+0xa8
r5:  0
r6:  0xbb5f90
r7:  0xe3d118 ofw_real_mode
r8:  0x1
r9:  0xe0ce80 __pcpu
r10: 0x1c35ec9
r11: 0
r12: 0x10000000
r13: db890    thread0
r14-r19: all 0
r20: 0x10bc000
r21: 0x4
r22: 0x1801db4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcc6908 smp_no_rendevous_barrier
r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other zv)
r27: 0x8926e8 .ofwcall+0xa8
r28: 0x8926e8 .ofwcall+0xa8 (yep: same value)
r29: 0x24000022
r30: 0x9000000000001032
r31: 0xc7f488 vop_unlock_desc

ctr: 0xff846d78
cr:  0x2000d7b0
xer: 0
dar: 0xfffffffffffffd50
dsisr: 0x42000000

(Hopefully this manual transcription from the screen display is complete --and also accurate for what it does present.)




The personal HACK to /usr/src/sys/ddb/db_script.c's db_script_kdbenter(...) to have it show registers and try bt...

$ cd /usr/src/sys/ddb/
$ svnlite diff .
Index: db_script.c
===================================================================
--- db_script.c	(revision 271610)
+++ db_script.c	(working copy)
@@ -319,10 +319,25 @@
 {
 	char scriptname[DB_MAXSCRIPTNAME];
 
+	/* HACK!!! : Additional lines to force a basic default script to exist.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
+	if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; bt");
+
 	snprintf(scriptname, sizeof(scriptname), "%s.%s",
 	    DB_SCRIPT_KDBENTER_PREFIX, eventname);
 	if (db_script_exec(scriptname, 0) == ENOENT)
 		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
+
+	/* HACK!!! : Additional lines to always use the default script,
+	 *           even if scriptname existed and was executed.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	else
+		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
 }
 
 /*-



===
Mark Millard
markmi at dsl-only.net

On Sep 16, 2014, at 9:28 PM, Mark Millard <markmi at dsl-only.net> wrote:

In part I sent directly to you because of a past exchange (July-27) where you had written:

> Nathan and I both speculate that it's
> dropping into Open Firmware (we make extensive use of OFW), and then
> messing something up, taking a page fault or something.

The specific text that I report and its uniformity when it is produced seems to add a little information beyond a speculated "page fault or something" and so might eventually help a little. As I understand the text it is reporting execution reaching address zero without any prior un-handled exceptions or other such that would stop it. A corrupted stack (pointer) so a bad return address or some such? I'd guess there are no explicit jumps to address zero so I expect that indirection is likely involved, with the content for the indirection messed up.

I really wish that I had a logic analyzer configuration for this. I've not found a way to make the failing context visible so far and the extra way of looking at things might have helped.




===
Mark Millard
markmi at dsl-only.net

On Sep 16, 2014, at 8:28 PM, Justin Hibbits <chmeeedalf at gmail.com> wrote:

Hi mark,

I see this on my G5, and I think it's due to the amount of RAM in the machine. More than 4gb seems to confuse open firmware when called by FreeBSD. There is some effort to remove the need of the callbacks but thus far it's not far along. The good news is that after it boots it's solid except when switching vtys, buy earlier this year or last year I added a sysctl hack to disable the call into open firmware on vty switch (don't recall offhand and not at my computer right now, but if you grep the sysctl output for reset and ofw you can find it).

-Justin

On Sep 16, 2014 8:01 PM, "Mark Millard" <markmi at dsl-only.net> wrote:
I've now spent time with rebooting and power-off/power-on for all 3 PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all 3 get the

> GDB: no debug ports present
> KDB: debugger backends: DDB
> KDB: current backend: DDB
> [ thread pid -1 tid 1006665719 ]
> Stopped at 0: illegal instruction 0
> db>

when they fail just before the Copyright notice would normally be displayed. None fail any earlier. At that spot none have failed any other way. It is the same SSD in all 3. (Happens with other SSD's as well.) Overall there is a mix of Radeon and NVIDIA display boards. Besides the SSD use and RAM upgrades the rest is stock equipment. scons used, not vt. (I've yet to try vt.)

Seeing a failure after the Copyright notice as been fairly rare in all my experiments from when I started last April or so. The ones that I've noted had Data Storage Interrupt reported. So far no examples of the above have been reported after the Copyright notice. So I'd guess that they are separate issues. Of course it seems that only in the last few days would I have seen the above sort of thing if it did happen after the Copyright notice: The prior history does not count for judgements about that.

===
Mark Millard
markmi at dsl-only.net

On Sep 16, 2014, at 8:15 AM, Mark Millard <markmi at dsl-only.net> wrote:

Using 10.1-BETA1 I added "options DDB" and "options GDB" to powerpc64's GENERIC64. (I also used WITH_DEBUG_FILES=, WITHOUT_CLANG=, and WITH_DEBUG= in /etc/make.conf.) So buildworld, kernel was basically just set up to have more of a debugging context around (including for any ports builds).

The result was new information about the PowerMac G5 boot hangups: The screen is no longer blank when the G5 is hung up without there being a Copyright notice yet. It says...

> GDB: no debug ports present
> KDB: debugger backends: DDB
> KDB: current backend: DDB
> [ thread pid -1 tid 1006665719 ]
> Stopped at 0: illegal instruction 0
> db>

(I had no ability to input at that point.) Normally the Copyright notice would have displayed instead of "[...]" and what follows. (I do not claim to have all the spacing, capitalization, and such correct above.)

That text is constant from hang to hang when it hangs just before it would normally output the Copyright notice: The numbers do not vary, much less the other text. It has never failed until after the two KDB messages are present. So far I've only tested one PowerMac G5, booting over and over for a few hours.



(I do not claim to be set up for remote kernel debugging. I just decided to let GDB go along for the ride when I added DDB.)

===
Mark Millard
markmi at dsl-only.net








More information about the freebsd-ppc mailing list