PowerMac G5 powerpc64: new context where repeatedly booting varies between failing and working
Mark Millard
markmi at dsl-only.net
Sat Feb 14 22:21:55 UTC 2015
I've been able to show that the TOC entry that authnone_init accesses is messed up and it is so from very early on.
I took advantage of sys/rpc/auth_none.c exposing the static variable's address calculation result, in fact the same one that the crash happened for:
AUTH *
authnone_create()
{
struct authnone_private *ap = &authnone_private;
return (&ap->no_client);
}
The no_client even happens to be the first field of the struct pointed to by ap.
So I put calls of that routine where it would periodically monitor the calculation during the early part of booting:
root at FBSDG5M1:/usr/src # svnlite diff sys/kern/init_main.c
Index: sys/kern/init_main.c
===================================================================
--- sys/kern/init_main.c (revision 278443)
+++ sys/kern/init_main.c (working copy)
@@ -91,6 +91,11 @@
#include <ddb/ddb.h>
#include <ddb/db_sym.h>
+#if defined(VERBOSE_SYSINIT)
+/* HACK! */
+extern void* authnone_create(void);
+#endif
+
void mi_startup(void); /* Should be elsewhere */
/* Components of the first process -- never freed. */
@@ -282,7 +287,9 @@
#if defined(VERBOSE_SYSINIT)
if (verbose)
+{printf(" authnone_private address generation check: %p ", authnone_create());
printf("done.\n");
+}
#endif
So when it boots successfully it reports messages like:
malloc_init(&M_JFREEFRAG)... authnone_private address generation check: 0xe313a8 done.
When the boots fails the very first such message of that form shows the 0x2400002200e313a8 value, as do all the later ones. When the boot works it always shows 0xe313a8.
[I have since shortened the text with:
printf(" &authnone_private: %p ", authnone_create());]
It would appear that the TOC entry generation/update is the source of the variations in value that are observed that can lead to a crash.
===
Mark Millard
markmi at dsl-only.net
On 2015-Feb-14, at 01:53 AM, Mark Millard <markmi at dsl-only.net> wrote:
FreeBSD context: PowerMac G5 Quad-Core with 16GB RAM.
root at FBSDG5M1:/usr/src # freebsd-version -ku ; uname -a
10.1-STABLE
10.1-STABLE
FreeBSD FBSDG5M1 10.1-STABLE FreeBSD 10.1-STABLE #10 r278443M: Fri Feb 13 03:26:27 PST 2015 root at FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64vtsc powerpc
Other configuration/context details for /boot/kernel10.1S/kernel are given late in this message. But I will here mention the use of:
options DDB
options GDB
options VERBOSE_SYSINIT
options BOOTVERBOSE=1
options BOOTHOWTO=RB_VERBOSE
I've got a new context where repeatedly booting via power-off then power-on varies between failing and working, always failing in the same way/place when it does.
Here are 3 addresses (&...: ...) from a successful boot of kernel10.1S, where the first one will be different for the failing boots (this is from dmesg -a):
authnone_init(0)... &authnone_private: 0xe313a8, &_null_auth: 0xe31608, &authnone_ops: 0xc31f80 done.
where the extra output is from the added printf in:
static void
authnone_init(void *dummy)
{
struct authnone_private *ap = &authnone_private;
XDR xdrs;
ap->no_client.ah_cred = ap->no_client.ah_verf = _null_auth;
ap->no_client.ah_ops = &authnone_ops;
printf(" &authnone_private: %p, &_null_auth: %p, &authnone_ops: %p ", ap, &_null_auth, &authnone_ops);
xdrmem_create(&xdrs, ap->mclient, MAX_MARSHAL_SIZE, XDR_ENCODE);
xdr_opaque_auth(&xdrs, &ap->no_client.ah_cred);
xdr_opaque_auth(&xdrs, &ap->no_client.ah_verf);
ap->mcnt = XDR_GETPOS(&xdrs);
XDR_DESTROY(&xdrs);
}
SYSINIT(authnone_init, SI_SUB_KMEM, SI_ORDER_ANY, authnone_init, NULL);
The authnone_init code for through its first store into ap->... is:
00000000007a3ea4 <.authnone_init> mflr r0
00000000007a3ea8 <.authnone_init+0x4> std r29,-24(r1)
00000000007a3eac <.authnone_init+0x8> std r30,-16(r1)
00000000007a3eb0 <.authnone_init+0xc> std r31,-8(r1)
00000000007a3eb4 <.authnone_init+0x10> std r0,16(r1)
00000000007a3eb8 <.authnone_init+0x14> stdu r1,-192(r1)
00000000007a3ebc <.authnone_init+0x18> mr r31,r1
00000000007a3ec0 <.authnone_init+0x1c> ld r29,304(r2)
00000000007a3ec4 <.authnone_init+0x20> ld r9,312(r2)
00000000007a3ec8 <.authnone_init+0x24> ld r0,0(r9)
00000000007a3ecc <.authnone_init+0x28> ld r11,8(r9)
00000000007a3ed0 <.authnone_init+0x2c> ld r9,16(r9)
00000000007a3ed4 <.authnone_init+0x30> std r0,24(r29)
When the boots fail they fail on that last std: std r0,24(r29) , doing so based on
r2: 0xd2da20
r29: 0x2400002200e313a8
bad virtual address: 0x2400002200e313c0
(These are from a boot-crash time register display, so hand copied off screen as it is too soon for interaction with DDB. I've got a default ddb script in place that does the display.)
When it boots okay r29 = 0x00e313a8 and the address accessed is 0x00e313c0 instead: see the first address that I started with above (for &authnone_private).
In other words: The difference is the upper half of r29. I've no evidence that r2 is corrupt for failing boots for this code.
So either 304(r2) accesses different values from one time to the next for booting or the r29 register is corrupted somehow between
00000000007a3ec0 <.authnone_init+0x1c> ld r29,304(r2)
and
00000000007a3ed4 <.authnone_init+0x30> std r0,24(r29)
(such as an interrupt not restoring the 64bit-ABI's register value fully).
At this point I've no clue if variability in the TOC contents that 304(r2) references makes any sense or not. I've yet to figure out how it is established.
More FreeBSD configuration details:
10.1-STABLE's buildworld kernel and installworld were all done from the PowerMac G5 itself.
root at FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64
GENERIC64 GENERIC64vtsc
root at FBSDG5M1:/usr/src # more sys/powerpc/conf/GENERIC64vtsc
include GENERIC64
ident GENERIC64vtsc
nooptions PS3 #Sony Playstation 3 HACK!!! to allow sc
options DDB # HACK!!! to dump early crash info (but 11.0-CURRENT already has it)
options GDB # HACK!!! ...
options VERBOSE_SYSINIT
options BOOTVERBOSE=1
options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
#options KTR_CPUMASK=0xF
#options KTR_VERBOSE
# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device sc
#device kbdmux # HACK: already listed by vt
options SC_OFWFB # OFW frame buffer
options SC_DFLT_FONT # compile font in
makeoptions SC_DFLT_FONT=cp437
# Disable extra checking typically used for FreeBSD 11.0-CURRENT:
nooptions DEADLKRES #Enable the deadlock resolver
nooptions INVARIANTS #Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT #Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS #Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN #Don't run witness on spinlocks for speed
nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones
root at FBSDG5M1:/usr/src # svnlite status
? .snap
M sys/ddb/db_main.c
M sys/ddb/db_script.c
M sys/powerpc/conf
? sys/powerpc/conf/GENERIC64vtsc
M sys/powerpc/ofw/ofw_machdep.c
M sys/powerpc/ofw/ofwcall64.S
M sys/powerpc/powermac/powermac_thermal.c
M sys/rpc/auth_none.c
root at FBSDG5M1:/usr/src # more /boot/loader.conf
#kernel="kernel"
#kernel="kernel10.1RE"
kernel="kernel10.1S"
#kernel="kernel11C"
verbose_loading="YES"
kern.vty=vt
root at FBSDG5M1:/usr/src # more /etc/make.conf
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
MALLOC_PRODUCTION=
root at FBSDG5M1:/usr/src # more /etc/src.conf
CFLAGS+=-DELF_VERBOSE
#WITH_DEBUG_FILES=
#WITHOUT_CLANG=
Other than powermac_thermal.c (from Justin H.) the source changes are for investigation of various early-boot problems for PowerMac G5's. The PowerMac G5 that powermac_thermal.c was put in place to experiment with is no longer around but I've not yet removed the powermac_thermal.c update from my environment.
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-ppc
mailing list