Crash dumps not working correctly for amd64?

Sean Chittenden sean at gigave.com
Tue Feb 22 19:57:51 GMT 2005


> >Howdy.  I've got myself an interesting situation.  It seems as though
> >amd64 is unable to collect crash dumps via savecore(8).  Has anyone
> >else seen this?  From dmesg(1):
> >
> >Checking for core dump on /dev/da0s1b ...
> >savecore: first and last dump headers disagree on /dev/da0s1b
> >Feb 21 12:45:59 host savecore: first and last dump headers disagree on /dev/da0s1b
> >savecore: unsaved dumps found but not saved
> >
> >???  sys/amd64/amd64/dump_machdep.c and sys/i386/i386/dump_machdep.c
> >are essentially identical.  I'm not familiar enough with these
> >innards, but reviewing savecore(8) didn't point out anything obvious.
> >I'm dumping onto a twa(4) controller.
> >
> >Are there any known workarounds to get this info?  I'm tempted to turn
> >off swap in fstab(5) so that the next time the machine comes up after
> >a crash, it'll still have the dump intact and I could poke at it as
> >time permitted.  Other suggestions?  -sc
> 
> Can you modify savecore to dump the headers anyways so they can be 
> inspected?

Yup... yikes!  This is far from good or correct.  Hrm...  I'm at a
loss as to the reason, however.  It's like the last dump header is
never overwritten in the dump process and is massively stale.  I've
made some changes to savecore(8) (can someone give me approval to
commit these?).  The resulting output is below using the new
format/verbose output:

# savecore -vf
bounds number: 9
checking for kernel dump on device /dev/da0s1b
mediasize = 3221225472
sectorsize = 512
magic mismatch on last dump header on /dev/da0s1b
forcing magic on /dev/da0s1b
savecore: first and last dump headers disagree on /dev/da0s1b
savecore: reboot after panic: vrele: negative ref cnt
Checking for available free space
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 9
  Dump Status: bad
savecore: writing core to vmcore.9
2146631680
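
A side note on the "Architecture Version: 16777216" line above: 16777216
is 0x01000000, which is what a version of 1 looks like if a big-endian
32-bit field is printed on a little-endian host without conversion.  A
minimal sketch of that arithmetic follows.  swap32() is a hypothetical
stand-in for the dtoh32() macro used elsewhere in savecore.c, and the
sketch assumes the header field is stored big-endian:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for dtoh32() on a little-endian host:
 * reverse the byte order of a 32-bit value.
 */
static uint32_t
swap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24));
}
```

Under that assumption, swap32(16777216) is 1, which would be a far more
plausible architecture version.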

If you run it with two -v flags, you get both the first and last header info:

# savecore -vvf
bounds number: 10
checking for kernel dump on device /dev/da0s1b
mediasize = 3221225472
sectorsize = 512
magic mismatch on last dump header on /dev/da0s1b
forcing magic on /dev/da0s1b
First dump headers:
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Mon Feb 21 19:12:48 2005
  Hostname: www.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #0: Wed Feb 16 21:42:19 PST 2005
    root@www.example.com:/usr/obj/usr/src/sys/WWW
  Panic String: page fault
  Dump Parity: 1475841892
  Bounds: 10
  Dump Status: unknown

Last dump headers:
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 10
  Dump Status: unknown

savecore: first and last dump headers disagree on /dev/da0s1b
savecore: reboot after panic: vrele: negative ref cnt
Checking for available free space
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 10
  Dump Status: bad
savecore: writing core to vmcore.10
2146631680

Why there are different dump header values, I'm not sure.  But, with
the -f option, you can get a core regardless of the state of the dump
and its header values.  That doesn't mean you get a good dump, but you
at least get something.  It looks like my core dumps are hosed or
swapon(8) has clobbered some data on the image such that kgdb(1) can't
extract anything useful.
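
The decision the patched code makes can be reduced to a small sketch.
It is a simplification under stated assumptions: struct hdr is a
hypothetical, much smaller stand-in for struct kerneldumpheader, and
check_headers() is an illustrative helper that does not exist in
savecore.c.  Only the STATUS_* names come from the patch:

```c
#include <string.h>

#define STATUS_BAD	0
#define STATUS_GOOD	1
#define STATUS_UNKNOWN	2

/* Hypothetical, much-reduced stand-in for struct kerneldumpheader. */
struct hdr {
	char	hostname[64];
	char	panicstring[192];
};

/*
 * Compare the first and last headers byte for byte.
 * Returns the status to report; *save says whether to write the core.
 */
static int
check_headers(const struct hdr *first, const struct hdr *last, int force,
    int *save)
{
	if (memcmp(first, last, sizeof(*first)) != 0) {
		*save = (force != 0);	/* -f saves the dump anyway */
		return (STATUS_BAD);
	}
	*save = 1;
	return (STATUS_GOOD);
}
```

So a mismatch is always reported as "bad"; -f only changes whether the
core is written out regardless.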

kgdb: core file: vmcore.0
kgdb: kernel image: /boot/kernel/kernel
kgdb: cannot read KPML4phys

Ah well, savecore_flags="-vvf" should do the trick next time this box
dumps.  -sc

-- 
Sean Chittenden
-------------- next part --------------
Index: savecore.8
===================================================================
RCS file: /home/ncvs/src/sbin/savecore/savecore.8,v
retrieving revision 1.22
diff -u -r1.22 savecore.8
--- savecore.8	18 Jan 2005 10:09:37 -0000	1.22
+++ savecore.8	22 Feb 2005 19:24:27 -0000
@@ -71,11 +71,13 @@
 .Nm
 will ignore it.
 .It Fl f
-Force a dump to be taken even if the dump was cleared.
+Force a dump to be taken even if the dump was cleared or the
+dump header information is inconsistent.
 .It Fl k
 Do not clear the dump after saving it.
 .It Fl v
 Print out some additional debugging information.
+Specify twice for more information.
 .It Fl z
 Compress the core dump and kernel (see
 .Xr gzip 1 ) .
Index: savecore.c
===================================================================
RCS file: /home/ncvs/src/sbin/savecore/savecore.c,v
retrieving revision 1.71
diff -u -r1.71 savecore.c
--- savecore.c	10 Feb 2005 09:19:33 -0000	1.71
+++ savecore.c	22 Feb 2005 19:41:53 -0000
@@ -88,6 +88,10 @@
 /* The size of the buffer used for I/O. */
 #define	BUFFERSIZE	(1024*1024)
 
+#define STATUS_BAD	0
+#define STATUS_GOOD	1
+#define STATUS_UNKNOWN	2
+
 static int checkfor, compress, clear, force, keep, verbose;	/* flags */
 static int nfound, nsaved, nerr;			/* statistics */
 
@@ -95,25 +99,39 @@
 
 static void
 printheader(FILE *f, const struct kerneldumpheader *h, const char *device,
-    int bounds)
+    int bounds, const int status)
 {
 	uint64_t dumplen;
 	time_t t;
+	const char *stat_str;
 
-	fprintf(f, "Good dump found on device %s\n", device);
+	fprintf(f, "Dump header from device %s\n", device);
 	fprintf(f, "  Architecture: %s\n", h->architecture);
-	fprintf(f, "  Architecture version: %d\n",
-	    dtoh32(h->architectureversion));
+	fprintf(f, "  Architecture Version: %u\n", dtoh32(h->architectureversion));
 	dumplen = dtoh64(h->dumplength);
-	fprintf(f, "  Dump length: %lldB (%lld MB)\n", (long long)dumplen,
+	fprintf(f, "  Dump Length: %lldB (%lld MB)\n", (long long)dumplen,
 	    (long long)(dumplen >> 20));
 	fprintf(f, "  Blocksize: %d\n", dtoh32(h->blocksize));
 	t = dtoh64(h->dumptime);
 	fprintf(f, "  Dumptime: %s", ctime(&t));
 	fprintf(f, "  Hostname: %s\n", h->hostname);
-	fprintf(f, "  Versionstring: %s", h->versionstring);
-	fprintf(f, "  Panicstring: %s\n", h->panicstring);
+	fprintf(f, "  Magic: %s\n", h->magic);
+	fprintf(f, "  Version String: %s", h->versionstring);
+	fprintf(f, "  Panic String: %s\n", h->panicstring);
+	fprintf(f, "  Dump Parity: %u\n", h->parity);
 	fprintf(f, "  Bounds: %d\n", bounds);
+
+	switch (status) {
+	case STATUS_BAD:
+		stat_str = "bad";
+		break;
+	case STATUS_GOOD:
+		stat_str = "good";
+		break;
+	default:
+		stat_str = "unknown";
+	}
+	fprintf(f, "  Dump Status: %s\n", stat_str);
 	fflush(f);
 }
 
@@ -214,12 +232,14 @@
 	FILE *info, *fp;
 	int fd, fdinfo, error, wl;
 	int nr, nw, hs, he = 0;
-	int bounds;
+	int bounds, status;
 	u_int sectorsize;
 	mode_t oumask;
 
+	bounds = getbounds();
 	dmpcnt = 0;
 	mediasize = 0;
+	status = STATUS_UNKNOWN;
 
 	if (buf == NULL) {
 		buf = malloc(BUFFERSIZE);
@@ -266,6 +286,7 @@
 			printf("magic mismatch on last dump header on %s\n",
 			    device);
 
+		status = STATUS_BAD;
 		if (force == 0)
 			goto closefd;
 
@@ -284,7 +305,10 @@
 		syslog(LOG_ERR,
 		    "unknown version (%d) in last dump header on %s",
 		    dtoh32(kdhl.version), device);
-		goto closefd;
+
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
 	}
 
 	nfound++;
@@ -295,7 +319,9 @@
 		syslog(LOG_ERR,
 		    "parity error on last dump header on %s", device);
 		nerr++;
-		goto closefd;
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
 	}
 	dumpsize = dtoh64(kdhl.dumplength);
 	firsthd = lasthd - dumpsize - sizeof kdhf;
@@ -308,11 +334,25 @@
 		nerr++;
 		goto closefd;
 	}
+
+	if (verbose >= 2) {
+		printf("First dump headers:\n");
+		printheader(stdout, &kdhf, device, bounds, STATUS_UNKNOWN);
+
+		printf("\nLast dump headers:\n");
+		printheader(stdout, &kdhl, device, bounds, STATUS_UNKNOWN);
+		printf("\n");
+	}
+
 	if (memcmp(&kdhl, &kdhf, sizeof kdhl)) {
 		syslog(LOG_ERR,
 		    "first and last dump headers disagree on %s", device);
 		nerr++;
-		goto closefd;
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
+	} else {
+		status = STATUS_GOOD;
 	}
 
 	if (checkfor) {
@@ -333,12 +373,10 @@
 		goto closefd;
 	}
 
-	bounds = getbounds();
-
 	sprintf(buf, "info.%d", bounds);
 
 	/*
-	 * Create or overwrite any existing files.
+	 * Create or overwrite any existing dump header files.
 	 */
 	fdinfo = open(buf, O_WRONLY | O_CREAT | O_TRUNC, 0600);
 	if (fdinfo < 0) {
@@ -365,9 +403,9 @@
 	info = fdopen(fdinfo, "w");
 
 	if (verbose)
-		printheader(stdout, &kdhl, device, bounds);
+		printheader(stdout, &kdhl, device, bounds, status);
 
-	printheader(info, &kdhl, device, bounds);
+	printheader(info, &kdhl, device, bounds, status);
 	fclose(info);
 
 	syslog(LOG_NOTICE, "writing %score to %s",
@@ -492,6 +530,9 @@
 	struct fstab *fsp;
 	char *savedir;
 
+	checkfor = compress = clear = force = keep = verbose = 0;
+	nfound = nsaved = nerr = 0;
+
 	openlog("savecore", LOG_PERROR, LOG_DAEMON);
 
 	savedir = strdup(".");
@@ -511,7 +552,7 @@
 			keep = 1;
 			break;
 		case 'v':
-			verbose = 1;
+			verbose++;
 			break;
 		case 'f':
 			force = 1;

