correctable DMA error AFAR

Thomas Moestl t.moestl at tu-bs.de
Mon Jul 21 15:47:12 PDT 2003


On Mon, 2003/07/21 at 15:44:36 -0400, Chris Jackman wrote:
> Error messages:
> 
> pcib0: correctable DMA error AFAR 0x476d6140 AFSR 0x40e600003f800000
> and
> pcib0: correctable DMA error AFAR 0x40adbc40 AFSR 0x40c400003f800000

These signal correctable ECC errors during a DVMA read
transaction. The differences in the AFSR values indicate different ECC
syndromes.
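
As a rough illustration, the two reported AFSR values can be decoded
with the CEAFSR_* bit definitions from the patch below. This is only a
sketch: the helper names ceafsr_error_bits() and ceafsr_diff() are
made up for this example and are not part of the driver, and 1ULL is
used instead of 1UL so that it also builds on hosts where long is 32
bits wide.

```c
#include <stdint.h>

/* Bit positions copied from the CEAFSR_* definitions in the patch. */
#define CEAFSR_BLK	(1ULL << 23)	/* Error caused by block transaction. */
#define CEAFSR_S_DWR	(1ULL << 58)	/* Sec. error caused by DVMA write. */
#define CEAFSR_S_DRD	(1ULL << 59)	/* Sec. error caused by DVMA read. */
#define CEAFSR_S_PIO	(1ULL << 60)	/* Sec. error caused by PIO access. */
#define CEAFSR_P_DWR	(1ULL << 61)	/* Pri. error caused by DVMA write. */
#define CEAFSR_P_DRD	(1ULL << 62)	/* Pri. error caused by DVMA read. */
#define CEAFSR_P_PIO	(1ULL << 63)	/* Pri. error caused by PIO access. */

#define CEAFSR_ERRMASK							\
	(CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR |			\
	 CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)

/* Hypothetical helper: keep only the primary/secondary error bits. */
uint64_t
ceafsr_error_bits(uint64_t afsr)
{
	return (afsr & CEAFSR_ERRMASK);
}

/* Hypothetical helper: the bits in which two AFSR snapshots differ. */
uint64_t
ceafsr_diff(uint64_t a, uint64_t b)
{
	return (a ^ b);
}
```

For both 0x40e600003f800000 and 0x40c400003f800000, the only error bit
set is CEAFSR_P_DRD (primary error caused by a DVMA read), and
CEAFSR_BLK is set as well. The two values differ only in
0x0022000000000000, i.e. in bits 49 and 53, which lie outside the
error mask; the patch does not define that field, so reading it as the
syndrome rests on the explanation above.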
 
> My e250 has locked up twice in the last few weeks with these
> error messages.  The error gets repeated over and over
> again on the serial console, and I can't do anything to the
> box except power cycle it.

This interrupt is informational only, and the documentation states
that no further cleanup is required. However, we should probably
clear the error bits in the status register, since it looks as if the
interrupt is triggered again and again while any of those bits are
still set. The manual is a bit ambiguous on that point, but clearing
the bits is desirable anyway, since it improves error reporting.

The attached patch implements this; could you please try it and report
how it behaves on the next ECC error?
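
To make the intended ordering concrete, here is a small software model
of the handler before and after the change. It is a sketch built on
two assumptions that the manual does not clearly confirm: that writing
a 1 to an error bit in the AFSR clears it, and that the interrupt is
raised again as long as any error bit is still set. struct fake_psycho
and all fake_*/handler_* functions are hypothetical and exist only for
this illustration; they do not touch real hardware.

```c
#include <stdint.h>

/* Bits 58..63, matching CEAFSR_ERRMASK in the patch. */
#define CEAFSR_ERRMASK	(0x3fULL << 58)

/* Hypothetical model of the two pieces of state involved. */
struct fake_psycho {
	uint64_t ce_afsr;	/* modelled PSR_CE_AFS contents */
	int	 int_pending;	/* 1 while the CE interrupt is asserted */
};

/* ASSUMPTION: error bits are write-one-to-clear. */
void
fake_write_afsr(struct fake_psycho *p, uint64_t val)
{
	p->ce_afsr &= ~val;
}

/* ASSUMPTION: the interrupt re-fires while any error bit is set. */
void
fake_int_clr(struct fake_psycho *p)
{
	p->int_pending = (p->ce_afsr & CEAFSR_ERRMASK) != 0;
}

/* Old order: clear the interrupt, leave the error bits alone. */
void
handler_old(struct fake_psycho *p)
{
	fake_int_clr(p);
}

/* New order: clear the caught error bits, then clear the interrupt. */
void
handler_new(struct fake_psycho *p)
{
	uint64_t afsr = p->ce_afsr;

	fake_write_afsr(p, afsr & CEAFSR_ERRMASK);
	fake_int_clr(p);
}
```

Under these assumptions, starting from ce_afsr = 0x40e600003f800000
with the interrupt pending, handler_old() leaves the interrupt pending,
which would match the endless repetition on the console, while
handler_new() leaves ce_afsr = 0x00e600003f800000 and the interrupt
cleared.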

Thanks,
	- Thomas

-- 
Thomas Moestl <t.moestl at tu-bs.de>	http://www.tu-bs.de/~y0015675/
              <tmm at FreeBSD.org>		http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0  9C0F 1FE6 4F1D 419C 776C
-------------- next part --------------
Index: sparc64/pci/psycho.c
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psycho.c,v
retrieving revision 1.41
diff -u -r1.41 psycho.c
--- sparc64/pci/psycho.c	1 Jul 2003 15:52:06 -0000	1.41
+++ sparc64/pci/psycho.c	21 Jul 2003 22:41:12 -0000
@@ -745,12 +745,14 @@
 	struct psycho_softc *sc = (struct psycho_softc *)arg;
 	u_int64_t afar, afsr;
 
-	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 	afar = PSYCHO_READ8(sc, PSR_CE_AFA);
 	afsr = PSYCHO_READ8(sc, PSR_CE_AFS);
 	/* It's correctable.  Dump the regs and continue. */
 	device_printf(sc->sc_dev, "correctable DMA error AFAR %#lx "
 	    "AFSR %#lx\n", (u_long)afar, (u_long)afsr);
+	/* Clear the error bits that we caught. */
+	PSYCHO_WRITE8(sc, PSR_CE_AFS, afsr & CEAFSR_ERRMASK);
+	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 }
 
 static void
Index: sparc64/pci/psychoreg.h
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psychoreg.h,v
retrieving revision 1.6
diff -u -r1.6 psychoreg.h
--- sparc64/pci/psychoreg.h	6 Jan 2003 16:51:06 -0000	1.6
+++ sparc64/pci/psychoreg.h	21 Jul 2003 22:36:03 -0000
@@ -232,13 +232,28 @@
 #define	PCICTL_6ENABLE	0x000000000000003f	/* enable 6 PCI slots */
 
 /* Uncorrectable error asynchronous fault status registers */
-#define	UEAFSR_BLK	(1UL << 22)	/* pri. error caused by read */
-#define	UEAFSR_P_DTE	(1UL << 56)	/* pri. DMA translation error */
-#define	UEAFSR_S_DTE	(1UL << 57)	/* sec. DMA translation error */
-#define	UEAFSR_S_DWR	(1UL << 58)	/* sec. error during write */
-#define	UEAFSR_S_DRD	(1UL << 59)	/* sec. error during read */
-#define	UEAFSR_P_DWR	(1UL << 61)	/* pri. error during write */
-#define	UEAFSR_P_DRD	(1UL << 62)	/* pri. error during read */
+#define	UEAFSR_BLK	(1UL << 23)	/* Error caused by block transaction. */
+#define	UEAFSR_P_DTE	(1UL << 56)	/* Pri. DVMA translation error. */
+#define	UEAFSR_S_DTE	(1UL << 57)	/* Sec. DVMA translation error. */
+#define	UEAFSR_S_DWR	(1UL << 58)	/* Sec. error during DVMA write. */
+#define	UEAFSR_S_DRD	(1UL << 59)	/* Sec. error during DVMA read. */
+#define	UEAFSR_S_PIO	(1UL << 60)	/* Sec. error during PIO access. */
+#define	UEAFSR_P_DWR	(1UL << 61)	/* Pri. error during DVMA write. */
+#define	UEAFSR_P_DRD	(1UL << 62)	/* Pri. error during DVMA read. */
+#define	UEAFSR_P_PIO	(1UL << 63)	/* Pri. error during PIO access. */
+
+/* Correctable error asynchronous fault status registers */
+#define	CEAFSR_BLK	(1UL << 23)	/* Error caused by block transaction. */
+#define	CEAFSR_S_DWR	(1UL << 58)	/* Sec. error caused by DVMA write. */
+#define	CEAFSR_S_DRD	(1UL << 59)	/* Sec. error caused by DVMA read. */
+#define	CEAFSR_S_PIO	(1UL << 60)	/* Sec. error caused by PIO access. */
+#define	CEAFSR_P_DWR	(1UL << 61)	/* Pri. error caused by DVMA write. */
+#define	CEAFSR_P_DRD	(1UL << 62)	/* Pri. error caused by DVMA read. */
+#define	CEAFSR_P_PIO	(1UL << 63)	/* Pri. error caused by PIO access. */
+
+#define	CEAFSR_ERRMASK							\
+	(CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR |			\
+	 CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)
 
 /* Definitions for the target address space register. */
 #define	PCITAS_ADDR_SHIFT	29

