correctable DMA error AFAR
Thomas Moestl
t.moestl at tu-bs.de
Mon Jul 21 15:47:12 PDT 2003
On Mon, 2003/07/21 at 15:44:36 -0400, Chris Jackman wrote:
> Error messages:
>
> pcib0: correctable DMA error AFAR 0x476d6140 AFSR 0x40e600003f800000
> and
> pcib0: correctable DMA error AFAR 0x40adbc40 AFSR 0x40c400003f800000
These signal correctable ECC errors during a DVMA read
transaction. The differences in the AFSR values indicate different ECC
syndromes.
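To make the reading of the two AFSR values concrete, here is a minimal sketch that decodes them with the bit positions from the CEAFSR_* definitions in the attached patch. The helper names ceafsr_primary_cause() and ceafsr_diff() are made up for illustration and are not part of the driver. ULL is used instead of UL so the sketch also builds where long is 32 bits.

```c
#include <stdint.h>

/* Bit positions copied from the CEAFSR_* definitions in the patch. */
#define CEAFSR_BLK	(1ULL << 23)	/* Error caused by block transaction. */
#define CEAFSR_S_DWR	(1ULL << 58)	/* Sec. error caused by DVMA write. */
#define CEAFSR_S_DRD	(1ULL << 59)	/* Sec. error caused by DVMA read. */
#define CEAFSR_S_PIO	(1ULL << 60)	/* Sec. error caused by PIO access. */
#define CEAFSR_P_DWR	(1ULL << 61)	/* Pri. error caused by DVMA write. */
#define CEAFSR_P_DRD	(1ULL << 62)	/* Pri. error caused by DVMA read. */
#define CEAFSR_P_PIO	(1ULL << 63)	/* Pri. error caused by PIO access. */

/* Hypothetical helper: name the primary error cause flagged in afsr. */
static const char *
ceafsr_primary_cause(uint64_t afsr)
{

	if (afsr & CEAFSR_P_PIO)
		return ("PIO access");
	if (afsr & CEAFSR_P_DRD)
		return ("DVMA read");
	if (afsr & CEAFSR_P_DWR)
		return ("DVMA write");
	return ("none");
}

/* Hypothetical helper: bits in which two AFSR values differ. */
static uint64_t
ceafsr_diff(uint64_t a, uint64_t b)
{

	return (a ^ b);
}
```

For both values in the report, ceafsr_primary_cause() returns "DVMA read", and CEAFSR_BLK is set as well. ceafsr_diff() on the pair gives 0x0022000000000000, so the two values differ only in bits outside the cause flags.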
> My e250 has locked up twice in the last few weeks with these
> error messages. The error gets repeated over and over
> again on the serial console, and I can't do anything to the
> box except power cycle it.
This interrupt is informational only, and the documentation states
that no further cleanup is required. We should probably clear the
error bits in the status register anyway, since it looks as though the
interrupt is triggered again and again for as long as any of those
bits remain set. The manual is a bit ambiguous on that point, but
clearing the bits is desirable in any case since it improves error
reporting.
The attached patch implements this; can you please try it and report
how it behaves on the next ECC error?
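As a rough userland model of what the patched handler does, the sketch below assumes the status register is write-one-to-clear. That is only an assumption, suggested by the patch writing back afsr & CEAFSR_ERRMASK; the manual should be checked for the real semantics. struct fake_ce_regs, fake_afsr_write() and fake_ce_handler() are hypothetical stand-ins and do not touch any hardware.

```c
#include <stdint.h>

/* Bit positions copied from the CEAFSR_* definitions in the patch. */
#define CEAFSR_S_DWR	(1ULL << 58)
#define CEAFSR_S_DRD	(1ULL << 59)
#define CEAFSR_S_PIO	(1ULL << 60)
#define CEAFSR_P_DWR	(1ULL << 61)
#define CEAFSR_P_DRD	(1ULL << 62)
#define CEAFSR_P_PIO	(1ULL << 63)

#define CEAFSR_ERRMASK \
	(CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR | \
	CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)

/* Hypothetical stand-in for the hardware status register. */
struct fake_ce_regs {
	uint64_t afsr;
};

/*
 * ASSUMPTION: write-one-to-clear. Every bit written as 1 is cleared
 * in the register, every bit written as 0 is left untouched.
 */
static void
fake_afsr_write(struct fake_ce_regs *r, uint64_t val)
{

	r->afsr &= ~val;
}

/*
 * Same order as the patched handler: read the status first, then
 * write back only the error bits that were actually seen.
 */
static uint64_t
fake_ce_handler(struct fake_ce_regs *r)
{
	uint64_t afsr;

	afsr = r->afsr;
	fake_afsr_write(r, afsr & CEAFSR_ERRMASK);
	return (afsr);
}
```

Starting from afsr = 0x40e600003f800000, fake_ce_handler() returns that value for reporting and leaves 0x00e600003f800000 in the model register, so none of the CEAFSR_ERRMASK bits remain set. Writing back only the bits that were read means an error bit that gets set after the read is not cleared by accident.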
Thanks,
- Thomas
--
Thomas Moestl <t.moestl at tu-bs.de> http://www.tu-bs.de/~y0015675/
<tmm at FreeBSD.org> http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C
-------------- next part --------------
Index: sparc64/pci/psycho.c
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psycho.c,v
retrieving revision 1.41
diff -u -r1.41 psycho.c
--- sparc64/pci/psycho.c 1 Jul 2003 15:52:06 -0000 1.41
+++ sparc64/pci/psycho.c 21 Jul 2003 22:41:12 -0000
@@ -745,12 +745,14 @@
 	struct psycho_softc *sc = (struct psycho_softc *)arg;
 	u_int64_t afar, afsr;
-	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 	afar = PSYCHO_READ8(sc, PSR_CE_AFA);
 	afsr = PSYCHO_READ8(sc, PSR_CE_AFS);
 	/* It's correctable. Dump the regs and continue. */
 	device_printf(sc->sc_dev, "correctable DMA error AFAR %#lx "
 	    "AFSR %#lx\n", (u_long)afar, (u_long)afsr);
+	/* Clear the error bits that we caught. */
+	PSYCHO_WRITE8(sc, PSR_CE_AFS, afsr & CEAFSR_ERRMASK);
+	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 }
 static void
Index: sparc64/pci/psychoreg.h
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psychoreg.h,v
retrieving revision 1.6
diff -u -r1.6 psychoreg.h
--- sparc64/pci/psychoreg.h 6 Jan 2003 16:51:06 -0000 1.6
+++ sparc64/pci/psychoreg.h 21 Jul 2003 22:36:03 -0000
@@ -232,13 +232,28 @@
#define PCICTL_6ENABLE 0x000000000000003f /* enable 6 PCI slots */
/* Uncorrectable error asynchronous fault status registers */
-#define UEAFSR_BLK (1UL << 22) /* pri. error caused by read */
-#define UEAFSR_P_DTE (1UL << 56) /* pri. DMA translation error */
-#define UEAFSR_S_DTE (1UL << 57) /* sec. DMA translation error */
-#define UEAFSR_S_DWR (1UL << 58) /* sec. error during write */
-#define UEAFSR_S_DRD (1UL << 59) /* sec. error during read */
-#define UEAFSR_P_DWR (1UL << 61) /* pri. error during write */
-#define UEAFSR_P_DRD (1UL << 62) /* pri. error during read */
+#define UEAFSR_BLK (1UL << 23) /* Error caused by block transaction. */
+#define UEAFSR_P_DTE (1UL << 56) /* Pri. DVMA translation error. */
+#define UEAFSR_S_DTE (1UL << 57) /* Sec. DVMA translation error. */
+#define UEAFSR_S_DWR (1UL << 58) /* Sec. error during DVMA write. */
+#define UEAFSR_S_DRD (1UL << 59) /* Sec. error during DVMA read. */
+#define UEAFSR_S_PIO (1UL << 60) /* Sec. error during PIO access. */
+#define UEAFSR_P_DWR (1UL << 61) /* Pri. error during DVMA write. */
+#define UEAFSR_P_DRD (1UL << 62) /* Pri. error during DVMA read. */
+#define UEAFSR_P_PIO (1UL << 63) /* Pri. error during PIO access. */
+
+/* Correctable error asynchronous fault status registers */
+#define CEAFSR_BLK (1UL << 23) /* Error caused by block transaction. */
+#define CEAFSR_S_DWR (1UL << 58) /* Sec. error caused by DVMA write. */
+#define CEAFSR_S_DRD (1UL << 59) /* Sec. error caused by DVMA read. */
+#define CEAFSR_S_PIO (1UL << 60) /* Sec. error caused by PIO access. */
+#define CEAFSR_P_DWR (1UL << 61) /* Pri. error caused by DVMA write. */
+#define CEAFSR_P_DRD (1UL << 62) /* Pri. error caused by DVMA read. */
+#define CEAFSR_P_PIO (1UL << 63) /* Pri. error caused by PIO access. */
+
+#define CEAFSR_ERRMASK \
+ (CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR | \
+ CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)
/* Definitions for the target address space register. */
#define PCITAS_ADDR_SHIFT 29