kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000

Anton Shterenlikht mexas at bristol.ac.uk
Mon Nov 29 21:57:27 UTC 2010


On Mon, Nov 29, 2010 at 07:40:37PM +0100, Marius Strobl wrote:
> On Mon, Nov 29, 2010 at 09:32:31AM +0000, Anton Shterenlikht wrote:
> > On blade1500 silver 9.0-CURRENT #0 r212302 I got a panic,
> > which was preceded by these messages in /var/log/messages:
> > 
> > 
> > Nov 28 22:59:13 mech-anton240 ntpd[860]: time reset +0.313838 s
> > Nov 28 23:21:39 mech-anton240 ntpd[860]: time reset +0.354851 s
> > Nov 28 23:40:17 mech-anton240 ntpd[860]: time reset +0.319586 s
> > Nov 29 00:02:51 mech-anton240 ntpd[860]: time reset +0.357852 s
> > Nov 29 00:21:34 mech-anton240 ntpd[860]: time reset +0.327949 s
> > Nov 29 00:42:54 mech-anton240 ntpd[860]: time reset +0.347609 s
> > Nov 29 01:01:46 mech-anton240 ntpd[860]: time reset +0.329297 s
> > Nov 29 01:18:55 mech-anton240 ntpd[860]: time reset +0.317517 s
> > Nov 29 01:42:21 mech-anton240 ntpd[860]: time reset +0.354540 s
> > Nov 29 02:02:14 mech-anton240 ntpd[860]: time reset +0.344071 s
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: corrected Epcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: CC error
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:36 mech-anton240 last message repeated 40137 times
> > Nov 29 02:10:36 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:36 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 last message repeated 26464 times
> > Nov 29 02:10:42 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 last message repeated 14 times
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:45 mech-anton240 last message repeated 12750 times
> > Nov 29 02:10:46 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:46 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:06 mech-anton240 last message repeated 73851 times
> > Nov 29 02:11:06 mech-anton240 kernel: pcib1: correctable DMA error AFFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:06 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:37 mech-anton240 last message repeated 72138 times
> > Nov 29 02:13:38 mech-anton240 last message repeated 180714 times
> > Nov 29 02:20:33 mech-anton240 last message repeated 623033 times
> > Nov 29 02:20:33 mech-anton240 ntpd[860]: time reset +0.317476 s
> > Nov 29 02:20:33 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:21:03 mech-anton240 last message repeated 44694 times
> > Nov 29 02:23:03 mech-anton240 last message repeated 179765 times
> > Nov 29 02:33:03 mech-anton240 last message repeated 900956 times
> > Nov 29 02:41:41 mech-anton240 last message repeated 774749 times
> > Nov 29 02:41:41 mech-anton240 ntpd[860]: time reset +0.338347 s
> > Nov 29 02:41:41 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:42:12 mech-anton240 last message repeated 45005 times
> > Nov 29 02:44:13 mech-anton240 last message repeated 180767 times
> > Nov 29 02:54:14 mech-anton240 last message repeated 901067 times
> > Nov 29 03:04:01 mech-anton240 last message repeated 953527 times
> > Nov 29 03:04:01 mech-anton240 ntpd[860]: time reset +0.352855 s
> > Nov 29 03:04:02 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 03:04:33 mech-anton240 last message repeated 59600 times
> > Nov 29 03:06:34 mech-anton240 last message repeated 210676 times
> > Nov 29 03:16:35 mech-anton240 last message repeated 901966 times
> > Nov 29 03:21:49 mech-anton240 last message repeated 473275 times
> > Nov 29 03:21:49 mech-anton240 ntpd[860]: time reset +0.330125 s
> > Nov 29 03:21:49 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 03:22:20 mech-anton240 last message repeated 44963 times
> > Nov 29 03:24:21 mech-anton240 last message repeated 182191 times
> > Nov 29 09:18:15 mech-anton240 syslogd: kernel boot file is /boot/kernel/kernel
> > 
> > The panic was (copied by hand):
> > 
> > panic: pcib1: JBus error 0.
> > 
> > If it happens again, I'll post the full bt.
> > 
> > Is /var/log/messages indicative of a hardware failure?
> > I'm also intrigued by ntpd time reset preceding most DMA errors.
> > 
> 
> This looks like RAM beginning to fail (note there's a "corrected ECC
> error" message intermixed with a "correctable DMA error" one), though
> it could also just be a poor connection, in which case reseating the
> modules might help.
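
To gauge how many of these errors syslogd actually recorded, the
"last message repeated N times" lines have to be folded back into the
message they follow. A minimal sketch in Python, assuming the usual
"Mon DD HH:MM:SS host message" layout seen in the log above. The name
tally is hypothetical, and garbled (interleaved) kernel lines are simply
counted as separate messages:

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host message".
LINE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+(.*)$")
# syslogd's compression marker for consecutive identical messages.
REPEAT = re.compile(r"^last message repeated (\d+) times?$")


def tally(lines):
    """Count occurrences per message, folding in syslogd repeat counts."""
    counts = Counter()
    previous = None  # most recent non-repeat message seen
    for raw in lines:
        # Drop mail quote markers ("> > ") and surrounding whitespace.
        m = LINE.match(raw.strip().lstrip("> "))
        if not m:
            continue
        msg = m.group(1)
        r = REPEAT.match(msg)
        if r:
            # Credit the repeats to the message they refer to.
            if previous is not None:
                counts[previous] += int(r.group(1))
        else:
            counts[msg] += 1
            previous = msg
    return counts
```

Something like tally(open("/var/log/messages")).most_common(5) would then
list the most frequent messages with their totals.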

Marius, many thanks
anton


-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423

