Re: MCE: Does this look possibly like a slot issue?

From: Larry Rosenman <ler_at_lerctr.org>
Date: Tue, 21 Jun 2022 00:41:05 UTC

Yes and Yes.

On 06/20/2022 7:37 pm, Ultima wrote:

> Are you sure that the module you replaced it with was good?
> Are you sure you replaced the correct module?
> 
> Best regards,
> Richard Gallamore
> 
> On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman <ler@lerctr.org> wrote:
> 
> I'm seeing them constantly:
> 
> root@freenas[~]# mcelog --dmi
> Hardware event. This is not a software error.
> MCE 0
> CPU 22 BANK 8 TSC 20aab486464a
> MISC ac29890200046444 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 44
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 1
> CPU 22 BANK 8 TSC 296dfcc82582
> MISC ac29890200041381 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 2
> CPU 22 BANK 8 TSC 2a5604a6a070
> MISC ac29890200044281
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory ECC error occurred during scrub
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 81
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 88000040000200cf MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> Hardware event. This is not a software error.
> MCE 3
> CPU 22 BANK 8 TSC 31e141418eb8
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 4
> CPU 22 BANK 8 TSC 3a014afee106
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 5
> CPU 22 BANK 8 TSC 41d1dbef1a6a
> MISC ac29890200046141 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 41
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 6
> CPU 22 BANK 8 TSC 4a1b1ecef446
> MISC ac29890200046a4a ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 4a
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 7
> CPU 22 BANK 8 TSC 527bc27db776
> MISC ac29890200040386 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 86
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> Hardware event. This is not a software error.
> MCE 8
> CPU 22 BANK 8 TSC 5aa4ecdd795a
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655770989 Mon Jun 20 19:23:09 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> root@freenas[~]#
> 
> and I replaced the DIMM yesterday :(
> 
> On 06/20/2022 7:19 pm, Ultima wrote:
> 
> Hey Larry,
> 
> It is possible it's the motherboard itself, but it's rare. The way I
> would determine this is to swap the DIMM module with another
> populated slot on the motherboard and see if the error migrated
> to the new slot or not. Also, this error doesn't necessarily mean
> there is a problem that needs to be addressed. If you have been
> running the system for many months and you see ECC errors a
> handful of times, it can probably be safely ignored.
> 
> Best regards,
> Richard Gallamore
> 
> On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote: 
> I've gotten a BUNCH of these on my TrueNAS server.  I've replaced this
> DIMM a couple of times, and still the MCE's continue.
> Is it possible it's Motherboard slot issue?
> 
> Hardware event. This is not a software error.
> MCE 8
> CPU 22 BANK 8 TSC 5aa4ecdd795a
> MISC ac29890200046646 ADDR ee2f6e800
> TIME 1655762472 Mon Jun 20 17:01:12 2022
> MCG status:
> Memory read ECC error
> Memory corrected error count (CORE_ERR_CNT): 1
> Memory transaction Tracker ID (RTId): 46
> Memory DIMM ID of error: 0
> Memory channel ID of error: 1
> Memory ECC syndrome: ac298902
> STATUS 8c0000400001009f MCGSTATUS 0
> MCGCAP 1c09 APICID 34 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44 Step 2
> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> Device Locator: P2-DIMM2C
> Bank Locator: BANK14
> Manufacturer: Hyundai
> Serial Number: 40F3C20F
> Asset Tag:
> Part Number: HMT151R7BFR4C-H9
> 
> --
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106