Re: MCE: Does this look possibly like a slot issue?

From: Larry Rosenman <ler_at_lerctr.org>
Date: Tue, 21 Jun 2022 00:23:56 UTC

I'm seeing them constantly:

root@freenas[~]# mcelog --dmi
Hardware event. This is not a software error.
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 44
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 1
CPU 22 BANK 8 TSC 296dfcc82582
MISC ac29890200041381 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory ECC error occurred during scrub
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 88000040000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 3
CPU 22 BANK 8 TSC 31e141418eb8
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 4
CPU 22 BANK 8 TSC 3a014afee106
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 5
CPU 22 BANK 8 TSC 41d1dbef1a6a
MISC ac29890200046141 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 41
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 6
CPU 22 BANK 8 TSC 4a1b1ecef446
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 7
CPU 22 BANK 8 TSC 527bc27db776
MISC ac29890200040386 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 86
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 8
CPU 22 BANK 8 TSC 5aa4ecdd795a
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
root@freenas[~]#

and I replaced the DIMM yesterday :(
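
For anyone triaging similar output, here is a minimal sketch that groups mcelog records by reported slot, physical address, and ECC syndrome. The names `parse_mcelog`, `FIELDS`, and `SAMPLE` are made up for illustration, and the sample is trimmed to two records from the output above. This is not part of mcelog itself.

```python
import re
from collections import Counter

# Fields pulled from each record; the patterns match the mcelog text above.
FIELDS = {
    "addr": re.compile(r"\bADDR ([0-9a-f]+)"),
    "syndrome": re.compile(r"Memory ECC syndrome: ([0-9a-f]+)"),
    "channel": re.compile(r"Memory channel ID of error: (\d+)"),
    "locator": re.compile(r"Device Locator: (\S+)"),
}

def parse_mcelog(text):
    """Split mcelog output into one dict per 'MCE n' record."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"MCE \d+", line):
            current = {}
            records.append(current)
            continue
        if current is None:
            continue  # skip anything before the first record
        for name, pattern in FIELDS.items():
            match = pattern.search(line)
            if match:
                current[name] = match.group(1)
    return records

# Lines copied from MCE 0 and MCE 2 above, trimmed to the fields used here.
SAMPLE = """\
Hardware event. This is not a software error.
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
Device Locator: P2-DIMM2C
Hardware event. This is not a software error.
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
"""

records = parse_mcelog(SAMPLE)
# Missing fields (e.g. no ADDR on the scrub record) show up as None.
counts = Counter(
    (r.get("locator"), r.get("addr"), r.get("syndrome")) for r in records
)
for key, n in counts.items():
    print(n, key)
```

In the full output above, every record that carries an ADDR reports the same address (ee2f6e800), and all nine share one syndrome (ac298902).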

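The STATUS words can also be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS bit layout from Intel's machine-check documentation (flags in bits 63..57, corrected error count in bits 52..38, MCA error code in bits 15..0). `decode_mci_status` is a made-up helper name, not an mcelog function.

```python
# Transaction types (MMM) in the memory-controller MCA error code form
# 0000 0000 1MMM CCCC.
TRANSACTIONS = {
    0: "generic",
    1: "read",
    2: "write",
    3: "address/command",
    4: "scrub",
}

def decode_mci_status(status):
    """Decode a few architectural fields of an IA32_MCi_STATUS value."""
    code = status & 0xFFFF
    decoded = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "miscv": bool((status >> 59) & 1),   # MISC register holds valid data
        "addrv": bool((status >> 58) & 1),   # ADDR register holds valid data
        "pcc": bool((status >> 57) & 1),     # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": code,
    }
    if (code >> 7) == 1:  # memory controller error form
        decoded["transaction"] = TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
        decoded["channel"] = code & 0xF  # 0xF means "channel not specified"
    return decoded

# The two STATUS values that appear in the output above.
for value in (0x8C0000400001009F, 0x88000040000200CF):
    print(hex(value), decode_mci_status(value))
```

Decoded this way, both values are valid corrected errors (UC and PCC clear) with a count of 1. The second has ADDRV clear, which matches the scrub record (MCE 2) being the only one without an ADDR field.
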
On 06/20/2022 7:19 pm, Ultima wrote:

> Hey Larry,
> 
> It is possible it's the motherboard itself, but that's rare. The way I
> would determine this is to swap the DIMM with one in another
> populated slot on the motherboard and see whether the error follows
> it to the new slot. Also, this error doesn't necessarily mean
> there is a problem that needs to be addressed. If you have been
> running the system for many months and see ECC errors only a
> handful of times, they can probably be safely ignored.
> 
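
The swap test suggested above boils down to a small decision rule, sketched here. `interpret_swap` and the slot name P2-DIMM1A are hypothetical; only P2-DIMM2C appears in the logs. The rule also assumes the Device Locator strings are accurate, which mcelog itself warns may not hold for SMBIOS data.

```python
def interpret_swap(suspect_slot, swapped_into_slot, locators_after):
    """Classify errors seen after moving the suspect DIMM to another slot.

    suspect_slot:      slot that reported errors before the swap
    swapped_into_slot: slot the suspect DIMM now occupies
    locators_after:    Device Locator strings from errors logged after the swap
    """
    seen = set(locators_after)
    in_old = suspect_slot in seen
    in_new = swapped_into_slot in seen
    if in_old and in_new:
        return "both"          # errors in both places; more testing needed
    if in_old:
        return "slot"          # errors stayed put: slot, channel, or board
    if in_new:
        return "dimm"          # errors followed the module
    return "inconclusive"      # nothing logged yet; keep watching

# Hypothetical example: errors keep naming the original slot after the swap.
print(interpret_swap("P2-DIMM2C", "P2-DIMM1A", ["P2-DIMM2C", "P2-DIMM2C"]))
```

An "inconclusive" result may only mean that too little time has passed, or that reseating the module changed something.
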
> Best regards,
> Richard Gallamore
> 
> On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
> 
>> I've gotten a BUNCH of these on my TrueNAS server. I've replaced this
>> DIMM a couple of times, and still the MCEs continue.
>> Is it possible it's a motherboard slot issue?
>> 
>> Hardware event. This is not a software error.
>> MCE 8
>> CPU 22 BANK 8 TSC 5aa4ecdd795a
>> MISC ac29890200046646 ADDR ee2f6e800
>> TIME 1655762472 Mon Jun 20 17:01:12 2022
>> MCG status:
>> Memory read ECC error
>> Memory corrected error count (CORE_ERR_CNT): 1
>> Memory transaction Tracker ID (RTId): 46
>> Memory DIMM ID of error: 0
>> Memory channel ID of error: 1
>> Memory ECC syndrome: ac298902
>> STATUS 8c0000400001009f MCGSTATUS 0
>> MCGCAP 1c09 APICID 34 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 44 Step 2
>> DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> Device Locator: P2-DIMM2C
>> Bank Locator: BANK14
>> Manufacturer: Hyundai
>> Serial Number: 40F3C20F
>> Asset Tag:
>> Part Number: HMT151R7BFR4C-H9

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106