Date: Tue, 21 Jun 2022 11:13:28 -0500
From: Larry Rosenman <ler@lerctr.org>
To: "Rodney W. Grimes"
Cc: Ultima, Freebsd current
Subject: Re: MCE: Does this look possibly like a slot issue?
List-Archive: https://lists.freebsd.org/archives/freebsd-current

Looks like it might be just that, Rodney:

root@freenas[~]# mcelog
Hardware event. This is not a software error.
MCE 0
CPU 14 BANK 8 TSC 525efc019bb6
MISC ac29890200040083 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 1
CPU 14 BANK 8 TSC 52a513d27f2c
MISC ac29890200041083 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 2
CPU 14 BANK 8 TSC 53d8cf2ceb4a
MISC ac29890200040582 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 3
CPU 14 BANK 8 TSC 5e4dae622cb6
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 4
CPU 14 BANK 8 TSC 5eea68fdad4e
MISC ac29890200041784 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 84
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 5
CPU 14 BANK 8 TSC 5eea6e0bbce0
MISC ac29890200044000 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 6
CPU 12 BANK 8 TSC 5f6cbe9ef2bc
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 7
CPU 14 BANK 8 TSC 64ba63c66e52
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 8
CPU 14 BANK 8 TSC 659878c17622
MISC ac29890200040282 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 9
CPU 14 BANK 8 TSC 66b71c1dccf6
MISC ac29890200040183 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 10
CPU 14 BANK 8 TSC 6be0988610ce
MISC ac29890200040682 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 11
CPU 14 BANK 8 TSC 6be0995926f8
MISC ac29890200044000 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
root@freenas[~]# mcelog --dmi
Hardware event. This is not a software error.
MCE 0
CPU 14 BANK 8 TSC 525efc019bb6
MISC ac29890200040083 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 1
CPU 14 BANK 8 TSC 52a513d27f2c
MISC ac29890200041083 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 2
CPU 14 BANK 8 TSC 53d8cf2ceb4a
MISC ac29890200040582 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 3
CPU 14 BANK 8 TSC 5e4dae622cb6
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 4
CPU 14 BANK 8 TSC 5eea68fdad4e
MISC ac29890200041784 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 84
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 5
CPU 14 BANK 8 TSC 5eea6e0bbce0
MISC ac29890200044000 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 6
CPU 12 BANK 8 TSC 5f6cbe9ef2bc
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 7
CPU 14 BANK 8 TSC 64ba63c66e52
MISC ac29890200041181 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 8
CPU 14 BANK 8 TSC 659878c17622
MISC ac29890200040282 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 9
CPU 14 BANK 8 TSC 66b71c1dccf6
MISC ac29890200040183 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 10
CPU 14 BANK 8 TSC 6be0988610ce
MISC ac29890200040682 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 11
CPU 14 BANK 8 TSC 6be0995926f8
MISC ac29890200044000 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
root@freenas[~]#

On 06/21/2022 11:06 am, Rodney W. Grimes wrote:
>>
>>
>> Swapped 2 DIMMs, now we wait for the ZFS ARC to fill and start using
>> all the memory.
>
> Depending on the results of that, one thing that is often overlooked
> when trying to troubleshoot memory systems in modern Intel systems
> is the fact that the DIMM now talks directly to the CPU chip that
> has the memory controller built into it.  THUS these "slot" related
> ECC/Parity/blowup errors can actually be the CPU and/or the CPU
> socket and/or the seating of the CPU in the socket.
>
> So if the error sticks with the DIMM slot and not the DIMM
> module, the next thing I would try would be a CPU chip reseat,
> including a good inspection of the socket for a damaged
> pin.  Also look at the lands on the CPU chip itself, and you
> can even try swapping CPU chips to see if it follows the
> CPU or the socket, much as you do with a DIMM.
>
>
>>
>> On 06/20/2022 7:59 pm, Larry Rosenman wrote:
>>
>> > SuperMicro X8DTN+
>> >
>> > 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU
>> > E5645 @ 2.40GHz (2400.20-MHz K8-class CPU)
>> >
>> > I'll bring it down and swap DIMMs around.
>> >
>> > On 06/20/2022 7:57 pm, Ultima wrote:
>> >
>> > Hey Larry,
>> >
>> > One red flag I am seeing is that the error is being produced on
>> > the same CPU/bank with each error you have provided so far.
>> >
>> > Can you try and follow my original recommendation: swap a
>> > currently installed DIMM with the one in the problem DIMM slot
>> > and see if anything changes?
>> >
>> > Can you also provide the motherboard model? Also, do you
>> > have multiple CPUs installed in this system?
>> >
>> > Best regards,
>> > Richard Gallamore
>> >
>> > On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman wrote:
>> >
>> > Yes and Yes.
>> >
>> > On 06/20/2022 7:37 pm, Ultima wrote:
>> >
>> > Are you sure that the module you replaced it with was good?
>> > Are you sure you replaced the correct module?
>> >
>> > Best regards,
>> > Richard Gallamore
>> >
>> > On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote:
>> >
>> > I'm seeing them constantly:
>> >
>> > root@freenas[~]# mcelog --dmi
>> > Hardware event. This is not a software error.
>> > MCE 0
>> > CPU 22 BANK 8 TSC 20aab486464a
>> > MISC ac29890200046444 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 44
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 1
>> > CPU 22 BANK 8 TSC 296dfcc82582
>> > MISC ac29890200041381 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 81
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 2
>> > CPU 22 BANK 8 TSC 2a5604a6a070
>> > MISC ac29890200044281
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory ECC error occurred during scrub
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 81
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 88000040000200cf MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > Hardware event. This is not a software error.
>> > MCE 3
>> > CPU 22 BANK 8 TSC 31e141418eb8
>> > MISC ac29890200046a4a ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 4a
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 4
>> > CPU 22 BANK 8 TSC 3a014afee106
>> > MISC ac29890200046646 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 46
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 5
>> > CPU 22 BANK 8 TSC 41d1dbef1a6a
>> > MISC ac29890200046141 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 41
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 6
>> > CPU 22 BANK 8 TSC 4a1b1ecef446
>> > MISC ac29890200046a4a ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 4a
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 7
>> > CPU 22 BANK 8 TSC 527bc27db776
>> > MISC ac29890200040386 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 86
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > Hardware event. This is not a software error.
>> > MCE 8
>> > CPU 22 BANK 8 TSC 5aa4ecdd795a
>> > MISC ac29890200046646 ADDR ee2f6e800
>> > TIME 1655770989 Mon Jun 20 19:23:09 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 46
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> > root@freenas[~]#
>> >
>> > and I replaced the DIMM yesterday :(
>> >
>> > On 06/20/2022 7:19 pm, Ultima wrote:
>> >
>> > Hey Larry,
>> >
>> > It is possible it's the motherboard itself, but it's rare. The way I
>> > would determine this is to swap the DIMM module with another
>> > populated slot on the motherboard and see if the error migrated
>> > to the new slot or not. Also, this error doesn't necessarily mean
>> > there is a problem that needs to be addressed. If you have been
>> > running the system for many months and you see ECC errors a
>> > handful of times, it can probably be safely ignored.
>> >
>> > Best regards,
>> > Richard Gallamore
>> >
>> > On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman wrote:
>> > I've gotten a BUNCH of these on my TrueNAS server. I've replaced this
>> > DIMM a couple of times, and still the MCEs continue.
>> > Is it possible it's a motherboard slot issue?
>> >
>> > Hardware event. This is not a software error.
>> > MCE 8
>> > CPU 22 BANK 8 TSC 5aa4ecdd795a
>> > MISC ac29890200046646 ADDR ee2f6e800
>> > TIME 1655762472 Mon Jun 20 17:01:12 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 46
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> >
>> > --
>> > Larry Rosenman http://www.lerctr.org/~ler
>> > Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
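The question in this thread is whether the corrected errors follow the module (its serial number) or the location (slot, channel, physical address). Below is a minimal Python sketch of that bookkeeping. It assumes mcelog's usual text output, where each event starts with `Hardware event. This is not a software error.`. It counts events per (socket, channel, DIMM ID, Device Locator, serial, ADDR) and decodes the architectural IA32_MCi_STATUS fields as laid out in the Intel SDM, Vol. 3B, machine-check chapter. The function names (`decode_status`, `parse_records`, `tally`, `summarize`) are hypothetical and not part of mcelog. The locator and serial fields come from SMBIOS, which mcelog itself warns is often unreliable.

```python
import re
from collections import Counter

# mcelog prints this line before every decoded event.
SEPARATOR = "Hardware event. This is not a software error."

# Memory-controller compound error code layout: 0000 0000 1MMM CCCC
# (Intel SDM Vol. 3B). MMM is the request type, CCCC the channel.
REQUEST_TYPES = {
    0b000: "generic",
    0b001: "read",
    0b010: "write",
    0b011: "address/command",
    0b100: "scrub",
}

# Fields pulled out of each event; the patterns do not depend on line starts.
FIELDS = {
    "socket": r"SOCKETID (\d+)",
    "channel": r"Memory channel ID of error: (\d+)",
    "dimm_id": r"Memory DIMM ID of error: (\d+)",
    "addr": r"\bADDR ([0-9a-f]+)",
    "status": r"\bSTATUS ([0-9a-f]+)",  # \b keeps "MCGSTATUS" from matching
    "locator": r"Device Locator: (\S+)",
    "serial": r"Serial Number: (\S+)",
}


def decode_status(status: int) -> dict:
    """Decode architectural IA32_MCi_STATUS bits (VAL, OVER, UC, ADDRV, count)."""
    mca_code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_code": mca_code,
    }
    # Memory-controller errors: bits 15:8 clear, bit 7 set.
    if (mca_code & 0xFF80) == 0x0080:
        info["request"] = REQUEST_TYPES.get((mca_code >> 4) & 0b111, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return info


def parse_records(text: str) -> list[dict]:
    """Split mcelog output into events and extract FIELDS from each one."""
    records = []
    for chunk in text.split(SEPARATOR)[1:]:
        rec = {}
        for key, pattern in FIELDS.items():
            match = re.search(pattern, chunk)
            rec[key] = match.group(1) if match else None
        if rec["status"] is not None:
            rec["decoded"] = decode_status(int(rec["status"], 16))
        records.append(rec)
    return records


def tally(records: list[dict]) -> Counter:
    """Count events per (socket, channel, DIMM id, locator, serial, address)."""
    return Counter(
        (r["socket"], r["channel"], r["dimm_id"], r["locator"], r["serial"], r["addr"])
        for r in records
    )


def summarize(text: str) -> list[str]:
    """Return one line per distinct location, most frequent first."""
    lines = []
    for key, n in tally(parse_records(text)).most_common():
        socket, chan, dimm, loc, serial, addr = key
        lines.append(
            f"{n:3d} event(s): socket {socket} channel {chan} DIMM {dimm} "
            f"slot {loc} serial {serial} addr {addr}"
        )
    return lines
```

For example, `summarize(open("mce.txt").read())` on a saved `mcelog --dmi` dump (the file name is hypothetical) returns one line per distinct location. If the same slot, channel and address appear under two different serial numbers, as with the Hyundai 40F3C20F and Nanya 642264CD modules in P2-DIMM2C here, that is consistent with Rodney's point that the slot, channel or CPU socket, rather than the module, may be at fault. Decoding STATUS 88000040000200cf this way yields a scrub request with ADDRV clear. That matches the one Jun 20 event (MCE 2) that mcelog printed without an ADDR.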