Re: MCE: Does this look possibly like a slot issue?

From: Chris <bsd-lists_at_bsdforge.com>
Date: Tue, 21 Jun 2022 20:27:53 UTC
On 2022-06-21 12:23, Larry Rosenman wrote:
> On 06/21/2022 1:23 pm, Chris wrote:
>> On 2022-06-20 17:23, Larry Rosenman wrote:
>>> I'm seeing them constantly:
>> FWIW it looks like a sync(ing) problem between your
>> RAM && CPU cache. Are your clocks set correctly
>> for your CPU && RAM? Is your CPU too hot? Is the CPU
>> cache ECC?
>>> 
>>> root@freenas[~]# mcelog --dmi
> 
> [snip]
> 
> Hrm.  IIRC all the BIOS parameters are default (I could be mistaken).  It's a
> SuperMicro X8DTN+ motherboard with:
> CPU: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz (2400.22-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
> 
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> 
> Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x1<LAHF>
>   Structured Extended Features3=0x9c000000<IBPB,STIBP,L1DFL,SSBD>
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
> real memory  = 77309411328 (73728 MB)
> avail memory = 75186962432 (71703 MB)
> (2 packages, 6 core, 12-threads each) and 18 4GB sticks.
> this ONE slot seems to be a problem.
> 
> How would you recommend looking for an issue modulo pulling the 2 cpu packages?
When I ran into these errors it turned out to be a hot CPU, as I recall.
While I'm familiar with the hardware you're using, I have no history with
*your* equipment. Given that ECC is so sensitive, the first 2 things I'd do
are swap the PSU for a known good one, and re-seat && re-grease the CPU(s).
Are the fans operating as intended? At that point a long session with
sysutils/memtest86 or a buildworld session should tell you if everything is
AOK. Frankly, as to testing memory, working with a single stick at a time
would be more conclusive, and should get you to an answer sooner.
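
On the hot CPU question, here is a minimal sketch for spotting hot cores
from the shell. It assumes the coretemp(4) driver is loaded (kldload
coretemp) so that sysctl reports dev.cpu.N.temperature. The helper name
hot_cores and the 70 C limit are made up for illustration; pick a limit
that suits your CPU.

```shell
# hot_cores LIMIT: read `sysctl dev.cpu` output on stdin and print every
# core whose temperature reading is above LIMIT (degrees C).
# hot_cores is a hypothetical helper name, not an existing tool.
hot_cores() {
    awk -F': ' -v limit="$1" '
        /\.temperature: / {
            t = $2
            sub(/C$/, "", t)            # "82.0C" -> "82.0"
            if (t + 0 > limit + 0)
                print $1, $2
        }'
}

# Usage (assumes coretemp is loaded):
#   sysctl dev.cpu | hot_cores 70
```

No output means no core is above the limit at that moment, so it is worth
running it again while the box is under load (during a buildworld, say).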

HTH

Chris