Re: [Intel AlderLake] Read&Write files to FAT32 or UFS partition cause data corrupt due to P-Core&E-Core

From: Mike Karels <mike_at_karels.net>
Date: Mon, 18 Sep 2023 18:27:55 UTC
On 18 Sep 2023, at 10:38, Michael Butler wrote:

> On 8/8/23 13:50, Michael Butler wrote:
>> On 8/8/23 10:56, Tomoaki AOKI wrote:
>>> On Tue, 8 Aug 2023 17:02:32 +0300
>>> Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>   [ .. snip .. ]
>>
>>>> The workaround is switched on automatically when the kernel detects 'small cores'
>>>> reported by CPUID.
>>>
>>> If I read the code correctly, vm.pmap.pcid_invlpg_workaround
>>> (precisely, the corresponding variable) is set to non-zero when the
>>> workaround is enabled. I'm not sure it was detected correctly in the
>>> original reporter's environment, but forcibly setting the tunable to 1
>>> was reported not to help sufficiently.
>>> Currently, only setting the tunable vm.pmap.pcid_enabled to 0 helps.
>>
>> I'm seeing similar stability problems on an N95-based device. This too is an Alder Lake-N device with only E-cores, although I'm running a build compiled with CPUTYPE=tremont. From an older, verbose start-up:
>>
>> PPIM 0: PA=0x4000000000, VA=0xffffffff82710000, size=0x1d5000, mode=0x1
>> pmap: large map 8 PML4 slots (4096 GB)
>> VT(efifb): resolution 800x600
>> Preloaded elf kernel "/boot/kernel.new/kernel" at 0xffffffff8234e000.
>> Preloaded boot_entropy_cache "/boot/entropy" at 0xffffffff82357d08.
>> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0xffffffff82357d60.
>> Preloaded hostuuid "/etc/hostid" at 0xffffffff82357dc0.
>> Preloaded TSLOG data "TSLOG" at 0xffffffff82357e10.
>> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
>>    Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
>>    Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>    Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>>    AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>>    AMD Features2=0x121<LAHF,ABM,Prefetch>
>>    Structured Extended Features=0x239ca7eb<FSGSBASE,TSCADJ,BMI1,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,NFPUSG,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PROCTRACE,SHA>
>>    Structured Extended Features2=0x98c007bc<UMIP,PKU,OSPKE,WAITPKG,GFNI,VAES,VPCLMULQDQ,RDPID,MOVDIRI,MOVDIR64B>
>>    Structured Extended Features3=0xfc184410<FSRM,MD_CLEAR,IBT,IBPB,STIBP,L1DFL,ARCH_CAP,CORE_CAP,SSBD>
>>    XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
>>    IA32_ARCH_CAPS=0x180fd6b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TAA_NO>
>>    VT-x: Basic Features=0x3da0500<SMM,INS/OUTS,TRUE>
>>          Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
>>          Primary Processor Controls=0xfffbfffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
>>          Secondary Processor Controls=0x75d7fff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND,INVPCID,VMFUNC,VMCS,XSAVES>
>>          Exit Controls=0x3da0500<PAT-LD,EFER-SV,PTMR-SV>
>>          Entry Controls=0x3da0500
>>          EPT Features=0x6f34141<XO,PW4,UC,WB,2M,1G,INVEPT,AD,single,all>
>>          VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
>>    TSC: P-state invariant, performance statistics
>> 64-Byte prefetching
>> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
>> real memory  = 17179869184 (16384 MB)
>> Physical memory chunk(s):
>> 0x0000000000010000 - 0x000000000009dfff, 581632 bytes (142 pages)
>> 0x000000000009f000 - 0x000000000009ffff, 4096 bytes (1 pages)
>> 0x0000000000100000 - 0x000000005fffffff, 1609564160 bytes (392960 pages)
>> 0x0000000062401000 - 0x000000007264dfff, 270848000 bytes (66125 pages)
>> 0x0000000075fff000 - 0x0000000075ffffff, 4096 bytes (1 pages)
>> 0x0000000100001000 - 0x0000000462497fff, 14533881856 bytes (3548311 pages)
>> 0x000000047fa00000 - 0x000000047fb68fff, 1478656 bytes (361 pages)
>> avail memory = 16363008000 (15604 MB)
>> CPU microcode: updated from 0xc to 0x10
>
> With the most recent microcode update, this device reports ..
>
> CPU microcode: updated from 0xc to 0x11
>
>  .. and is now stable over multiple full system builds with vm.pmap.pcid_enabled=0 and vm.pmap.pcid_invlpg_workaround=1, plus CPUTYPE?=alderlake set in /etc/make.conf.
>
> I have not tested with vm.pmap.pcid_invlpg_workaround=0.

I believe that vm.pmap.pcid_invlpg_workaround does not matter if
vm.pmap.pcid_enabled=0.  Enabling the workaround or disabling pcid should
be basically the same for this CPU, so I don't understand why enabling
the workaround alone did not help.  It might be interesting to test with
pcid enabled and the new microcode, although I don't see why that would
affect the results (pcid should still not be used on any CPU).

The CPUTYPE setting for the compiler should not affect the pcid vm issues;
it only changes the compiler's optimization.
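
For reference, the configuration reported as stable above would look
roughly like this.  This is a sketch, assuming the two vm.pmap tunables
are set at boot in /boot/loader.conf (the thread only calls them
tunables and does not name the file); CPUTYPE goes in /etc/make.conf as
described:

```
# /boot/loader.conf: boot-time tunables (file placement assumed)
vm.pmap.pcid_enabled="0"
vm.pmap.pcid_invlpg_workaround="1"

# /etc/make.conf: only changes compiler optimization, not pcid behavior
CPUTYPE?=alderlake
```

After a reboot, running 'sysctl vm.pmap.pcid_enabled
vm.pmap.pcid_invlpg_workaround' should show the values actually in
effect, which is also a quick way to check whether the workaround was
switched on automatically.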

		Mike

>> On start-up, vm.pmap.pcid_invlpg_workaround was 1, but seemingly random faults still occurred under load, for example during 'make buildworld'. Apparent misreads of source files, resulting in syntax errors, were the most common symptom. Retried compilations (mostly) succeeded.
>>
>> Initially, I put this down to an inadequate power supply, but setting vm.pmap.pcid_enabled=0 seems to have stabilised it.
>>
>> I guess there's another dragon in there .. :-(
>>
>>      Michael