RE: [Intel AlderLake] Read&Write files to FAT32 or UFS partition cause data corrupt due to P-Core&E-Core

From: Chen, Alvin W <Weike.Chen_at_Dell.com>
Date: Mon, 07 Mar 2022 07:57:32 UTC
Hi guys,
Any progress on this issue?



Regards,
Alvin Chen
Dell | Commercial Client Group
office +86-10-82862506, fax +86-10-82861554, Dell Lync 8672506 weike_chen@dell.com


-----Original Message-----
From: Konstantin Belousov <kostikbel@gmail.com> 
Sent: February 24, 2022 9:24
To: Alexander Motin
Cc: Mike Karels; Tomoaki AOKI; Chen, Alvin W; freebsd-current@freebsd.org
Subject: Re: [Intel AlderLake] Read&Write files to FAT32 or UFS partition cause data corrupt due to P-Core&E-Core


On Wed, Feb 23, 2022 at 12:25:24PM -0500, Alexander Motin wrote:
> On 22.02.2022 19:00, Konstantin Belousov wrote:
> > On Tue, Feb 22, 2022 at 06:53:09PM -0500, Alexander Motin wrote:
> > > On 22.02.2022 18:41, Konstantin Belousov wrote:
> > > > On Tue, Feb 22, 2022 at 06:38:24PM -0500, Alexander Motin wrote:
> > > > > On 22.02.2022 18:30, Konstantin Belousov wrote:
> > > > > > As another blind guess, try to disable pcid, vm.pmap.pcid_enabled=0.
> > > > > 
> > > > > > Do you mean it as a workaround for TrueNAS 12, or should it 
> > > > > > provide some information?  The system is at the office and has 
> > > > > > no IPMI, so I can't switch the boot device from home right now.
> > > > > I intended to see whether it is the cause or a related feature.
> > > 
> > > I'll try that on the 12 tomorrow, if applicable.
> > 
> > Yes, it should still be relevant.
> 
> It did the trick.  I got repeated successful boots with pcid 
> disabled, and failed ones with the default (enabled).  Attached 
> you may find verbose serial console output captures with pcid disabled 
> and enabled, though without the cpuinfo patch.  During the testing I 
> had only one P core and one E core enabled to reduce noise.  Only after 
> that did I find that the P core had SMT enabled, but I then repeated the 
> test without SMT as well, so it is indeed irrelevant.
> 
> I'm curious what in pcid could differentiate the P and E cores, and 
> whether it got fixed in the latest stable/13 or I am just "unlucky" 
> not to reproduce it there.

I am curious as well.  PCID works both on big Intel cores and on small cores like Apollo Lake.  So the fact that it does not work properly in a mixed P/E configuration means either that there is something in the spec I did not account for, or that there is a bug in the silicon.

I have no idea why we work on stable/13 and HEAD.  There were quite a few changes to the PCID code there, but they were mostly restructuring and polishing.

So the only way to understand more is to bisect and see which commit on HEAD fixed the boot.
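
For illustration only, the bisection idea can be sketched as a plain binary search over an ordered commit list. Everything named here is hypothetical: `first_fixed`, the commit labels, and the `boots_ok` callback, which in practice would mean building that kernel revision and booting it on the Alder Lake machine with pcid enabled. The sketch also assumes that the oldest commit fails, the newest boots, and the fix was never reverted in between.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], boots_ok: Callable[[str], bool]) -> str:
    """Return the earliest commit for which boots_ok() is true.

    Assumptions (not verified here): commits are ordered oldest to newest,
    commits[0] is known to fail, commits[-1] is known to boot, and every
    commit after the fix also boots.
    """
    if len(commits) < 2:
        raise ValueError("need at least one broken and one fixed commit")

    lo, hi = 0, len(commits) - 1  # commits[lo] is broken, commits[hi] is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history of ten commits in which the sixth index carries the fix.
    history = [f"c{i}" for i in range(10)]
    print(first_fixed(history, lambda c: int(c[1:]) >= 6))
```

Each probe halves the remaining range, so even a few thousand commits need only about a dozen test boots.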