How to prevent other CPU from accessing a set of pages before calling pmap_remove_all functi

Thu Sep 10 17:21:02 UTC 2009

On Sep 10, 2009 5:08am, Kostik Belousov <kostikbel at gmail.com> wrote:
> On Wed, Sep 09, 2009 at 11:57:24PM -0700, MingyanGuo wrote:

> > On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo <guomingyan at gmail.com> wrote:

> >

> > > Hi all,

> > >

> > > I find that the function pmap_remove_all for arch amd64 works with a time
> > > window between reading & clearing the PTE flags (accessed flag and dirty
> > > flag) and invalidating its TLB entry on other CPUs. After some discussion
> > > with Li Xin (cc'ed), I think all the processes that are using the PTE
> > > being removed should be blocked before calling pmap_remove_all, or
> > > another CPU may dirty the page without setting the dirty flag before the
> > > TLB entry is flushed. But I cannot find how to block them before calling
> > > the function. I read the function vm_pageout_scan in vm/vm_pageout.c but
> > > cannot find the exact method it uses. Or did I just misunderstand the
> > > semantics of pmap_remove_all?

> > >

> > > Thanks in advance.

> > >

> > > Regards,

> > > MingyanGuo

> > >

> >

> > Sorry for the noise. I understand the logic now. There is no time window
> > problem between reading & clearing the PTE and invalidating it on other
> > CPUs, even if another CPU is using the PTE. I misunderstood the logic.

> Hmm. What would happen in the following scenario?
>
> Assume that the page m is mapped by a vm map active on CPU1, and that
> CPU1 has a cached TLB entry for some writable mapping of this page,
> but neither the TLB entry nor the PTE has the dirty bit set.
>
> Then assume that the following sequence of events occurs:

> CPU1:                            CPU2:
>                                  call pmap_remove_all(m)
>                                  clear pte
> write to the address mapped
> by m [*]
>                                  invalidate the TLB,
>                                  possibly making IPI to CPU1

> I assume that at the point marked [*], we can
> - either lose the dirty bit, while CPU1 (atomically) sets the dirty bit
>   in the cleared pte.
>   Besides not properly tracking the modification status of the page,
>   it could also cause the page table page to be modified, which would
>   create a non-zero page with the PG_ZERO flag set.
> - or CPU1 re-reads the PTE when setting the dirty bit, and generates
>   #pf since the valid bit in the PTE is zero.
>
> Intel documentation mentions that dirty or accessed bit updates are done
> with a locked cycle, which definitely means that the PTE is re-read, but I
> cannot find whether the valid bit is rechecked.
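
To make the first outcome concrete, here is a small user-space C11 sketch
that replays the interleaving on a single thread. It is only a model, not
kernel code and not a statement about what real hardware does: the names
model_pte_load_clear, model_hw_set_dirty_blind and model_blind_race are
hypothetical, and the "blind" update is an assumption made for illustration.
The bit values are the usual x86 PTE bit positions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* x86 PTE flag bits (valid, writable, modified/dirty). */
#define PG_V  0x001ULL
#define PG_RW 0x002ULL
#define PG_M  0x040ULL

/* CPU2 side (model): read and clear the PTE in one atomic step. */
static uint64_t model_pte_load_clear(_Atomic uint64_t *pte)
{
    return atomic_exchange(pte, 0);
}

/*
 * CPU1 side (model, ASSUMED behaviour): the hardware sets PG_M with a
 * locked read-modify-write but does NOT recheck PG_V, because it still
 * trusts its stale TLB entry.
 */
static void model_hw_set_dirty_blind(_Atomic uint64_t *pte)
{
    atomic_fetch_or(pte, PG_M);
}

/*
 * Replays the scenario: CPU2 clears the PTE, then CPU1 writes at [*].
 * Returns the final in-memory PTE; *remover_saw_dirty tells whether the
 * value captured by the remover had PG_M set.
 */
static uint64_t model_blind_race(int *remover_saw_dirty)
{
    _Atomic uint64_t pte = PG_V | PG_RW;            /* clean, writable mapping */

    uint64_t old = model_pte_load_clear(&pte);      /* CPU2: clear pte */
    model_hw_set_dirty_blind(&pte);                 /* CPU1: write at [*] */

    *remover_saw_dirty = (old & PG_M) != 0;
    return atomic_load(&pte);
}
```

Under that assumption the remover never sees the dirty bit, and the cleared
PTE slot ends up holding PG_M, so the page table page is no longer all
zeroes. Both are exactly the problems described in the first bullet.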

I am not an architecture expert, but from a programmer's view, I *think*
using the 'in memory' PTE structure for the first write to that PTE is more
reasonable. To set the dirty bit, a CPU has to access memory with locked
cycles anyway, so using the 'in memory' PTE structure should add little
performance overhead while being friendlier to software. However, this is
just my guess; I am reading the manuals to see whether they describe it.
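
As a sketch of what I mean, the dirty-bit update could behave like the
compare-and-swap loop below: the in-memory PTE is re-read, its valid and
writable bits are rechecked, and PG_M is set only if the entry is still
usable. This is a user-space C11 model of my guess, not documented hardware
behaviour; model_hw_set_dirty_recheck is a hypothetical name, and a false
return value stands in for raising #PF.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits (valid, writable, modified/dirty). */
#define PG_V  0x001ULL
#define PG_RW 0x002ULL
#define PG_M  0x040ULL

/*
 * Model of a dirty-bit update that rechecks the in-memory PTE
 * (ASSUMED behaviour). Returns true if PG_M was set, false if the PTE
 * is no longer valid and writable, which models a page fault.
 */
static bool model_hw_set_dirty_recheck(_Atomic uint64_t *pte)
{
    uint64_t cur = atomic_load(pte);

    do {
        /* Recheck on every iteration: cur is refreshed when the CAS fails. */
        if ((cur & (PG_V | PG_RW)) != (PG_V | PG_RW))
            return false;                       /* would raise #PF */
    } while (!atomic_compare_exchange_weak(pte, &cur, cur | PG_M));

    return true;
}
```

With this behaviour a write that races with pmap_remove_all either lands
before the PTE is cleared, so the remover sees PG_M in the value it read, or
lands after and faults. The cleared PTE is never modified.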

Regards,
MingyanGuo