svn commit: r253877 - in projects/atomic64/sys: amd64/include i386/include

Fri Aug 2 15:42:19 UTC 2013

On Aug 2, 2013, at 6:51 AM, Bruce Evans wrote:

> On Fri, 2 Aug 2013, Jung-uk Kim wrote:
> 
>> Log:
>> Reimplement atomic operations on PDEs and PTEs in pmap.h.  This change
>> significantly reduces duplicate code.  Also, it may improve and even correct
>> some questionable implementations.
> 
> Do they all (or any) need to be atomic with respect to multiple CPUs?
> It's hard to see how concurrent accesses to page tables can work
> without higher-level locking than is provided by atomic ops.
> 

Some do, so that we do not lose a PG_M ("dirty") bit being set concurrently by another processor.  However, none of these accesses need to be labeled as acquires or releases.
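
To illustrate the point about PG_M, here is a minimal user-space sketch. It uses C11 <stdatomic.h> with relaxed ordering rather than the kernel's atomic(9) primitives, and the function names pte_clear_rw_sketch() and pte_load_clear_sketch() are hypothetical, not anything in pmap.h. The bit values follow the usual x86 page-table layout. It only shows the read-modify-write being indivisible, so that a dirty bit set concurrently is either preserved in the entry or returned to the caller. It says nothing about the higher-level locking a real pmap still needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pt_entry_t;

#define PG_V	0x001ULL	/* valid */
#define PG_RW	0x002ULL	/* writable */
#define PG_M	0x040ULL	/* modified ("dirty") */

/*
 * Hypothetical helper: write-protect an entry without losing PG_M.
 * A plain "load, mask, store" could overwrite a PG_M that was set
 * between the load and the store.  The compare-and-swap loop retries
 * whenever the entry changed underneath us.
 */
static pt_entry_t
pte_clear_rw_sketch(_Atomic pt_entry_t *ptep)
{
	pt_entry_t old;

	assert(ptep != NULL);
	old = atomic_load_explicit(ptep, memory_order_relaxed);
	/* On failure, "old" is reloaded with the current contents. */
	while (!atomic_compare_exchange_weak_explicit(ptep, &old,
	    old & ~PG_RW, memory_order_relaxed, memory_order_relaxed))
		;
	return (old);
}

/*
 * Hypothetical helper: clear an entry and return its last contents,
 * so the caller can still see whether PG_M was set.  Relaxed ordering
 * is used because only atomicity is wanted here, not acquire or
 * release semantics.
 */
static pt_entry_t
pte_load_clear_sketch(_Atomic pt_entry_t *ptep)
{

	assert(ptep != NULL);
	return (atomic_exchange_explicit(ptep, 0, memory_order_relaxed));
}
```

A caller would check (old & PG_M) on the value returned by either helper before discarding the mapping.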

>> Modified: projects/atomic64/sys/amd64/include/pmap.h
>> ==============================================================================
>> --- projects/atomic64/sys/amd64/include/pmap.h	Fri Aug  2 00:08:00 2013	(r253876)
>> +++ projects/atomic64/sys/amd64/include/pmap.h	Fri Aug  2 00:20:04 2013	(r253877)
>> @@ -185,41 +185,13 @@ extern u_int64_t KPML4phys;	/* physical
>> pt_entry_t *vtopte(vm_offset_t);
>> #define	vtophys(va)	pmap_kextract(((vm_offset_t) (va)))
>> 
>> -static __inline pt_entry_t
>> -pte_load(pt_entry_t *ptep)
>> -{
>> -	pt_entry_t r;
>> -
>> -	r = *ptep;
>> -	return (r);
>> -}
> 
> This function wasn't atomic with respect to multiple CPUs.  Except on
> i386 with PAE, but then it changes a 64-bit object on a 32-bit CPU,
> so it needs some locking just to be atomic with respect to a single CPU.
> 
>> -static __inline pt_entry_t
>> -pte_load_store(pt_entry_t *ptep, pt_entry_t pte)
>> -{
>> -	pt_entry_t r;
>> -
>> -	__asm __volatile(
>> -	    "xchgq %0,%1"
>> -	    : "=m" (*ptep),
>> -	      "=r" (r)
>> -	    : "1" (pte),
>> -	      "m" (*ptep));
>> -	return (r);
>> -}
> 
> This was the main one that was atomic with respect to multiple CPUs on
> both amd64 and i386.  This seems to be accidental -- xchg with a memory
> operand is implicitly locked, so it is slow whether you want it or not.
> 
>> -
>> -#define	pte_load_clear(pte)	atomic_readandclear_long(pte)
>> -
>> -static __inline void
>> -pte_store(pt_entry_t *ptep, pt_entry_t pte)
>> -{
>> +#define	pte_load(ptep)			atomic_load_acq_long(ptep)
>> +#define	pte_load_store(ptep, pte)	atomic_swap_long(ptep, pte)
>> +#define	pte_load_clear(pte)		atomic_swap_long(pte, 0)
>> +#define	pte_store(ptep, pte)		atomic_store_rel_long(ptep, pte)
>> +#define	pte_clear(ptep)			atomic_store_rel_long(ptep, 0)
>> 
>> -	*ptep = pte;
>> -}
> 
> pte_store() was also not atomic with respect to multiple CPUs.  So almost
> everything was not atomic with respect to multiple CPUs, except for PAE
> on i386.
> 
> Bruce
>