svn commit: r355145 - head/sys/arm64/arm64

Alan Cox alc at rice.edu
Fri Nov 29 18:25:49 UTC 2019


On 11/28/19 7:52 AM, Konstantin Belousov wrote:
> On Thu, Nov 28, 2019 at 09:17:15AM +0000, Andrew Turner wrote:
>>
>>> On 28 Nov 2019, at 08:48, Michal Meloun <meloun.michal at gmail.com> wrote:
>>>
>>>
>>>
>>> On 27.11.2019 21:33, Alan Cox wrote:
>>>> Author: alc
>>>> Date: Wed Nov 27 20:33:49 2019
>>>> New Revision: 355145
>>>> URL: https://svnweb.freebsd.org/changeset/base/355145
>>>>
>>>> Log:
>>>>   There is no reason why we need to pin the underlying thread to its current
>>>>   processor in pmap_invalidate_{all,page,range}().  These functions are using
>>>>   an instruction that broadcasts the TLB invalidation to every processor, so
>>>>   even if a thread migrates in the middle of one of these functions every
>>>>   processor will still perform the required TLB invalidations.
>>> I think this is not the right assumption. The problem is not in the
>>> TLB operations themselves, but in the following 'dsb' and/or 'isb'.
>>> 'dsb' ensures that all TLB operations issued by the local CPU have
>>> been performed and are visible to other observers, but it does
>>> nothing for TLB operations issued by other CPUs.
>>> For example, if a given thread is rescheduled after all TLB operations
>>> but before the 'dsb' or 'isb' is performed, then the requested
>>> synchronization does not occur at all.
>> The tlbi instructions need a context synchronisation point. One option is the dsb & isb instructions; another is an exception entry.
>>
>> For a thread to be rescheduled it requires the timer interrupt to fire. As an exception entry is a context synchronisation point, and an interrupt will cause an exception entry, there will be such a point after the tlbi instruction.
>>
> D5.10.2. TLB maintenance instructions, 'Ordering and completion of
> TLB maintenance instructions' states that a DSB on the PE that issued
> the TLBI is required. It does not state that an arbitrary event causing
> SynchronizeContext() is enough.
>
> Also I was not able to find any explanation of SynchronizeContext().


The entry for "Context Synchronization Events" in the Glossary on page 
7460 provides the best explanation that I've found.  The first three 
listed events are an isb instruction, taking an exception, and returning 
from an exception.  However, there is nothing here to suggest that 
taking or returning from an exception can take the place of a dsb 
instruction.

(On a related note, I'll observe that Linux does not perform an isb 
instruction during TLB invalidation unless it is changing a leaf in the 
kernel page table.  For changes to user-space page tables, they appear 
to be assuming that the return to user-space will suffice to resync the 
user-space instruction stream.)


> Curiously, on IA32 exceptions are not specified to act as serialization
> points, although rumors say that on all produced microarchitectures they do.


This issue has similarities to the one that we discussed in 
https://reviews.freebsd.org/D22007.  For example, on a context switch we 
will perform a dsb instruction in pmap_activate_int() unless we are 
switching to a thread within the same address space.  Moreover, that dsb 
instruction will be performed before the lock on the old thread is 
released.  So, in this case, we are guaranteed that any previously 
issued tlbi instructions are completed before the thread can be 
restarted on another processor.
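
To make that ordering concrete, here is a rough, hypothetical sketch (not 
the actual source) of the path through pmap_activate_int() that I'm 
describing; the details of the full address-space switch are elided, and 
the helpers shown are only assumed for illustration:

static bool
pmap_activate_int(pmap_t pmap)
{

        /* Same address space: we currently return without a barrier. */
        if (pmap == PCPU_GET(curpmap))
                return (false);

        PCPU_SET(curpmap, pmap);

        /*
         * Besides ordering the store to curpmap, this dsb also completes
         * any tlbi previously issued by this processor, and it executes
         * before the lock on the old thread is released.
         */
        dsb(ish);

        set_ttbr0(pmap_to_ttbr0(pmap));
        return (true);
}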

However, if we are simply switching between threads within the same 
address space, we do not perform a dsb instruction.  My error was 
believing that the memory barriers inherent in the locking operations on 
the thread during context switches would be sufficient to ensure that all 
of the tlbi instructions issued on the initial processor had completed 
before the dsb instruction that finishes the pmap_invalidate_*() call is 
performed on the eventual processor.
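
To illustrate the window, here is a simplified, hypothetical sketch of the 
shape of pmap_invalidate_page() (not the actual source; the per-pmap ASID 
handling is omitted).  Before r355145, the sched_pin()/sched_unpin() pair 
kept the entire sequence on a single processor:

static __inline void
pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
{

        sched_pin();                    /* removed by r355145 */
        dsb(ishst);
        /* Broadcast invalidation of the mapping to all processors. */
        __asm __volatile("tlbi vaae1is, %0" : : "r" (atop(va)));
        /*
         * If the thread is preempted and migrated at this point, the dsb
         * and isb below execute on a different processor than the one
         * that issued the tlbi, so they no longer guarantee its
         * completion.
         */
        dsb(ish);
        isb();
        sched_unpin();                  /* removed by r355145 */
}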

After further reading, I'm afraid that we have a similar issue with 
cache management functions, like cpu_icache_sync_range(). Quoting 
K11.5.2, "The ARMv8 architecture requires a PE that executes an 
instruction cache maintenance instruction to execute a DSB instruction 
to ensure completion of the maintenance operation."  So, we have a 
similar problem if we are preempted during the cpu_icache_sync_range() 
calls from pmap_enter(), pmap_enter_quick(), and pmap_sync_icache().  
Unless we are switching to a different address space, we are not 
guaranteed to perform a dsb instruction on the initial processor.
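
As a rough, hypothetical sketch (not the actual implementation), the 
instruction cache maintenance sequence has the same shape and the same 
hazard; ICACHE_LINE_SIZE is just an assumed constant here:

static void
icache_sync_range_sketch(vm_offset_t va, vm_size_t size)
{
        vm_offset_t end;

        end = va + size;
        for (; va < end; va += ICACHE_LINE_SIZE)
                __asm __volatile("ic ivau, %0" : : "r" (va));
        /*
         * Per K11.5.2, the PE that executed the "ic ivau" instructions
         * must also execute the dsb for the maintenance to be guaranteed
         * complete.  Preemption and migration before this point lose
         * that guarantee.
         */
        dsb(ish);
        isb();
}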

Moreover, we are configuring the processor in locore.S so that 
user-space can directly perform "ic ivau" instructions.  (See SCTLR_UCI 
in sctlr_set.)  So, I'm inclined to say that we should handle this issue 
analogously to r354630 on amd64:

Index: arm64/arm64/pmap.c
===================================================================
--- arm64/arm64/pmap.c  (revision 355145)
+++ arm64/arm64/pmap.c  (working copy)
@@ -5853,8 +5853,11 @@ pmap_activate_int(pmap_t pmap)

         KASSERT(PCPU_GET(curpmap) != NULL, ("no active pmap"));
         KASSERT(pmap != kernel_pmap, ("kernel pmap activation"));
-       if (pmap == PCPU_GET(curpmap))
+       if (pmap == PCPU_GET(curpmap)) {
+               /* XXXExplain why. */
+               dsb(ish);
                 return (false);
+       }

         /*
          * Ensure that the store to curpmap is globally visible before the


