svn commit: r322569 - in stable/11/sys/amd64: amd64 include linux

Don Lewis truckman at FreeBSD.org
Wed Aug 16 07:59:59 UTC 2017


Author: truckman
Date: Wed Aug 16 07:59:57 2017
New Revision: 322569
URL: https://svnweb.freebsd.org/changeset/base/322569

Log:
  MFC r321899
  
  Lower the amd64 shared page, which contains the signal trampoline,
  from the top of user memory to one page lower on machines with the
  Ryzen (AMD Family 17h) CPU.  This pushes ps_strings and the stack
  down by one page as well.  On Ryzen there is some sort of interaction
  between code running at the top of user memory address space and
  interrupts that can cause FreeBSD to either hang or silently reset.
  This sounds similar to the problem found with DragonFly BSD that
  was fixed with this commit:
    https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20
  but our signal trampoline location was already lower than the address
  that DragonFly moved their signal trampoline to.  It also does not
  appear to be related to SMT as described here:
    https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498
  
    "Hi, Matt Dillon here. Yes, I did find what I believe to be a
     hardware issue with Ryzen related to concurrent operations. In a
     nutshell, for any given hyperthread pair, if one hyperthread is
     in a cpu-bound loop of any kind (can be in user mode), and the
     other hyperthread is returning from an interrupt via IRETQ, the
     hyperthread issuing the IRETQ can stall indefinitely until the
     other hyperthread with the cpu-bound loop pauses (aka HLT until
     next interrupt). After this situation occurs, the system appears
     to destabilize. The situation does not occur if the cpu-bound
     loop is on a different core than the core doing the IRETQ. The
     %rip the IRETQ returns to (e.g. userland %rip address) matters a
     *LOT*. The problem occurs more often with high %rip addresses
     such as near the top of the user stack, which is where DragonFly's
     signal trampoline traditionally resides. So a user program taking
     a signal on one thread while another thread is cpu-bound can cause
     this behavior. Changing the location of the signal trampoline
     makes it more difficult to reproduce the problem. I have not
     been able to completely mitigate it. When a cpu-thread
     stalls in this manner it appears to stall INSIDE the microcode
     for IRETQ. It doesn't make it to the return pc, and the cpu thread
     cannot take any IPIs or other hardware interrupts while in this
     state."
  since the system instability has been observed on FreeBSD with SMT
  disabled.  Interrupts do appear to play a factor since running a
  signal-intensive process on the first CPU core, which handles most
  of the interrupts on my machine, is far more likely to trigger the
  problem than running such a process on any other core.
  
  Also lower sv_maxuser to prevent a malicious user from using mmap()
  to load and execute code in the top page of user memory that was made
  available when the shared page was moved down.
  
  Make the same changes to the 64-bit Linux emulator.
  
  PR:		219399
  Reported by:	nbe at renzel.net
  Reviewed by:	kib
  Reviewed by:	dchagin (previous version)
  Tested by:	nbe at renzel.net (earlier version)
  Differential Revision:	https://reviews.freebsd.org/D11780
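
As a rough illustration of the address arithmetic described in the log
above, the following standalone C sketch models the four sysentvec fields
that amd64_lower_shared_page() adjusts.  The struct, the sketch_* names,
and the constant values are assumptions made for illustration (a 4 KiB
page, a user address limit of 0x0000800000000000, a 32-byte ps_strings);
they are not copied from the kernel headers, and the real function reads
hw_lower_amd64_sharedpage rather than taking a flag argument.

```c
#include <stdint.h>

/* Illustrative values only; the real constants live in the amd64 headers. */
#define SKETCH_PAGE_SIZE	UINT64_C(4096)
#define SKETCH_MAXUSER		UINT64_C(0x0000800000000000)
#define SKETCH_SHAREDPAGE	(SKETCH_MAXUSER - SKETCH_PAGE_SIZE)
#define SKETCH_PS_STRINGS_SIZE	UINT64_C(32)

/* Hypothetical, cut-down stand-in for struct sysentvec. */
struct sketch_sysentvec {
	uint64_t sv_maxuser;		/* upper limit of user addresses */
	uint64_t sv_shared_page_base;	/* page holding the signal trampoline */
	uint64_t sv_usrstack;		/* top of the user stack */
	uint64_t sv_psstrings;		/* location of ps_strings */
};

/* Default layout: shared page at the very top of user memory. */
static struct sketch_sysentvec
sketch_default_layout(void)
{
	struct sketch_sysentvec sv;

	sv.sv_maxuser = SKETCH_MAXUSER;
	sv.sv_shared_page_base = SKETCH_SHAREDPAGE;
	sv.sv_usrstack = SKETCH_SHAREDPAGE;
	sv.sv_psstrings = SKETCH_SHAREDPAGE - SKETCH_PS_STRINGS_SIZE;
	return (sv);
}

/*
 * Mirror of the logic in the commit: when the workaround is enabled,
 * move all four addresses down by one page.  The 'lower' argument
 * stands in for the hw_lower_amd64_sharedpage variable.
 */
static void
sketch_lower_shared_page(struct sketch_sysentvec *sv, int lower)
{
	if (lower != 0) {
		sv->sv_maxuser -= SKETCH_PAGE_SIZE;
		sv->sv_shared_page_base -= SKETCH_PAGE_SIZE;
		sv->sv_usrstack -= SKETCH_PAGE_SIZE;
		sv->sv_psstrings -= SKETCH_PAGE_SIZE;
	}
}
```

Under these assumed values, lowering moves the shared page base from
0x00007ffffffff000 to 0x00007fffffffe000, and sv_maxuser drops to
0x00007ffffffff000, so the vacated top page falls outside the range that
user mappings may occupy.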

Modified:
  stable/11/sys/amd64/amd64/elf_machdep.c
  stable/11/sys/amd64/amd64/initcpu.c
  stable/11/sys/amd64/include/md_var.h
  stable/11/sys/amd64/linux/linux_sysvec.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/amd64/amd64/elf_machdep.c
==============================================================================
--- stable/11/sys/amd64/amd64/elf_machdep.c	Wed Aug 16 06:43:50 2017	(r322568)
+++ stable/11/sys/amd64/amd64/elf_machdep.c	Wed Aug 16 07:59:57 2017	(r322569)
@@ -84,6 +84,25 @@ struct sysentvec elf64_freebsd_sysvec = {
 };
 INIT_SYSENTVEC(elf64_sysvec, &elf64_freebsd_sysvec);
 
+void
+amd64_lower_shared_page(struct sysentvec *sv)
+{
+	if (hw_lower_amd64_sharedpage != 0) {
+		sv->sv_maxuser -= PAGE_SIZE;
+		sv->sv_shared_page_base -= PAGE_SIZE;
+		sv->sv_usrstack -= PAGE_SIZE;
+		sv->sv_psstrings -= PAGE_SIZE;
+	}
+}
+
+/*
+ * Do this fixup before INIT_SYSENTVEC (SI_ORDER_ANY) because the latter
+ * uses the value of sv_shared_page_base.
+ */
+SYSINIT(elf64_sysvec_fixup, SI_SUB_EXEC, SI_ORDER_FIRST,
+	(sysinit_cfunc_t) amd64_lower_shared_page,
+	&elf64_freebsd_sysvec);
+
 static Elf64_Brandinfo freebsd_brand_info = {
 	.brand		= ELFOSABI_FREEBSD,
 	.machine	= EM_X86_64,

Modified: stable/11/sys/amd64/amd64/initcpu.c
==============================================================================
--- stable/11/sys/amd64/amd64/initcpu.c	Wed Aug 16 06:43:50 2017	(r322568)
+++ stable/11/sys/amd64/amd64/initcpu.c	Wed Aug 16 07:59:57 2017	(r322569)
@@ -48,6 +48,11 @@ __FBSDID("$FreeBSD$");
 static int	hw_instruction_sse;
 SYSCTL_INT(_hw, OID_AUTO, instruction_sse, CTLFLAG_RD,
     &hw_instruction_sse, 0, "SIMD/MMX2 instructions available in CPU");
+static int	lower_sharedpage_init;
+int		hw_lower_amd64_sharedpage;
+SYSCTL_INT(_hw, OID_AUTO, lower_amd64_sharedpage, CTLFLAG_RDTUN,
+    &hw_lower_amd64_sharedpage, 0,
+   "Lower sharedpage to work around Ryzen issue with executing code near the top of user memory");
 /*
  * -1: automatic (default)
  *  0: keep enable CLFLUSH
@@ -120,6 +125,28 @@ init_amd(void)
 			msr = rdmsr(0xc0011020);
 			msr |= (uint64_t)1 << 15;
 			wrmsr(0xc0011020, msr);
+		}
+	}
+
+	/*
+	 * Work around a problem on Ryzen that is triggered by executing
+	 * code near the top of user memory, in our case the signal
+	 * trampoline code in the shared page on amd64.
+	 *
+	 * This function is executed once for the BSP before tunables take
+	 * effect so the value determined here can be overridden by the
+	 * tunable.  This function is then executed again for each AP and
+	 * also on resume.  Set a flag the first time so that value set by
+	 * the tunable is not overwritten.
+	 *
+	 * The stepping and/or microcode versions should be checked after
+	 * this issue is fixed by AMD so that we don't use this mode if not
+	 * needed.
+	 */
+	if (lower_sharedpage_init == 0) {
+		lower_sharedpage_init = 1;
+		if (CPUID_TO_FAMILY(cpu_id) == 0x17) {
+			hw_lower_amd64_sharedpage = 1;
 		}
 	}
 }

Modified: stable/11/sys/amd64/include/md_var.h
==============================================================================
--- stable/11/sys/amd64/include/md_var.h	Wed Aug 16 06:43:50 2017	(r322568)
+++ stable/11/sys/amd64/include/md_var.h	Wed Aug 16 07:59:57 2017	(r322569)
@@ -34,11 +34,14 @@
 
 #include <x86/x86_var.h>
 
-extern  uint64_t *vm_page_dump;
+extern uint64_t	*vm_page_dump;
+extern int	hw_lower_amd64_sharedpage;
 
 struct	savefpu;
+struct	sysentvec;
 
 void	amd64_db_resume_dbreg(void);
+void	amd64_lower_shared_page(struct sysentvec *);
 void	amd64_syscall(struct thread *td, int traced);
 void	doreti_iret(void) __asm(__STRING(doreti_iret));
 void	doreti_iret_fault(void) __asm(__STRING(doreti_iret_fault));

Modified: stable/11/sys/amd64/linux/linux_sysvec.c
==============================================================================
--- stable/11/sys/amd64/linux/linux_sysvec.c	Wed Aug 16 06:43:50 2017	(r322568)
+++ stable/11/sys/amd64/linux/linux_sysvec.c	Wed Aug 16 07:59:57 2017	(r322569)
@@ -833,6 +833,8 @@ static void
 linux_vdso_install(void *param)
 {
 
+	amd64_lower_shared_page(&elf_linux_sysvec);
+
 	linux_szsigcode = (&_binary_linux_locore_o_end - 
 	    &_binary_linux_locore_o_start);
 

