svn commit: r495458 - in head/emulators/xen-kernel: . files
Roger Pau Monné
royger at FreeBSD.org
Tue Mar 12 15:02:38 UTC 2019
Author: royger (src committer)
Date: Tue Mar 12 15:02:35 2019
New Revision: 495458
URL: https://svnweb.freebsd.org/changeset/ports/495458
Log:
emulators/xen-kernel: backport fixes and apply XSAs
Backport a couple of fixes critical for PVH dom0 and
fixes for XSA-{284,287,290,292-294}.
Sponsored by: Citrix Systems R&D
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D19413
Added:
head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch (contents, props changed)
head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch (contents, props changed)
head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch (contents, props changed)
head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch (contents, props changed)
head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa284.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa287-4.11.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa290-4.11-1.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa290-4.11-2.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa292.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa293-4.11-1.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa293-4.11-2.patch (contents, props changed)
head/emulators/xen-kernel/files/xsa294-4.11.patch (contents, props changed)
Modified:
head/emulators/xen-kernel/Makefile
Modified: head/emulators/xen-kernel/Makefile
==============================================================================
--- head/emulators/xen-kernel/Makefile Tue Mar 12 14:35:24 2019 (r495457)
+++ head/emulators/xen-kernel/Makefile Tue Mar 12 15:02:35 2019 (r495458)
@@ -2,7 +2,7 @@
PORTNAME= xen
PORTVERSION= 4.11.1
-PORTREVISION= 0
+PORTREVISION= 1
CATEGORIES= emulators
MASTER_SITES= http://downloads.xenproject.org/release/xen/${PORTVERSION}/
PKGNAMESUFFIX= -kernel
@@ -45,6 +45,29 @@ EXTRA_PATCHES+= ${FILESDIR}/0001-x86-mtrr-introduce-ma
EXTRA_PATCHES+= ${FILESDIR}/0001-x86-replace-usage-in-the-linker-script.patch:-p1
# Fix PVH Dom0 build with shadow paging
EXTRA_PATCHES+= ${FILESDIR}/0001-x86-pvh-change-the-order-of-the-iommu-initialization.patch:-p1
+# Forward dom0 lapic EOIs to underlying hardware
+EXTRA_PATCHES+= ${FILESDIR}/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch:-p1
+# Fix deadlock in IO-APIC gsi mapping
+EXTRA_PATCHES+= ${FILESDIR}/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch:-p1
+# Fix for migration/save
+EXTRA_PATCHES+= ${FILESDIR}/0001-x86-mm-locks-remove-trailing-whitespace.patch:-p1 \
+ ${FILESDIR}/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch:-p1 \
+ ${FILESDIR}/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch:-p1
+
+# XSA-284
+EXTRA_PATCHES+= ${FILESDIR}/xsa284.patch:-p1
+# XSA-287
+EXTRA_PATCHES+= ${FILESDIR}/xsa287-4.11.patch:-p1
+# XSA-290
+EXTRA_PATCHES+= ${FILESDIR}/xsa290-4.11-1.patch:-p1 \
+ ${FILESDIR}/xsa290-4.11-2.patch:-p1
+# XSA-292
+EXTRA_PATCHES+= ${FILESDIR}/xsa292.patch:-p1
+# XSA-293
+EXTRA_PATCHES+= ${FILESDIR}/xsa293-4.11-1.patch:-p1 \
+ ${FILESDIR}/xsa293-4.11-2.patch:-p1
+# XSA-294
+EXTRA_PATCHES+= ${FILESDIR}/xsa294-4.11.patch:-p1
.include <bsd.port.options.mk>
Added: head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0001-pvh-dom0-fix-deadlock-in-GSI-mapping.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,115 @@
+From 603ad88fe8a681a2c5408c3f432d7083dd1c41b1 Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau at citrix.com>
+Date: Mon, 28 Jan 2019 15:22:45 +0100
+Subject: [PATCH] pvh/dom0: fix deadlock in GSI mapping
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current GSI mapping code can cause the following deadlock:
+
+(XEN) *** Dumping CPU0 host state: ***
+(XEN) ----[ Xen-4.12.0-rc x86_64 debug=y Tainted: C ]----
+[...]
+(XEN) Xen call trace:
+(XEN) [<ffff82d080239852>] vmac.c#_spin_lock_cb+0x32/0x70
+(XEN) [<ffff82d0802ed40f>] vmac.c#hvm_gsi_assert+0x2f/0x60 <- pick hvm.irq_lock
+(XEN) [<ffff82d080255cc9>] io.c#hvm_dirq_assist+0xd9/0x130 <- pick event_lock
+(XEN) [<ffff82d080255b4b>] io.c#dpci_softirq+0xdb/0x120
+(XEN) [<ffff82d080238ce6>] softirq.c#__do_softirq+0x46/0xa0
+(XEN) [<ffff82d08026f955>] domain.c#idle_loop+0x35/0x90
+(XEN)
+[...]
+(XEN) *** Dumping CPU3 host state: ***
+(XEN) ----[ Xen-4.12.0-rc x86_64 debug=y Tainted: C ]----
+[...]
+(XEN) Xen call trace:
+(XEN) [<ffff82d08023985d>] vmac.c#_spin_lock_cb+0x3d/0x70
+(XEN) [<ffff82d080281fc8>] vmac.c#allocate_and_map_gsi_pirq+0xc8/0x130 <- pick event_lock
+(XEN) [<ffff82d0802f44c0>] vioapic.c#vioapic_hwdom_map_gsi+0x80/0x130
+(XEN) [<ffff82d0802f4399>] vioapic.c#vioapic_write_redirent+0x119/0x1c0 <- pick hvm.irq_lock
+(XEN) [<ffff82d0802f4075>] vioapic.c#vioapic_write+0x35/0x40
+(XEN) [<ffff82d0802e96a2>] vmac.c#hvm_process_io_intercept+0xd2/0x230
+(XEN) [<ffff82d0802e9842>] vmac.c#hvm_io_intercept+0x22/0x50
+(XEN) [<ffff82d0802dbe9b>] emulate.c#hvmemul_do_io+0x21b/0x3c0
+(XEN) [<ffff82d0802db302>] emulate.c#hvmemul_do_io_buffer+0x32/0x70
+(XEN) [<ffff82d0802dcd29>] emulate.c#hvmemul_do_mmio_buffer+0x29/0x30
+(XEN) [<ffff82d0802dcc19>] emulate.c#hvmemul_phys_mmio_access+0xf9/0x1b0
+(XEN) [<ffff82d0802dc6d0>] emulate.c#hvmemul_linear_mmio_access+0xf0/0x180
+(XEN) [<ffff82d0802de971>] emulate.c#hvmemul_linear_mmio_write+0x21/0x30
+(XEN) [<ffff82d0802de742>] emulate.c#linear_write+0xa2/0x100
+(XEN) [<ffff82d0802dce15>] emulate.c#hvmemul_write+0xb5/0x120
+(XEN) [<ffff82d0802babba>] vmac.c#x86_emulate+0x132aa/0x149a0
+(XEN) [<ffff82d0802c04f9>] vmac.c#x86_emulate_wrapper+0x29/0x70
+(XEN) [<ffff82d0802db570>] emulate.c#_hvm_emulate_one+0x50/0x140
+(XEN) [<ffff82d0802e9e31>] vmac.c#hvm_emulate_one_insn+0x41/0x100
+(XEN) [<ffff82d080345066>] guest_4.o#sh_page_fault__guest_4+0x976/0xd30
+(XEN) [<ffff82d08030cc69>] vmac.c#vmx_vmexit_handler+0x949/0xea0
+(XEN) [<ffff82d08031411a>] vmac.c#vmx_asm_vmexit_handler+0xfa/0x270
+
+In order to solve it, move the vioapic_hwdom_map_gsi call outside of
+the locked region in vioapic_write_redirent. vioapic_hwdom_map_gsi
+does not access any of the vioapic fields, so there's no need to call
+the function while holding the hvm.irq_lock.
+
+Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
+Reviewed-by: Wei Liu <wei.liu2 at citrix.com>
+Reviewed-by: Jan Beulich <jbeulich at suse.com>
+Release-acked-by: Juergen Gross <jgross at suse.com>
+---
+ xen/arch/x86/hvm/vioapic.c | 32 ++++++++++++++++++--------------
+ 1 file changed, 18 insertions(+), 14 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
+index 2b74f92d51..2d71c33c1c 100644
+--- a/xen/arch/x86/hvm/vioapic.c
++++ b/xen/arch/x86/hvm/vioapic.c
+@@ -236,20 +236,6 @@ static void vioapic_write_redirent(
+
+ *pent = ent;
+
+- if ( is_hardware_domain(d) && unmasked )
+- {
+- int ret;
+-
+- ret = vioapic_hwdom_map_gsi(gsi, ent.fields.trig_mode,
+- ent.fields.polarity);
+- if ( ret )
+- {
+- /* Mask the entry again. */
+- pent->fields.mask = 1;
+- unmasked = 0;
+- }
+- }
+-
+ if ( gsi == 0 )
+ {
+ vlapic_adjust_i8259_target(d);
+@@ -266,6 +252,24 @@ static void vioapic_write_redirent(
+
+ spin_unlock(&d->arch.hvm.irq_lock);
+
++ if ( is_hardware_domain(d) && unmasked )
++ {
++ /*
++ * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
++ * since it can cause deadlocks as event_lock is taken by
++ * allocate_and_map_gsi_pirq, and that will invert the locking order
++ * used by other parts of the code.
++ */
++ int ret = vioapic_hwdom_map_gsi(gsi, ent.fields.trig_mode,
++ ent.fields.polarity);
++ if ( ret )
++ {
++ gprintk(XENLOG_ERR,
++ "unable to bind gsi %u to hardware domain: %d\n", gsi, ret);
++ unmasked = 0;
++ }
++ }
++
+ if ( gsi == 0 || unmasked )
+ pt_may_unmask_irq(d, NULL);
+ }
+--
+2.17.2 (Apple Git-113)
+
Added: head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0001-x86-dom0-propagate-PVH-vlapic-EOIs-to-hardware.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,39 @@
+From 19d2bce1c3cbfdc636c142cdf0ae38795f2202dd Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau at citrix.com>
+Date: Thu, 14 Feb 2019 14:41:03 +0100
+Subject: [PATCH for-4.12] x86/dom0: propagate PVH vlapic EOIs to hardware
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current check for MSI EOI is missing a special case for PVH Dom0,
+which doesn't have a hvm_irq_dpci struct but requires EOIs to be
+forwarded to the physical lapic for passed-through devices.
+
+Add a short-circuit to allow EOIs from PVH Dom0 to be propagated.
+
+Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
+---
+Cc: Jan Beulich <jbeulich at suse.com>
+Cc: Juergen Gross <jgross at suse.com>
+---
+ xen/drivers/passthrough/io.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
+index a6eb8a4336..4290c7c710 100644
+--- a/xen/drivers/passthrough/io.c
++++ b/xen/drivers/passthrough/io.c
+@@ -869,7 +869,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
+
+ void hvm_dpci_msi_eoi(struct domain *d, int vector)
+ {
+- if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
++ if ( !iommu_enabled ||
++ (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
+ return;
+
+ spin_lock(&d->event_lock);
+--
+2.17.2 (Apple Git-113)
+
Added: head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0001-x86-mm-locks-remove-trailing-whitespace.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,101 @@
+From 468937da985661e5cd1d6b2df6d6ab2d1fb1e5e4 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau at citrix.com>
+Date: Tue, 12 Mar 2019 12:21:03 +0100
+Subject: [PATCH 1/3] x86/mm-locks: remove trailing whitespace
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+No functional change.
+
+Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+---
+ xen/arch/x86/mm/mm-locks.h | 24 ++++++++++++------------
+ 1 file changed, 12 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index e5fceb2d2e..6c15b9a4cc 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -3,11 +3,11 @@
+ *
+ * Spinlocks used by the code in arch/x86/mm.
+ *
+- * Copyright (c) 2011 Citrix Systems, inc.
++ * Copyright (c) 2011 Citrix Systems, inc.
+ * Copyright (c) 2007 Advanced Micro Devices (Wei Huang)
+ * Copyright (c) 2006-2007 XenSource Inc.
+ * Copyright (c) 2006 Michael A Fetterman
+- *
++ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+@@ -41,7 +41,7 @@ static inline void mm_lock_init(mm_lock_t *l)
+ l->unlock_level = 0;
+ }
+
+-static inline int mm_locked_by_me(mm_lock_t *l)
++static inline int mm_locked_by_me(mm_lock_t *l)
+ {
+ return (l->lock.recurse_cpu == current->processor);
+ }
+@@ -67,7 +67,7 @@ do { \
+
+ static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
+ {
+- if ( !((mm_locked_by_me(l)) && rec) )
++ if ( !((mm_locked_by_me(l)) && rec) )
+ __check_lock_level(level);
+ spin_lock_recursive(&l->lock);
+ if ( l->lock.recurse_cnt == 1 )
+@@ -186,7 +186,7 @@ static inline void mm_unlock(mm_lock_t *l)
+ spin_unlock_recursive(&l->lock);
+ }
+
+-static inline void mm_enforce_order_unlock(int unlock_level,
++static inline void mm_enforce_order_unlock(int unlock_level,
+ unsigned short *recurse_count)
+ {
+ if ( recurse_count )
+@@ -310,7 +310,7 @@ declare_mm_rwlock(altp2m);
+ #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
+
+ /* PoD lock (per-p2m-table)
+- *
++ *
+ * Protects private PoD data structs: entry and cache
+ * counts, page lists, sweep parameters. */
+
+@@ -322,7 +322,7 @@ declare_mm_lock(pod)
+
+ /* Page alloc lock (per-domain)
+ *
+- * This is an external lock, not represented by an mm_lock_t. However,
++ * This is an external lock, not represented by an mm_lock_t. However,
+ * pod code uses it in conjunction with the p2m lock, and expecting
+ * the ordering which we enforce here.
+ * The lock is not recursive. */
+@@ -338,13 +338,13 @@ declare_mm_order_constraint(page_alloc)
+ * For shadow pagetables, this lock protects
+ * - all changes to shadow page table pages
+ * - the shadow hash table
+- * - the shadow page allocator
++ * - the shadow page allocator
+ * - all changes to guest page table pages
+ * - all changes to the page_info->tlbflush_timestamp
+- * - the page_info->count fields on shadow pages
+- *
+- * For HAP, it protects the NPT/EPT tables and mode changes.
+- *
++ * - the page_info->count fields on shadow pages
++ *
++ * For HAP, it protects the NPT/EPT tables and mode changes.
++ *
+ * It also protects the log-dirty bitmap from concurrent accesses (and
+ * teardowns, etc). */
+
+--
+2.17.2 (Apple Git-113)
+
Added: head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0002-x86-mm-locks-convert-some-macros-to-inline-functions.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,210 @@
+From 45e260afe7ee0e6b18a7e64173a081eec6e056aa Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau at citrix.com>
+Date: Tue, 12 Mar 2019 12:24:37 +0100
+Subject: [PATCH 2/3] x86/mm-locks: convert some macros to inline functions
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+And rename to have only one prefix underscore where applicable.
+
+No functional change.
+
+Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+---
+ xen/arch/x86/mm/mm-locks.h | 98 ++++++++++++++++++++------------------
+ 1 file changed, 52 insertions(+), 46 deletions(-)
+
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index 6c15b9a4cc..d3497713e9 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -29,7 +29,6 @@
+
+ /* Per-CPU variable for enforcing the lock ordering */
+ DECLARE_PER_CPU(int, mm_lock_level);
+-#define __get_lock_level() (this_cpu(mm_lock_level))
+
+ DECLARE_PERCPU_RWLOCK_GLOBAL(p2m_percpu_rwlock);
+
+@@ -46,43 +45,47 @@ static inline int mm_locked_by_me(mm_lock_t *l)
+ return (l->lock.recurse_cpu == current->processor);
+ }
+
++static inline int _get_lock_level(void)
++{
++ return this_cpu(mm_lock_level);
++}
++
+ /*
+ * If you see this crash, the numbers printed are order levels defined
+ * in this file.
+ */
+-#define __check_lock_level(l) \
+-do { \
+- if ( unlikely(__get_lock_level() > (l)) ) \
+- { \
+- printk("mm locking order violation: %i > %i\n", \
+- __get_lock_level(), (l)); \
+- BUG(); \
+- } \
+-} while(0)
+-
+-#define __set_lock_level(l) \
+-do { \
+- __get_lock_level() = (l); \
+-} while(0)
++static inline void _check_lock_level(int l)
++{
++ if ( unlikely(_get_lock_level() > l) )
++ {
++ printk("mm locking order violation: %i > %i\n", _get_lock_level(), l);
++ BUG();
++ }
++}
++
++static inline void _set_lock_level(int l)
++{
++ this_cpu(mm_lock_level) = l;
++}
+
+ static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
+ {
+ if ( !((mm_locked_by_me(l)) && rec) )
+- __check_lock_level(level);
++ _check_lock_level(level);
+ spin_lock_recursive(&l->lock);
+ if ( l->lock.recurse_cnt == 1 )
+ {
+ l->locker_function = func;
+- l->unlock_level = __get_lock_level();
++ l->unlock_level = _get_lock_level();
+ }
+ else if ( (unlikely(!rec)) )
+- panic("mm lock already held by %s", l->locker_function);
+- __set_lock_level(level);
++ panic("mm lock already held by %s\n", l->locker_function);
++ _set_lock_level(level);
+ }
+
+ static inline void _mm_enforce_order_lock_pre(int level)
+ {
+- __check_lock_level(level);
++ _check_lock_level(level);
+ }
+
+ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
+@@ -92,12 +95,12 @@ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
+ {
+ if ( (*recurse_count)++ == 0 )
+ {
+- *unlock_level = __get_lock_level();
++ *unlock_level = _get_lock_level();
+ }
+ } else {
+- *unlock_level = __get_lock_level();
++ *unlock_level = _get_lock_level();
+ }
+- __set_lock_level(level);
++ _set_lock_level(level);
+ }
+
+
+@@ -118,12 +121,12 @@ static inline void _mm_write_lock(mm_rwlock_t *l, const char *func, int level)
+ {
+ if ( !mm_write_locked_by_me(l) )
+ {
+- __check_lock_level(level);
++ _check_lock_level(level);
+ percpu_write_lock(p2m_percpu_rwlock, &l->lock);
+ l->locker = get_processor_id();
+ l->locker_function = func;
+- l->unlock_level = __get_lock_level();
+- __set_lock_level(level);
++ l->unlock_level = _get_lock_level();
++ _set_lock_level(level);
+ }
+ l->recurse_count++;
+ }
+@@ -134,13 +137,13 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
+ return;
+ l->locker = -1;
+ l->locker_function = "nobody";
+- __set_lock_level(l->unlock_level);
++ _set_lock_level(l->unlock_level);
+ percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
+ }
+
+ static inline void _mm_read_lock(mm_rwlock_t *l, int level)
+ {
+- __check_lock_level(level);
++ _check_lock_level(level);
+ percpu_read_lock(p2m_percpu_rwlock, &l->lock);
+ /* There's nowhere to store the per-CPU unlock level so we can't
+ * set the lock level. */
+@@ -181,7 +184,7 @@ static inline void mm_unlock(mm_lock_t *l)
+ if ( l->lock.recurse_cnt == 1 )
+ {
+ l->locker_function = "nobody";
+- __set_lock_level(l->unlock_level);
++ _set_lock_level(l->unlock_level);
+ }
+ spin_unlock_recursive(&l->lock);
+ }
+@@ -194,10 +197,10 @@ static inline void mm_enforce_order_unlock(int unlock_level,
+ BUG_ON(*recurse_count == 0);
+ if ( (*recurse_count)-- == 1 )
+ {
+- __set_lock_level(unlock_level);
++ _set_lock_level(unlock_level);
+ }
+ } else {
+- __set_lock_level(unlock_level);
++ _set_lock_level(unlock_level);
+ }
+ }
+
+@@ -287,21 +290,24 @@ declare_mm_lock(altp2mlist)
+
+ #define MM_LOCK_ORDER_altp2m 40
+ declare_mm_rwlock(altp2m);
+-#define p2m_lock(p) \
+- do { \
+- if ( p2m_is_altp2m(p) ) \
+- mm_write_lock(altp2m, &(p)->lock); \
+- else \
+- mm_write_lock(p2m, &(p)->lock); \
+- (p)->defer_flush++; \
+- } while (0)
+-#define p2m_unlock(p) \
+- do { \
+- if ( --(p)->defer_flush == 0 ) \
+- p2m_unlock_and_tlb_flush(p); \
+- else \
+- mm_write_unlock(&(p)->lock); \
+- } while (0)
++
++static inline void p2m_lock(struct p2m_domain *p)
++{
++ if ( p2m_is_altp2m(p) )
++ mm_write_lock(altp2m, &p->lock);
++ else
++ mm_write_lock(p2m, &p->lock);
++ p->defer_flush++;
++}
++
++static inline void p2m_unlock(struct p2m_domain *p)
++{
++ if ( --p->defer_flush == 0 )
++ p2m_unlock_and_tlb_flush(p);
++ else
++ mm_write_unlock(&p->lock);
++}
++
+ #define gfn_lock(p,g,o) p2m_lock(p)
+ #define gfn_unlock(p,g,o) p2m_unlock(p)
+ #define p2m_read_lock(p) mm_read_lock(p2m, &(p)->lock)
+--
+2.17.2 (Apple Git-113)
+
Added: head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0003-x86-mm-locks-apply-a-bias-to-lock-levels-for-control.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,319 @@
+From efce89c1df5969486bef82eec05223a4a6522d2d Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau at citrix.com>
+Date: Tue, 12 Mar 2019 12:25:21 +0100
+Subject: [PATCH 3/3] x86/mm-locks: apply a bias to lock levels for control
+ domain
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The paging_log_dirty_op function takes mm locks from a subject domain
+and then performs copy-to operations against the caller domain in
+order to copy the result of the hypercall into the caller-provided
+buffer.
+
+This works fine when the caller is a non-paging domain, but triggers a
+lock order panic when the caller is a paging domain due to the fact
+that at the point where the copy to operation is performed the subject
+domain paging lock is locked, and the copy operation requires
+locking the caller p2m lock which has a lower level.
+
+Fix this limitation by adding a bias to the level of control domain mm
+locks, so that the lower control domain mm lock always has a level
+greater than the higher unprivileged domain lock level. This allows
+locking the subject domain mm locks and then locking the control
+domain mm locks, while keeping the same lock ordering and the changes
+mostly confined to mm-locks.h.
+
+Note that so far only this flow (locking a subject domain locks and
+then the control domain ones) has been identified, but not all
+possible code paths have been inspected. Hence this solution attempts
+to be a non-intrusive fix for the problem at hand, without discarding
+further changes in the future if other valid code paths are found that
+require more complex lock level ordering.
+
+Signed-off-by: Roger Pau Monné <roger.pau at citrix.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+---
+ xen/arch/x86/mm/mm-locks.h | 119 +++++++++++++++++++++++--------------
+ xen/arch/x86/mm/p2m-pod.c | 5 +-
+ 2 files changed, 78 insertions(+), 46 deletions(-)
+
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index d3497713e9..d6c073dc5c 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -50,15 +50,35 @@ static inline int _get_lock_level(void)
+ return this_cpu(mm_lock_level);
+ }
+
++#define MM_LOCK_ORDER_MAX 64
++/*
++ * Return the lock level taking the domain bias into account. If the domain is
++ * privileged a bias of MM_LOCK_ORDER_MAX is applied to the lock level, so that
++ * mm locks that belong to a control domain can be acquired after having
++ * acquired mm locks of an unprivileged domain.
++ *
++ * This is required in order to use some hypercalls from a paging domain that
++ * take locks of a subject domain and then attempt to copy data to/from the
++ * caller domain.
++ */
++static inline int _lock_level(const struct domain *d, int l)
++{
++ ASSERT(l <= MM_LOCK_ORDER_MAX);
++
++ return l + (d && is_control_domain(d) ? MM_LOCK_ORDER_MAX : 0);
++}
++
+ /*
+ * If you see this crash, the numbers printed are order levels defined
+ * in this file.
+ */
+-static inline void _check_lock_level(int l)
++static inline void _check_lock_level(const struct domain *d, int l)
+ {
+- if ( unlikely(_get_lock_level() > l) )
++ int lvl = _lock_level(d, l);
++
++ if ( unlikely(_get_lock_level() > lvl) )
+ {
+- printk("mm locking order violation: %i > %i\n", _get_lock_level(), l);
++ printk("mm locking order violation: %i > %i\n", _get_lock_level(), lvl);
+ BUG();
+ }
+ }
+@@ -68,10 +88,11 @@ static inline void _set_lock_level(int l)
+ this_cpu(mm_lock_level) = l;
+ }
+
+-static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
++static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
++ const char *func, int level, int rec)
+ {
+ if ( !((mm_locked_by_me(l)) && rec) )
+- _check_lock_level(level);
++ _check_lock_level(d, level);
+ spin_lock_recursive(&l->lock);
+ if ( l->lock.recurse_cnt == 1 )
+ {
+@@ -80,16 +101,17 @@ static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
+ }
+ else if ( (unlikely(!rec)) )
+ panic("mm lock already held by %s\n", l->locker_function);
+- _set_lock_level(level);
++ _set_lock_level(_lock_level(d, level));
+ }
+
+-static inline void _mm_enforce_order_lock_pre(int level)
++static inline void _mm_enforce_order_lock_pre(const struct domain *d, int level)
+ {
+- _check_lock_level(level);
++ _check_lock_level(d, level);
+ }
+
+-static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
+- unsigned short *recurse_count)
++static inline void _mm_enforce_order_lock_post(const struct domain *d, int level,
++ int *unlock_level,
++ unsigned short *recurse_count)
+ {
+ if ( recurse_count )
+ {
+@@ -100,7 +122,7 @@ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
+ } else {
+ *unlock_level = _get_lock_level();
+ }
+- _set_lock_level(level);
++ _set_lock_level(_lock_level(d, level));
+ }
+
+
+@@ -117,16 +139,17 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
+ return (l->locker == get_processor_id());
+ }
+
+-static inline void _mm_write_lock(mm_rwlock_t *l, const char *func, int level)
++static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
++ const char *func, int level)
+ {
+ if ( !mm_write_locked_by_me(l) )
+ {
+- _check_lock_level(level);
++ _check_lock_level(d, level);
+ percpu_write_lock(p2m_percpu_rwlock, &l->lock);
+ l->locker = get_processor_id();
+ l->locker_function = func;
+ l->unlock_level = _get_lock_level();
+- _set_lock_level(level);
++ _set_lock_level(_lock_level(d, level));
+ }
+ l->recurse_count++;
+ }
+@@ -141,9 +164,10 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
+ percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
+ }
+
+-static inline void _mm_read_lock(mm_rwlock_t *l, int level)
++static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
++ int level)
+ {
+- _check_lock_level(level);
++ _check_lock_level(d, level);
+ percpu_read_lock(p2m_percpu_rwlock, &l->lock);
+ /* There's nowhere to store the per-CPU unlock level so we can't
+ * set the lock level. */
+@@ -156,28 +180,32 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
+
+ /* This wrapper uses the line number to express the locking order below */
+ #define declare_mm_lock(name) \
+- static inline void mm_lock_##name(mm_lock_t *l, const char *func, int rec)\
+- { _mm_lock(l, func, MM_LOCK_ORDER_##name, rec); }
++ static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l, \
++ const char *func, int rec) \
++ { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
+ #define declare_mm_rwlock(name) \
+- static inline void mm_write_lock_##name(mm_rwlock_t *l, const char *func) \
+- { _mm_write_lock(l, func, MM_LOCK_ORDER_##name); } \
+- static inline void mm_read_lock_##name(mm_rwlock_t *l) \
+- { _mm_read_lock(l, MM_LOCK_ORDER_##name); }
++ static inline void mm_write_lock_##name(const struct domain *d, \
++ mm_rwlock_t *l, const char *func) \
++ { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); } \
++ static inline void mm_read_lock_##name(const struct domain *d, \
++ mm_rwlock_t *l) \
++ { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
+ /* These capture the name of the calling function */
+-#define mm_lock(name, l) mm_lock_##name(l, __func__, 0)
+-#define mm_lock_recursive(name, l) mm_lock_##name(l, __func__, 1)
+-#define mm_write_lock(name, l) mm_write_lock_##name(l, __func__)
+-#define mm_read_lock(name, l) mm_read_lock_##name(l)
++#define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
++#define mm_lock_recursive(name, d, l) mm_lock_##name(d, l, __func__, 1)
++#define mm_write_lock(name, d, l) mm_write_lock_##name(d, l, __func__)
++#define mm_read_lock(name, d, l) mm_read_lock_##name(d, l)
+
+ /* This wrapper is intended for "external" locks which do not use
+ * the mm_lock_t types. Such locks inside the mm code are also subject
+ * to ordering constraints. */
+-#define declare_mm_order_constraint(name) \
+- static inline void mm_enforce_order_lock_pre_##name(void) \
+- { _mm_enforce_order_lock_pre(MM_LOCK_ORDER_##name); } \
+- static inline void mm_enforce_order_lock_post_##name( \
+- int *unlock_level, unsigned short *recurse_count) \
+- { _mm_enforce_order_lock_post(MM_LOCK_ORDER_##name, unlock_level, recurse_count); } \
++#define declare_mm_order_constraint(name) \
++ static inline void mm_enforce_order_lock_pre_##name(const struct domain *d) \
++ { _mm_enforce_order_lock_pre(d, MM_LOCK_ORDER_##name); } \
++ static inline void mm_enforce_order_lock_post_##name(const struct domain *d,\
++ int *unlock_level, unsigned short *recurse_count) \
++ { _mm_enforce_order_lock_post(d, MM_LOCK_ORDER_##name, unlock_level, \
++ recurse_count); }
+
+ static inline void mm_unlock(mm_lock_t *l)
+ {
+@@ -221,7 +249,7 @@ static inline void mm_enforce_order_unlock(int unlock_level,
+
+ #define MM_LOCK_ORDER_nestedp2m 8
+ declare_mm_lock(nestedp2m)
+-#define nestedp2m_lock(d) mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
++#define nestedp2m_lock(d) mm_lock(nestedp2m, d, &(d)->arch.nested_p2m_lock)
+ #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
+
+ /* P2M lock (per-non-alt-p2m-table)
+@@ -260,9 +288,10 @@ declare_mm_rwlock(p2m);
+
+ #define MM_LOCK_ORDER_per_page_sharing 24
+ declare_mm_order_constraint(per_page_sharing)
+-#define page_sharing_mm_pre_lock() mm_enforce_order_lock_pre_per_page_sharing()
++#define page_sharing_mm_pre_lock() \
++ mm_enforce_order_lock_pre_per_page_sharing(NULL)
+ #define page_sharing_mm_post_lock(l, r) \
+- mm_enforce_order_lock_post_per_page_sharing((l), (r))
++ mm_enforce_order_lock_post_per_page_sharing(NULL, (l), (r))
+ #define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
+
+ /* Alternate P2M list lock (per-domain)
+@@ -275,7 +304,8 @@ declare_mm_order_constraint(per_page_sharing)
+
+ #define MM_LOCK_ORDER_altp2mlist 32
+ declare_mm_lock(altp2mlist)
+-#define altp2m_list_lock(d) mm_lock(altp2mlist, &(d)->arch.altp2m_list_lock)
++#define altp2m_list_lock(d) mm_lock(altp2mlist, d, \
++ &(d)->arch.altp2m_list_lock)
+ #define altp2m_list_unlock(d) mm_unlock(&(d)->arch.altp2m_list_lock)
+
+ /* P2M lock (per-altp2m-table)
+@@ -294,9 +324,9 @@ declare_mm_rwlock(altp2m);
+ static inline void p2m_lock(struct p2m_domain *p)
+ {
+ if ( p2m_is_altp2m(p) )
+- mm_write_lock(altp2m, &p->lock);
++ mm_write_lock(altp2m, p->domain, &p->lock);
+ else
+- mm_write_lock(p2m, &p->lock);
++ mm_write_lock(p2m, p->domain, &p->lock);
+ p->defer_flush++;
+ }
+
+@@ -310,7 +340,7 @@ static inline void p2m_unlock(struct p2m_domain *p)
+
+ #define gfn_lock(p,g,o) p2m_lock(p)
+ #define gfn_unlock(p,g,o) p2m_unlock(p)
+-#define p2m_read_lock(p) mm_read_lock(p2m, &(p)->lock)
++#define p2m_read_lock(p) mm_read_lock(p2m, (p)->domain, &(p)->lock)
+ #define p2m_read_unlock(p) mm_read_unlock(&(p)->lock)
+ #define p2m_locked_by_me(p) mm_write_locked_by_me(&(p)->lock)
+ #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
+@@ -322,7 +352,7 @@ static inline void p2m_unlock(struct p2m_domain *p)
+
+ #define MM_LOCK_ORDER_pod 48
+ declare_mm_lock(pod)
+-#define pod_lock(p) mm_lock(pod, &(p)->pod.lock)
++#define pod_lock(p) mm_lock(pod, (p)->domain, &(p)->pod.lock)
+ #define pod_unlock(p) mm_unlock(&(p)->pod.lock)
+ #define pod_locked_by_me(p) mm_locked_by_me(&(p)->pod.lock)
+
+@@ -335,8 +365,9 @@ declare_mm_lock(pod)
+
+ #define MM_LOCK_ORDER_page_alloc 56
+ declare_mm_order_constraint(page_alloc)
+-#define page_alloc_mm_pre_lock() mm_enforce_order_lock_pre_page_alloc()
+-#define page_alloc_mm_post_lock(l) mm_enforce_order_lock_post_page_alloc(&(l), NULL)
++#define page_alloc_mm_pre_lock(d) mm_enforce_order_lock_pre_page_alloc(d)
++#define page_alloc_mm_post_lock(d, l) \
++ mm_enforce_order_lock_post_page_alloc(d, &(l), NULL)
+ #define page_alloc_mm_unlock(l) mm_enforce_order_unlock((l), NULL)
+
+ /* Paging lock (per-domain)
+@@ -356,9 +387,9 @@ declare_mm_order_constraint(page_alloc)
+
+ #define MM_LOCK_ORDER_paging 64
+ declare_mm_lock(paging)
+-#define paging_lock(d) mm_lock(paging, &(d)->arch.paging.lock)
++#define paging_lock(d) mm_lock(paging, d, &(d)->arch.paging.lock)
+ #define paging_lock_recursive(d) \
+- mm_lock_recursive(paging, &(d)->arch.paging.lock)
++ mm_lock_recursive(paging, d, &(d)->arch.paging.lock)
+ #define paging_unlock(d) mm_unlock(&(d)->arch.paging.lock)
+ #define paging_locked_by_me(d) mm_locked_by_me(&(d)->arch.paging.lock)
+
+diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
+index 631e9aec33..725a2921d9 100644
+--- a/xen/arch/x86/mm/p2m-pod.c
++++ b/xen/arch/x86/mm/p2m-pod.c
+@@ -34,9 +34,10 @@
+ /* Enforce lock ordering when grabbing the "external" page_alloc lock */
+ static inline void lock_page_alloc(struct p2m_domain *p2m)
+ {
+- page_alloc_mm_pre_lock();
++ page_alloc_mm_pre_lock(p2m->domain);
+ spin_lock(&(p2m->domain->page_alloc_lock));
+- page_alloc_mm_post_lock(p2m->domain->arch.page_alloc_unlock_level);
++ page_alloc_mm_post_lock(p2m->domain,
++ p2m->domain->arch.page_alloc_unlock_level);
+ }
+
+ static inline void unlock_page_alloc(struct p2m_domain *p2m)
+--
+2.17.2 (Apple Git-113)
+
Added: head/emulators/xen-kernel/files/xsa284.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/xsa284.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,31 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: gnttab: set page refcount for copy-on-grant-transfer
+
+Commit 5cc77f9098 ("32-on-64: Fix domain address-size clamping,
+implement"), which introduced this functionality, took care of clearing
+the old page's PGC_allocated, but failed to set the bit (and install the
+associated reference) on the newly allocated one. Furthermore the "mfn"
+local variable was never updated, and hence the wrong MFN was passed to
+guest_physmap_add_page() (and back to the destination domain) in this
+case, leading to an IOMMU mapping into an unowned page.
+
+Ideally the code would use assign_pages(), but the call to
+gnttab_prepare_for_transfer() sits in the middle of the actions
+mirroring that function.
+
+This is XSA-284.
+
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Acked-by: George Dunlap <george.dunlap at citrix.com>
+
+--- a/xen/common/grant_table.c
++++ b/xen/common/grant_table.c
+@@ -2183,6 +2183,8 @@ gnttab_transfer(
+ page->count_info &= ~(PGC_count_mask|PGC_allocated);
+ free_domheap_page(page);
+ page = new_page;
++ page->count_info = PGC_allocated | 1;
++ mfn = page_to_mfn(page);
+ }
+
+ spin_lock(&e->page_alloc_lock);
Added: head/emulators/xen-kernel/files/xsa287-4.11.patch
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/emulators/xen-kernel/files/xsa287-4.11.patch Tue Mar 12 15:02:35 2019 (r495458)
@@ -0,0 +1,328 @@
+From 67620c1ccb13f7b58645f48248ba1f408b021fdc Mon Sep 17 00:00:00 2001
+From: George Dunlap <george.dunlap at citrix.com>
+Date: Fri, 18 Jan 2019 15:00:34 +0000
+Subject: [PATCH] steal_page: Get rid of bogus struct page states
+
+The original rules for `struct page` required the following invariants
+at all times:
+
+- refcount > 0 implies owner != NULL
+- PGC_allocated implies refcount > 0
+
+steal_page, in a misguided attempt to protect against unknown races,
+violates both of these rules, thus introducing other races:
+
+- Temporarily, the count_info has the refcount go to 0 while
+ PGC_allocated is set
+
+- It explicitly returns the page PGC_allocated set, but owner == NULL
+ and page not on the page_list.
+
+The second one meant that page_get_owner_and_reference() could return
+NULL even after having successfully grabbed a reference on the page,
+leading the caller to leak the reference (since "couldn't get ref" and
+"got ref but no owner" look the same).
+
+Furthermore, rather than grabbing a page reference to ensure that the
+owner doesn't change under its feet, it appears to rely on holding
+d->page_alloc lock to prevent this.
+
+Unfortunately, this is ineffective: page->owner remains non-NULL for
+some time after the count has been set to 0; meaning that it would be
+entirely possible for the page to be freed and re-allocated to a
+different domain between the page_get_owner() check and the count_info
+check.
+
+Modify steal_page to instead follow the appropriate access discipline,
+taking the page through series of states similar to being freed and
+then re-allocated with MEMF_no_owner:
+
+- Grab an extra reference to make sure we don't race with anyone else
+ freeing the page
+
+- Drop both references and PGC_allocated atomically, so that (if
+ successful), anyone else trying to grab a reference will fail
+
+- Attempt to reset Xen's mappings
+
+- Reset the rest of the state.
+
+Then, modify the two callers appropriately:
+
+- Leave count_info alone (it's already been cleared)
+- Call free_domheap_page() directly if appropriate
+- Call assign_pages() rather than open-coding a partial assign
+
+With all callers to assign_pages() now passing in pages with the
+type_info field clear, tighten the respective assertion there.
+
+This is XSA-287.
+
+Signed-off-by: George Dunlap <george.dunlap at citrix.com>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+---
+ xen/arch/x86/mm.c | 84 ++++++++++++++++++++++++++++------------
+ xen/common/grant_table.c | 20 +++++-----
+ xen/common/memory.c | 19 +++++----
+ xen/common/page_alloc.c | 2 +-
+ 4 files changed, 83 insertions(+), 42 deletions(-)
+
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index 6509035a5c..d8ff58c901 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -3966,70 +3966,106 @@ int donate_page(
+ return -EINVAL;
+ }
+
++/*
++ * Steal page will attempt to remove `page` from domain `d`. Upon
++ * return, `page` will be in a state similar to the state of a page
++ * returned from alloc_domheap_page() with MEMF_no_owner set:
++ * - refcount 0
++ * - type count cleared
++ * - owner NULL
++ * - page caching attributes cleaned up
++ * - removed from the domain's page_list
++ *
++ * If MEMF_no_refcount is not set, the domain's tot_pages will be
++ * adjusted. If this results in the page count falling to 0,
++ * put_domain() will be called.
++ *
++ * The caller should either call free_domheap_page() to free the
++ * page, or assign_pages() to put it back on some domain's page list.
++ */
+ int steal_page(
+ struct domain *d, struct page_info *page, unsigned int memflags)
+ {
+ unsigned long x, y;
+ bool drop_dom_ref = false;
+- const struct domain *owner = dom_xen;
++ const struct domain *owner;
++ int rc;
+
+ if ( paging_mode_external(d) )
+ return -EOPNOTSUPP;
+
+- spin_lock(&d->page_alloc_lock);
+-
+- if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
++ /* Grab a reference to make sure the page doesn't change under our feet */
++ rc = -EINVAL;
++ if ( !(owner = page_get_owner_and_reference(page)) )
+ goto fail;
+
++ if ( owner != d || is_xen_heap_page(page) )
++ goto fail_put;
++
+ /*
+- * We require there is just one reference (PGC_allocated). We temporarily
+- * drop this reference now so that we can safely swizzle the owner.
++ * We require there are exactly two references -- the one we just
++ * took, and PGC_allocated. We temporarily drop both these
++ * references so that the page becomes effectively non-"live" for
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***