On COW memory mapping in d_mmap_single
Konstantin Belousov
kostikbel at gmail.com
Tue Apr 11 13:00:21 UTC 2017
On Tue, Apr 11, 2017 at 03:37:26PM +0300, Flavius Anton wrote:
> Hi everyone,
>
> I'll start by giving some context, so you can better understand the
> problem I'm trying to solve. I've been working for a while on
> bhyve trying to implement save/restore [1]. We've currently managed to
> get it working for VMs using a ramdisk and no devices, so just vCPU
> and memory states are saved and restored so far.
>
> Last week I started looking into network devices, specifically
> virtio-net devices. The problem was that when I issue a checkpoint
> operation, the guest virtio driver stops working. After digging for a
> while, I figured out the problem is marking VM memory as COW. If I
> don't do this, the driver continues with no problem after
> checkpointing.
>
> Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
> the user space does a mmap on the /dev device, we would like to mark
> VM memory as COW, thus the VM can continue touching pages while the
> user space is writing the 'frozen', COW-marked memory to persistent
> storage. We do this by iterating through all vm_entries in the VM's
> vmspace to find the entry that maps the object holding VM memory,
> and then we roughly just set MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY on
> that entry. You can see the code here [2].
This is a very strange operation, to put it mildly. First, are the other
vCPUs running while you do your 'COW'? If yes, you are guaranteed to get
an inconsistent snapshot. If not, then you do not need 'COW'.
Moreover, what kinds of VM objects are mapped into the vmspace? The
FreeBSD VM does not support shadowing of device objects (that is,
inserting shadow objects into the device object chain breaks VM
invariants). One of the main reasons why this never needed to be
supported is that a shadow copy cannot see changes made to the shadowed
pages, presumably by the device. If vmm mmaps some devices into the
guest vmspace, the devices would effectively 'freeze' from the guest's
point of view.
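As a rough userland analogy for the 'freeze' effect (not bhyve or kernel
code), the sketch below maps one scratch file twice, MAP_SHARED and
MAP_PRIVATE. Once the private mapping has taken its copy-on-write fault,
it no longer sees later stores through the shared mapping. The function
name and the scratch file path are made up for illustration, and the
shared mapping only stands in for a device updating its pages.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userland analogy only. Returns the byte the private (COW) mapping
 * reads after the shared mapping was updated, or -1 if setup fails.
 */
int
cow_view_after_shared_update(void)
{
	char path[] = "/tmp/cowdemo.XXXXXX";	/* hypothetical scratch file */
	char *shared, *priv;
	size_t len;
	int fd, seen;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);			/* file lives on until fd is closed */
	len = (size_t)sysconf(_SC_PAGESIZE);
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (-1);
	}

	shared = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (shared == MAP_FAILED || priv == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	shared[0] = 'A';	/* "device" state before the copy is taken */
	priv[1] = 'x';		/* write fault: private side gets its own page */
	shared[0] = 'B';	/* "device" keeps updating the original page */

	seen = priv[0];		/* the private copy still holds the old byte */

	munmap(priv, len);
	munmap(shared, len);
	close(fd);
	return (seen);
}
```

Calling cow_view_after_shared_update() should return 'A', not 'B': the
private side is stuck with the contents from the moment of the copy.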
Next, how do you undo the damage done by your 'COW'?
> I'm not sure if the above is sufficient for our purpose. In other
> words, how would you do this? You have a vm_object that is referenced
> via a vm_entry by process A (the user space). Somebody else, process B
> let's say, does an mmap() on your device and you'd like to freeze that
> object, such that process B can see a consistent snapshot of it, while
> you want process A to be able to continue reading and writing from/to
> it.
This is not supported. I have no idea why a copy of a page which
reflects device state would even be considered a good idea. In any case,
you cannot make a consistent copy without the device's cooperation, since
the device might modify its state while the CPU reads it.
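A deterministic toy model of that last point, under made-up assumptions:
the struct, its invariant (a + b == 100) and device_step() are all
hypothetical and only stand in for device state that changes between two
reads by the copier. Copying field by field while the "device" runs
yields a snapshot that never existed; copying while it is quiesced does
not.

```c
/* Hypothetical device state; invariant: a + b == 100. */
struct dev_state {
	int a;
	int b;
};

/* One atomic (from the device's view) state transition. */
static void
device_step(struct dev_state *s)
{
	s->a -= 10;
	s->b += 10;
}

/* Copier reads the fields one by one while the device keeps running. */
int
torn_snapshot_sum(void)
{
	struct dev_state live = { 60, 40 };
	struct dev_state snap;

	snap.a = live.a;	/* reads a from the old state */
	device_step(&live);	/* device changes state mid-copy */
	snap.b = live.b;	/* reads b from the new state */
	return (snap.a + snap.b);
}

/* Copier reads only while the device is stopped (cooperating). */
int
quiesced_snapshot_sum(void)
{
	struct dev_state live = { 60, 40 };
	struct dev_state snap;

	snap = live;		/* no device_step() during the copy */
	device_step(&live);	/* device resumes afterwards */
	return (snap.a + snap.b);
}
```

Here torn_snapshot_sum() returns 110, which violates the invariant,
while quiesced_snapshot_sum() returns 100.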
>
> I've also read through Design Elements of the FreeBSD VM system [3],
> but I am still afraid (I am sure) that I have some misunderstandings.
>
> Thank you very much for bearing with me and going through this wall of text.
>
> --
> Flavius
>
> [1] https://github.com/flaviusanton/freebsd/tree/bhyve-save-restore
> [2] https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/amd64/vmm/vmm_dev.c#L862
> [3] https://www.freebsd.org/doc/en/articles/vm-design/index.html