mdconfig creating file based memory disk
Chad J. Milios
milios at ccsys.com
Thu Sep 10 17:53:27 UTC 2015
> On Sep 9, 2015, at 11:10 PM, Erich Dollansky <erichsfreebsdlist at alogt.com> wrote:
> I just came across a simple question. What will happen when I create
> two memory disks using the same file?
> mdconfig -f /usr/home/swap/swapfile -u 0
> mdconfig -f /usr/home/swap/swapfile -u 1
> and then I do a
> swapon /dev/md0
> swapon /dev/md1
> It gives me double the size of 'swapfile' as swap space. It is obvious
> to me that this must fail.
> Shouldn't there be a note in the documentation?
Perhaps, but if we documented every way in which FreeBSD allows one to shoot oneself in the foot, the docs would probably more than triple in size. :)
This is an interesting experiment, but I can't imagine anyone setting up such a configuration and actually expecting to get away with it, and I don't think stumbling onto it by accident is any more likely than hitting any of the countless other potentially dangerous misconfigurations of *nix. I doubt this merits a mention for safety's sake, though as an illustration of how swap actually works internally it has a lot of merit. I'd be curious to see more thorough test results and discussion from those with intimate knowledge of the virtual memory and swapper/pager systems.
Imagine the following analogy: a hypothetical database program that mmap()s a file, possibly larger than physical memory, and relies on the VM system for demand paging. Now imagine two or more instances of that program being started with hard links to the same underlying file, with all of them allowed to read and write. If the software is SMP-capable and uses locks or data structures WITHIN the mapped region to handle synchronization, and doesn't go out of its way to cache or process the data outside that region (beyond the help the kernel already provides) for moments during which the data could become stale, then the multiple instances could all serve data from, AND modify data in, that same single source of truth, and will remain stable and in sync even without msync()ing to the underlying file or storage. I'm also positive this holds true through any number or combination (even an arbitrarily large one) of indirections via hard links, symlinks, mdconfig, nullfs and/or unionfs (or it is intended to, so any failure or race should be considered a kernel bug).
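To make the analogy concrete, here is a minimal sketch in Python, not taken from any real database software. It assumes a Unix-like system and a filesystem that supports hard links; the file names and the function `shared_mapping_demo` are made up for illustration. For brevity it creates both mappings in a single process rather than in two separate instances, and it relies on Python's `mmap` defaulting to a shared (MAP_SHARED) mapping on Unix.

```python
import mmap
import os
import tempfile


def shared_mapping_demo(size=4096):
    """Map one file through two hard links; a write via one mapping
    should be visible via the other without any msync()/flush()."""
    with tempfile.TemporaryDirectory() as workdir:
        path_a = os.path.join(workdir, "data.bin")       # hypothetical name
        path_b = os.path.join(workdir, "data-link.bin")  # hypothetical name

        # Create the backing file, then give it a second name (same inode).
        with open(path_a, "wb") as f:
            f.write(b"\0" * size)
        os.link(path_a, path_b)

        with open(path_a, "r+b") as fa, open(path_b, "r+b") as fb:
            # On Unix, mmap.mmap() defaults to a shared mapping.
            with mmap.mmap(fa.fileno(), size) as map_a, \
                 mmap.mmap(fb.fileno(), size) as map_b:
                map_a[0:5] = b"hello"     # write through the first name
                seen = bytes(map_b[0:5])  # read through the second name
    return seen


if __name__ == "__main__":
    print(shared_mapping_demo())
```

Both mappings are backed by the same pages of the same underlying file, which is why the second one sees the write. This says nothing about how the swap code itself behaves; it only illustrates the shared-file case described above.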
So, without having inspected the relevant kernel source myself, and based only on the little experiment you've conducted, I can imagine swap having been set up so that the data structures that map swapped regions live either fully inside or fully outside the swap partition/file, such that any "surprise" data showing up in the "other" swap device (besides the one it was written to) ends up being non-problematic. I am just brainstorming here and would love it if someone with knowledge rather than conjecture would chime in. :)
At the outset of the experiment you describe, my expectation was almost certain spectacular failure. Anything else is actually quite curious, and if such a config doesn't just burst right into flames I consider it quite a testament to sound *nix engineering. I'd be interested to hear from someone who exercises it with more swapping out and paging in of data and verifies the data and semantics.
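One rough way to do that kind of verification is sketched below in Python. It is only an illustration under stated assumptions: the helper names `make_block` and `fill_and_verify` are made up, and the block count and block size would have to be chosen so that the total exceeds physical RAM before the system actually swaps. It should only be run on a scratch machine.

```python
import hashlib


def make_block(index, size):
    """Build a deterministic pattern for block `index`, so the expected
    contents can be regenerated later instead of being kept in memory."""
    seed = hashlib.sha256(str(index).encode()).digest()
    reps = size // len(seed) + 1
    return bytearray((seed * reps)[:size])


def fill_and_verify(n_blocks, block_size):
    """Fill memory with known patterns, then re-read every block and
    return the indices of any blocks whose contents changed."""
    blocks = [make_block(i, block_size) for i in range(n_blocks)]
    # By the time early blocks are re-read they may have been swapped out
    # and paged back in, provided the total size exceeds physical RAM.
    return [i for i, b in enumerate(blocks)
            if b != make_block(i, block_size)]


if __name__ == "__main__":
    # Small numbers here; scale them up deliberately to force swapping.
    print(fill_and_verify(n_blocks=8, block_size=4096))
```

An empty list means every block read back as written. A non-empty list would point at corruption somewhere along the swap path, though this sketch cannot say where.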