mlock & COW

Sat Apr 12 20:50:42 UTC 2008

:I've been battling with a bug related to mlock and COW pages.  Since I'm 
:basically clueless when it comes to the VM subsystem, I was hoping someone 
:with more clue can look at my fix and let me know if I'm doing the right 
:thing here.
:
:The problem: User programs will crash (SEGV/BUS) if COW pages become 
:writable after the pages are mlocked.  This happens if shared libraries is 
:loaded in a program after a call to mlockall(MCL_FUTURE), and can be seen 
:with amd and ntpd when nss_ldap is used in the system.  Included at the end 
:of this message is a sample program that demonstrates the problem.
:
:The "solution": Forcibly wire the page via vm_fault_wire() when the page 
:protection bits are changed.  This seems to make the problem disappear, but 
:I'm not sure if causing a fault on vm_map_protect() is a good thing to do.  
:Am I correct in thinking vm_fault_wire() will cause the COW page to be 
:copied immediately?  I think this is the right thing to do, given the page 
:is supposed to be wired, but I'm not sure if somewhere in the vm fault 
:routines might be a better place to put the fix.  I'm also not sure what 
:the last argument to vm_fault_wire() (fictitious) means, I just copied the 
:argument from another invocation.  My patch (against 7-STABLE) is included 
:below.
:
:I'll commit this if someone blesses the fix as the right thing.
:
:
:Index: sys/vm/vm_map.c
:===================================================================
:RCS file: /home/ncvs/src/sys/vm/vm_map.c,v
:retrieving revision 1.388.2.3
:diff -u -p -r1.388.2.3 vm_map.c
:--- sys/vm/vm_map.c	18 Jan 2008 10:02:53 -0000	1.388.2.3
:+++ sys/vm/vm_map.c	9 Mar 2008 20:55:50 -0000
:@@ -1621,6 +1621,15 @@ vm_map_protect(vm_map_t map, vm_offset_t
: 			    current->end,
: 			    current->protection & MASK(current));
: #undef	MASK
:+			if ((entry->eflags & 
:+			    (MAP_ENTRY_USER_WIRED|MAP_ENTRY_COW)) ==
:+			    (MAP_ENTRY_USER_WIRED|MAP_ENTRY_COW)) {
:+				vm_fault_wire(map, current->start,
:+				    current->end, TRUE,
:+				    entry->object.vm_object != NULL &&
:+				    entry->object.vm_object->type ==
:+				    OBJT_DEVICE);
:+			}
: 		}
: 		vm_map_simplify_entry(map, current);
: 		current = current->next;

    Hmm.  mlock() will COW pages in writeable maps before locking the pages,
    so you are on the right track with regards to what to do if a read-only
    map is locked first, and then made writeable later.  Here's mlock's
    flow:

    mlock() calls vm_map_wire():

	* lookup entry
	* mark entry in transition
	* bump entry wired count
	* unlock map
	* vm_fault_wire	<--- this does the COW
	* relock map

    vm_map_protect's flow:

	* lookup entry
	  [ entry not marked in transition ]
	* adjust pmap protections (if COW forces read-only)
	  [ This is where your patch currently is ]

    I don't think your patch is going to work as-is.  The pages are already
    wired read-only at this point in your bug reproduction sequence:

	* mmap
	* protect read-only
	* mlock
	* protect read-write  <----- this point pages already wired RO
	* modify memory

    Here are some likely issues:

    * You can't issue the vm_fault_wire() unless the entry is marked
      in-transition, otherwise the entry can get ripped out from under you.

    * When you call vm_fault_wire() it will COW the page, but the *ORIGINAL*
      read-only page will still have been marked wired and will probably
      wind up being out of sync with the related map entries.

      I don't think vm_fault_wire() was designed to deal with that situation
      so its likely you will get a wiring leak.

    I think your only choice is to first do a wholesale unwiring of the 
    pages (equivalent to munlock()), and then re-lock them after the protection
    change.  It should be possible to do this from inside vm_map_protect()
    as long as you properly handle the vm_map_entry.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>