Re: madvise(MADV_FREE) doesn't work in some cases?

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 5 Jul 2021 21:54:58 +0300
On Mon, Jul 05, 2021 at 07:32:00PM +0300, Vitaliy Gusev wrote:
> Hi,
> > > Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?
> > 
> > Your program does not do exactly what you described above.  There is a generic
> > race to consume memory, and some specific details about madvise(2) on FreeBSD.
> > 
> > From the code, you do:
> > - mmap anonymous private region
> > - fork
> > - both child and parent start touching the mmaped region.
> > 
> > Two processes race to consume 1/2 of RAM on your system.  If one of
> > them happens to execute faster than the other, you do get to the case where
> > one of them does madvise().  But it could be that the processes execute in
> > lockstep, and try to eat all the memory before going to madvise().
> > Did you exclude this case?
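
For readers without the linked source, the sequence listed above can be sketched roughly as follows. This is not the original test program: the helper name touch_after_fork, the fill patterns and the error handling are made up for illustration, and the sleeps and the madvise() call are left out.

```c
#define _DEFAULT_SOURCE	/* only matters for glibc in strict ISO mode */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the original test: map `len` bytes of anonymous
 * private memory, fork, and let both processes dirty every page.  After the
 * fork the region is copy-on-write, so each process gets its own copy of
 * the pages it writes.  Returns 0 if both processes finished, -1 on error.
 */
static int
touch_after_fork(size_t len)
{
	unsigned char *p;
	pid_t pid;
	int ok, status;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return (-1);

	pid = fork();
	if (pid == -1) {
		munmap(p, len);
		return (-1);
	}
	if (pid == 0) {
		/* Child: every write is a CoW fault on the shared region. */
		memset(p, 0xc1, len);
		_exit(p[len - 1] == 0xc1 ? 0 : 1);
	}

	/* Parent: dirties the same range, racing with the child. */
	memset(p, 0xa5, len);
	ok = (p[0] == 0xa5 && p[len - 1] == 0xa5);

	if (waitpid(pid, &status, 0) != pid)
		ok = 0;
	else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		ok = 0;
	munmap(p, len);
	return (ok ? 0 : -1);
}
```

With a length close to half of RAM and no swap, the two memset() calls are the race in question: nothing orders them unless the program adds its own synchronization.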
> I believe I did everything right. You can see the sleeps that serialise execution. To check again, I modified the test to print timestamps and to use MADV_DONTNEED:
> 
> Here is the source: http://cpp.sh/2rd4f
> 
> I’ve run: 
> 
> $ ./mmapfork 2300
> mmap 0x801000000 pid 40628
> end 0x890c00000 len 0x8fc00000
> pid 40628
> pid 40629
> 40629: [1625500831] touch
> 40629: [1625500832] sleep before madvise
> 40629: [1625500833] madvise
> 40629: [1625500834] Press enter to exit
> 40628: [1625500845] touch
> 40628: [1625500846] sleep before madvise
> 40628: [1625500851] madvise
> 40628: [1625500852] Press enter to exit
> 
> And you can see that the child started running 11 seconds after the parent had already called madvise() on the whole range of touched memory.
> 
> And finally in dmesg:
> 
> pid 40629 (mmapfork), jid 0, uid 1001, was killed: out of swap space
> 
> So the same result as I wrote in the first email.
> 
> > Now, about the specifics of madvise(MADV_FREE) on FreeBSD.  Due to the way
> > CoW is implemented with the shadow chain of objects, we cannot drop the
> > top of the shadow chain, otherwise instead of returning zeroed pages next
> > time, we would return older content.  It was a relatively recent
> > discovery, see bf5661f4a1af6931ec4b6, PR 240061.
> > 
> Thanks, I will look at it.
> > To explain it in simplified form, when there is potentially old content
> > under the CoW copy for the mapping, we cannot drop the CoW-ed pages. This
> > is the reason why madvise(MADV_FREE) does nothing for your program.
> > When you run two instances without fork, there is no previous content
> > and no CoW, so madvise() can safely remove the pages from the object,
> > and on the next access they are zero-filled.
> 
> Do I understand correctly that it should work with MADV_DONTNEED? But the "dontneed" variant doesn’t work.

DONTNEED does not allow the system to free pages at all.  It means that the
pages are less useful and can be paged out with higher priority.
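
As a rough userland illustration of the difference, the sketch below dirties an anonymous private mapping and then applies the given advice to all of it. The helper name dirty_and_advise is hypothetical. With MADV_DONTNEED the pages are not freed and only become preferred candidates for pageout, as described above; with MADV_FREE the caller has to treat the contents as unspecified afterwards.

```c
#define _DEFAULT_SOURCE	/* only matters for glibc in strict ISO mode */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: dirty `len` bytes of anonymous private memory and
 * pass `advice` (e.g. MADV_DONTNEED or MADV_FREE) for the whole range.
 * Returns the madvise() result, or -1 if the mapping could not be created.
 */
static int
dirty_and_advise(size_t len, int advice)
{
	unsigned char *p;
	int error;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return (-1);

	/* Make every page resident and dirty. */
	memset(p, 0x5a, len);

	/*
	 * The advice is only a hint.  After MADV_FREE the program must
	 * rewrite the pages before relying on what it reads from them.
	 */
	error = madvise(p, len, advice);

	munmap(p, len);
	return (error);
}
```

A return value of 0 only says the kernel accepted the hint; it does not tell you whether any memory was actually reclaimed.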

> > 
> > You can read more details in the referenced commit, as well as some musings
> > about way to make it somewhat better.
> > 
> > I must say that trying to allocate 1/2 + 1/2 of RAM this way, on a system
> > without swap, is asking for trouble anyway.
> I just wanted to point out that other operating systems handle this well, whereas FreeBSD has trouble. Perhaps something in madvise() is not finished?

Well, yes, as I said, non-trivial shadow chains for MADV_FREE are not
handled due to the 'old content revival' bug.  For your specific case, the
following patch might help (modulo bugs).

But it is very specific to your example; for instance, it would not work
if you try to mark only some significant part of the mapped area as _FREE,
rather than the whole of it. We would need to start fragmenting the map to
handle such partial madvises better.
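
To make "partial madvise" concrete, here is a sketch of what such a call looks like from userland: only the tail of a mapping is advised, starting at a page-aligned offset. The helper name advise_tail and its return convention are assumptions for illustration. It says nothing about whether the kernel reclaims the advised pages; it only checks that the part of the mapping left out of the call still holds its data.

```c
#define _DEFAULT_SOURCE	/* only matters for glibc in strict ISO mode */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty a `len`-byte anonymous private mapping and
 * apply `advice` only from byte offset `from` (rounded up to a page
 * boundary) to the end.  Returns 1 if the pages before the advised range
 * still hold the written pattern, 0 if they do not, -1 on error.
 */
static int
advise_tail(size_t len, size_t from, int advice)
{
	size_t pagesz, start;
	unsigned char *p;
	int ok;

	assert(len > 0);
	pagesz = (size_t)sysconf(_SC_PAGESIZE);
	/* madvise() wants a page-aligned start address. */
	start = (from + pagesz - 1) / pagesz * pagesz;
	if (start == 0 || start >= len)
		return (-1);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	memset(p, 0x5a, len);

	if (madvise(p + start, len - start, advice) != 0) {
		munmap(p, len);
		return (-1);
	}

	/* The head [0, start) was not part of the call and must be intact. */
	ok = (p[0] == 0x5a && p[start - 1] == 0x5a);
	munmap(p, len);
	return (ok);
}
```

Calling it as advise_tail(len, len / 2, MADV_FREE) gives the "significant part, but not the whole area" case: the advised range covers only part of one map entry.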

commit 0392eb3c93b7dacc31dbdf8ec2fc40fa5ba67c62
Author: Konstantin Belousov <kib_at_FreeBSD.org>
Date:   Mon Jul 5 21:53:22 2021 +0300

    madvise(MADV_FREE): try harder to handle shadow chain
    
    In particular, collapse top object and see if there is no backing object
    after, which means that we would not revert to older content if drop the
    top object.

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index 1ac4ccf72f11..80abac223f29 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -3033,6 +3033,7 @@ vm_map_madvise(
 			entry = vm_map_entry_succ(entry);
 		for (; entry->start < end;
 		    entry = vm_map_entry_succ(entry)) {
+			vm_object_t obj;
 			vm_offset_t useEnd, useStart;
 
 			if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) != 0)
@@ -3046,9 +3047,16 @@ vm_map_madvise(
 			 * backing object can change.
 			 */
 			if (behav == MADV_FREE &&
-			    entry->object.vm_object != NULL &&
-			    entry->object.vm_object->backing_object != NULL)
-				continue;
+			    (obj = entry->object.vm_object) != NULL &&
+			    obj->backing_object != NULL) {
+				VM_OBJECT_WLOCK(obj);
+				if ((obj->flags & OBJ_DEAD) != 0)
+					continue;
+				vm_object_collapse(obj);
+				VM_OBJECT_WUNLOCK(obj);
+				if (obj->backing_object != NULL)
+					continue;
+			}
 
 			pstart = OFF_TO_IDX(entry->offset);
 			pend = pstart + atop(entry->end - entry->start);
Received on Mon Jul 05 2021 - 18:54:58 UTC
