Re: madvise(MADV_FREE) doesn't work in some cases?

From: Vitaliy Gusev <gusev.vitaliy_at_gmail.com>
Date: Mon, 05 Jul 2021 16:54:00 UTC
Hi,


> On 3 Jul 2021, at 14:35, Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
> On Sat, Jul 03, 2021 at 02:32:01AM +0300, Vitaliy Gusev wrote:
>> ...
>> Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?
> 
> Your program does not do exactly what you described above.  There is a generic
> race to consume memory, and some specific details about madvise(2) on FreeBSD.
> 
> From the code, you do:
> - mmap anonymous private region
> - fork
> - both child and parent start touching the mmaped region.

Their execution is serialised by sleeps. Yes, this is not a strict guarantee, but it is enough for testing purposes.
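
For a stricter ordering than sleeps, the two processes could hand over explicitly through pipes. This is only a minimal sketch: run_serialised() and its two callbacks are hypothetical names, not part of the test program. The child would do its touch + madvise in child_phase(), and the parent would do its touch in parent_phase(). The child stays alive, still holding its mapping, until the parent has finished.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the test program. Runs child_phase() in a
 * forked child, then parent_phase() in the parent, and only lets the child
 * exit after the parent has finished. A NULL phase is skipped.
 * Returns 0 on success, -1 on error.
 */
int run_serialised(void (*child_phase)(void), void (*parent_phase)(void))
{
        int done[2], release[2];        /* child -> parent, parent -> child */
        char token = 0;
        int status;
        ssize_t n;
        pid_t pid;

        if (pipe(done) != 0)
                return -1;
        if (pipe(release) != 0) {
                close(done[0]);
                close(done[1]);
                return -1;
        }

        pid = fork();
        if (pid < 0) {
                close(done[0]);
                close(done[1]);
                close(release[0]);
                close(release[1]);
                return -1;
        }

        if (pid == 0) {
                close(done[0]);
                close(release[1]);
                if (child_phase != NULL)
                        child_phase();  /* e.g. touch, then madvise */
                if (write(done[1], &token, 1) != 1)
                        _exit(1);
                /* Stay alive until the parent closes its end (read sees EOF). */
                if (read(release[0], &token, 1) < 0)
                        _exit(1);
                _exit(0);
        }

        close(done[1]);
        close(release[0]);
        n = read(done[0], &token, 1);   /* blocks until the child is done */
        if (n == 1 && parent_phase != NULL)
                parent_phase();         /* e.g. touch the same region */
        close(release[1]);              /* EOF lets the child exit */
        close(done[0]);
        if (waitpid(pid, &status, 0) < 0)
                return -1;
        return (n == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With this, the parent cannot start touching memory before the child has completed its madvise() pass, whatever the scheduling.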


> Two processes race to consume 1/2 of RAM on your system.  If one of
> them happens to execute faster than the other, you do get to the case where
> one of them does madvise().  But it could be that the processes execute in
> lockstep, and try to eat all the memory before going to madvise().
> Did you exclude this case?
> 

I believe I did everything right. You can see the sleeps that serialise execution. To double-check, I modified the test to print timestamps and to use MADV_DONTNEED:

The source is at http://cpp.sh/2rd4f and is also included at the end of this email.

I ran:

$ ./mmapfork 2300
mmap 0x801000000 pid 40628
end 0x890c00000 len 0x8fc00000
pid 40628
pid 40629
40629: [1625500831] touch
40629: [1625500832] sleep before madvise
40629: [1625500833] madvise
40629: [1625500834] Press enter to exit
40628: [1625500845] touch
40628: [1625500846] sleep before madvise
40628: [1625500851] madvise
40628: [1625500852] Press enter to exit

You can see that the parent (40628) started touching memory 11 seconds after the child (40629) had finished calling madvise() on the whole touched range.

And finally in dmesg:

pid 40629 (mmapfork), jid 0, uid 1001, was killed: out of swap space

So the result is the same as I reported in the first email.


> Now, about the specifics of madvise(MADV_FREE) on FreeBSD.  Due to the way
> CoW is implemented with the shadow chain of objects, we cannot drop the
> top of the shadow chain, otherwise instead of returning zeroed pages next
> time, we would return content from back in time.  It was a relatively recent
> discovery, see bf5661f4a1af6931ec4b6, PR 240061.
> 

Thanks, I will look at it.

> To explain it in simplified form, when there is potential old content
> under the CoW copy for the mapping, we cannot drop CoW-ed pages. This
> is the reason why madvise(MADV_FREE) does nothing for your program.
> When you run two instances without fork, there is no previous content
> and no CoW, so madvise() can safely remove the pages from the object,
> and on the next access they are zero-filled.
> 

Do I understand correctly that it should work with MADV_DONTNEED? The MADV_DONTNEED variant doesn’t work either.

> You can read more details in the referenced commit, as well as some musings
> about ways to make it somewhat better.
> 
> I must say, that trying to allocate 1/2 + 1/2 of RAM this way, on a system
> without swap, is asking for trouble anyway.



I’ve just noticed that other operating systems handle this well, whereas FreeBSD has trouble.

Perhaps something in madvise() has not been finished yet?

——

#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[])
{
        /* argv[1] is the region size in MiB (default 1024, i.e. 1 GiB) */
        size_t len = (size_t)(argc > 1 ? atoi(argv[1]) : 1024) * 1024 * 1024;
        uint8_t *ptr, *end, *p;
        size_t pagesz = 1 << 12;        /* assumes 4 KiB pages */
        pid_t pid;

        ptr = (uint8_t *)mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (ptr == MAP_FAILED)
                err(1, "cannot mmap");

        end = ptr + len;
        printf("mmap %p pid %d\n", (void *)ptr, (int)getpid());
        printf("end %p len %#zx\n", (void *)end, len);

        fflush(stdout);

        pid = fork();

        if (pid < 0)
                err(1, "cannot fork");

        printf("pid %d\n", getpid());

        /* child (pid == 0) starts after 1 s, parent after 15 s */
        sleep(pid == 0 ? 1 : 15);

        printf("%d: [%ld] touch\n", (int)getpid(), (long)time(NULL));

        p = ptr;
        while (p < end) {
                *p = 1;
                p += pagesz;
        }

        printf("%d: [%ld] sleep before madvise\n", (int)getpid(), (long)time(NULL));
        sleep(pid == 0 ? 1 : 5);
        printf("%d: [%ld] madvise\n", (int)getpid(), (long)time(NULL));

        p = ptr;
        while (p < end) {
                int error;

                error = madvise(p, pagesz, MADV_DONTNEED);
                if (error) {
                        err(1, "cannot madvise");
                }
                p += pagesz;
        }

        printf("%d: [%ld] Press enter to exit\n", (int)getpid(), (long)time(NULL));
        getchar();
}