From nobody Mon Jul 05 16:54:00 2021
From: Vitaliy Gusev
Message-Id: <57BCE463-6200-4F83-A321-2F0444E7F063@gmail.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_3BEFE3ED-1F3C-4A6B-AAC6-021F01426D22"
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Sender: owner-freebsd-hackers@freebsd.org
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
Subject: Re: madvise(MADV_FREE) doesn't work in some cases?
Date: Mon, 5 Jul 2021 19:54:00 +0300
Cc: freebsd-hackers@freebsd.org, Mark Johnston
To: Konstantin Belousov
References: <0A95973D-254A-4574-8DC7-9F515F60B873@gmail.com>
X-Mailer: Apple Mail (2.3608.120.23.2.7)

--Apple-Mail=_3BEFE3ED-1F3C-4A6B-AAC6-021F01426D22
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

Hi,

> On 3 Jul 2021, at 14:35, Konstantin Belousov wrote:
>
> On Sat, Jul 03, 2021 at 02:32:01AM +0300, Vitaliy Gusev wrote:
>> ...
>> Does it mean madvise() doesn't work well in FreeBSD or test does something wrong?
>
> Your program does not do exactly what you described above. There is a generic
> race to consume memory, and some specific details about madvise(2) on FreeBSD.
>
> From the code, you do:
> - mmap anonymous private region
> - fork
> - both child and parent start touching the mmaped region.

Their execution should be serialised by the sleeps. Yes, that is not a strict guarantee, but it is enough for testing purposes.

> Two processes race to consume 1/2 of RAM on your system. If one of
> them happens to execute faster than the other, you do get to the case where
> one of them does madvise(). But it could be that the processes execute in
> lockstep, and try to eat all the memory before going to madvise().
> Did you exclude this case?
>

I believe I did everything right. You can see the sleeps that serialise execution. To check again, I modified the test to print timestamps and to use MADV_DONTNEED.

The source is at http://cpp.sh/2rd4f, and I have also put it at the end of this email.

I’ve run:

$ ./mmapfork 2300
mmap 0x801000000 pid 40628
end 0x890c00000 len 0x8fc00000
pid 40628
pid 40629
40629: [1625500831] touch
40629: [1625500832] sleep before madvise
40629: [1625500833] madvise
40629: [1625500834] Press enter to exit
40628: [1625500845] touch
40628: [1625500846] sleep before madvise
40628: [1625500851] madvise
40628: [1625500852] Press enter to exit

You can see that the parent (40628) started touching memory 11 seconds after the child (40629) had already called madvise() on the whole range of touched memory.

And finally, in dmesg:

pid 40629 (mmapfork), jid 0, uid 1001, was killed: out of swap space

So the result is the same as I described in the first email.

> Now, about the specifics of madvise(MADV_FREE) on FreeBSD.
> Due to the way CoW is implemented with the shadow chain of objects, we
> cannot drop the top of the shadow chain; otherwise, instead of returning
> zeroed pages next time, we would return content from back in time. It was
> a relatively recent discovery, see bf5661f4a1af6931ec4b6, PR 240061.
>

Thanks, I will look at it.

> To explain it in simplified form, when there is potential old content
> under the CoW copy for the mapping, we cannot drop CoW-ed pages. This
> is the motivation why madvise(MADV_FREE) does nothing for your program.
> When you run two instances without fork, there is no previous content
> and no CoW, so madvise() can safely remove the pages from the object,
> and on the next access they are zero-filled.
>

Do I understand correctly that it should work with MADV_DONTNEED? The “dontneed” variant doesn’t work either.

> You can read more details in the referenced commit, as well as some musings
> about ways to make it somewhat better.
>
> I must say that trying to allocate 1/2 + 1/2 of RAM this way, on a system
> without swap, is asking for trouble anyway.

I just wanted to note that other operating systems handle this well, whereas FreeBSD has trouble.

Perhaps something in madvise() has not been finished yet?

----

#include <sys/mman.h>

#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* Region size in MiB from argv[1]; defaults to 1024 MiB. */
    size_t len = (size_t)(argc > 1 ? atoi(argv[1]) : 1024) * 1024 * 1024;
    uint8_t *ptr, *end, *p;
    unsigned pagesz = 1<<12;
    int pid;

    ptr = (uint8_t *)mmap(NULL, len, PROT_WRITE | PROT_READ,
        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED)
        err(1, "cannot mmap");

    end = ptr + len;
    printf("mmap %p pid %d\n", (void *)ptr, getpid());
    printf("end %p len %#zx\n", (void *)end, len);
    fflush(stdout);

    pid = fork();
    if (pid < 0)
        err(1, "cannot fork");

    printf("pid %d\n", getpid());
    /* The child (pid == 0) goes first; the parent waits 15 seconds. */
    sleep(pid == 0 ? 1 : 15);

    /* Touch one byte in every page so that each page is allocated. */
    printf("%d: [%ld] touch\n", getpid(), (long)time(NULL));
    p = ptr;
    while (p < end) {
        *p = 1;
        p += pagesz;
    }

    printf("%d: [%ld] sleep before madvise\n", getpid(), (long)time(NULL));
    sleep(pid == 0 ? 1 : 5);

    /* Advise the kernel, page by page, that the contents are not needed. */
    printf("%d: [%ld] madvise\n", getpid(), (long)time(NULL));
    p = ptr;
    while (p < end) {
        int error;

        error = madvise(p, pagesz, MADV_DONTNEED);
        if (error) {
            err(1, "cannot madvise");
        }
        p += pagesz;
    }

    printf("%d: [%ld] Press enter to exit\n", getpid(), (long)time(NULL));
    getchar();
}

--Apple-Mail=_3BEFE3ED-1F3C-4A6B-AAC6-021F01426D22--
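
A note on observing the behaviour discussed above more directly than by waiting for the OOM killer: the sketch below is not from the thread. It assumes that mincore(2) is a reasonable way to see whether pages stay resident after madvise(), and the helper names count_resident() and probe_madvise() are hypothetical. It covers only the simple case without fork(), where the quoted explanation says there is no CoW history. What the second count shows depends on the operating system and on the advice value, so no expected numbers are given.

```c
#define _DEFAULT_SOURCE   /* glibc: expose mincore(), madvise(), MAP_ANONYMOUS */
#include <sys/mman.h>

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages in [addr, addr + len) that
 * mincore(2) reports as resident. Returns -1 on error.
 */
long
count_resident(void *addr, size_t len, size_t pagesz)
{
    assert(pagesz != 0);

    size_t npages = (len + pagesz - 1) / pagesz;
    unsigned char *vec = calloc(npages, 1);
    long resident = 0;

    if (vec == NULL)
        return (-1);
    /*
     * The vector argument is "char *" on some systems and
     * "unsigned char *" on others, so pass it through "void *".
     */
    if (mincore(addr, len, (void *)vec) != 0) {
        free(vec);
        return (-1);
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* bit 0: page is in core */
    free(vec);
    return (resident);
}

/*
 * Hypothetical helper: map npages of anonymous private memory, touch
 * every page, apply the given madvise() advice to the whole range, and
 * report the resident page count before and after the call.
 * Returns 0 on success, -1 on failure.
 */
int
probe_madvise(size_t npages, int advice, long *before, long *after)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesz;
    uint8_t *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

    if (ptr == MAP_FAILED)
        return (-1);
    for (size_t off = 0; off < len; off += pagesz)
        ptr[off] = 1;             /* fault every page in */
    *before = count_resident(ptr, len, pagesz);
    if (madvise(ptr, len, advice) != 0) {
        munmap(ptr, len);
        return (-1);
    }
    *after = count_resident(ptr, len, pagesz);
    munmap(ptr, len);
    return ((*before < 0 || *after < 0) ? -1 : 0);
}
```

Calling probe_madvise(npages, MADV_FREE, &before, &after) and then the same with MADV_DONTNEED from a small main(), and printing the two counts, would give a rough per-advice comparison. Running the same probe in a child after fork() would be the natural next step for the CoW case described in the quoted reply, but that variant is not sketched here.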