The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them

Mark Millard markmi at dsl-only.net
Sun Apr 9 17:24:38 UTC 2017


On 2017-Apr-9, at 5:27 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:

> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>> [I've identified the code path involved is the arm64 small allocations
>> turning into zeros for later fork-then-swapout-then-back-in,
>> specifically the ongoing RES(ident memory) size decrease that
>> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
>> I've also exposed enough related information for someone that
>> knows what they are doing to get started with a specific
>> investigation, looking for a fix. I'd like for a pine64+
>> 2GB to have buildworld complete despite the forking and
>> swapping involved (yep: for a time zero RES(ident memory) for
>> some processes involved in the build).]
> 
> I was not able to follow the walls of text, but do not think that
> pmap_ts_reference() is the real culprit there.
> 
> Is my impression right that the issue occurs on fork, and looks as
> a memory corruption, where some page suddenly becomes zero-filled ?
> And swapping seems to be involved ?  It is somewhat interesting to see
> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.

Yes, yes, non-arm64 that I've tried works.

But I think that the following extra detail may be of use: what top
shows for RES over time is also odd on arm64 (only), and the number
of pages that are zeroed is proportional to the decrease in RES.

In the test sequence:

A) Allocate lots of 14 KiByte allocations and initialize the content of each
to non-zero. The example ends up with RES of about 265M.

B) sleep some amount of time, I've been using well over 30 seconds here.

C) fork

D) sleep again (parent and child), also forcing swapping during the sleep
   (I used stress, manually run.)

E) Test the memory pattern in the parent and child process, passing over
   all the bytes, failed and good.
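
Steps (A) through (E) can be sketched roughly as follows. This is a
minimal illustration, not the actual test program: the function name
fork_pattern_check, the fill byte 0xA5, and the parameter names are all
hypothetical, and the swapping in (D) still has to be forced from outside
(for example by running stress by hand during the second sleep).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ALLOC_SIZE (14u * 1024u) /* 14 KiByte per allocation, as in (A) */
#define FILL_BYTE  0xA5u         /* hypothetical non-zero pattern */

/* (E): pass over every byte and count those that lost the pattern. */
static size_t count_bad_bytes(unsigned char **blocks, size_t nblocks)
{
    size_t bad = 0;
    for (size_t i = 0; i < nblocks; i++)
        for (size_t j = 0; j < ALLOC_SIZE; j++)
            if (blocks[i][j] != FILL_BYTE)
                bad++;
    return bad;
}

/*
 * Returns 0 if parent and child both see the pattern intact, 1 if either
 * sees damaged bytes, and -1 on a setup failure (allocations are simply
 * leaked on that path to keep the sketch short).
 */
static int fork_pattern_check(size_t nblocks, unsigned pre_fork_sleep,
    unsigned post_fork_sleep)
{
    unsigned char **blocks = calloc(nblocks, sizeof(*blocks));
    if (blocks == NULL)
        return -1;

    /* (A): many small allocations, each initialized to non-zero. */
    for (size_t i = 0; i < nblocks; i++) {
        blocks[i] = malloc(ALLOC_SIZE);
        if (blocks[i] == NULL)
            return -1;
        memset(blocks[i], FILL_BYTE, ALLOC_SIZE);
    }

    sleep(pre_fork_sleep);   /* (B) */

    pid_t pid = fork();      /* (C) */
    if (pid < 0)
        return -1;

    sleep(post_fork_sleep);  /* (D): force swapping externally here */

    /* (E): both processes test the pattern. */
    size_t bad = count_bad_bytes(blocks, nblocks);
    fprintf(stderr, "%s: %zu bad bytes\n",
        pid == 0 ? "child" : "parent", bad);

    if (pid == 0)
        _exit(bad != 0);     /* child reports its result via exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int child_bad = !WIFEXITED(status) || WEXITSTATUS(status) != 0;

    for (size_t i = 0; i < nblocks; i++)
        free(blocks[i]);
    free(blocks);
    return (bad != 0 || child_bad) ? 1 : 0;
}
```

Something on the order of 19000 allocations with sleeps well over 30
seconds would be in the neighborhood of the 265M RES case described in
(A), though the exact RES depends on the allocator.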

Both the parent and the child in (E) see the first pages allocated as zero,
with the number of pages being zero increasing as the sleep time in (B)
increases (as long as the sleep is over 30 sec or so). The parent and child
match for which pages are zero vs. not.

It fails with (B) being a no-op as well. But the proportionality with
the time for the sleep is interesting.

During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
or so. The fork in (C) produces a child that does not have the same RES
as the parent but instead a tiny RES (80K as I remember). During (E)
the child's RES increases to full size.

My powerpc64, armv7, and amd64 tests of such do not fail, nor does RES
decrease during (B). The child process gets the same RES as the parent
as well, unlike for arm64.

In the failing context (arm64) RES in the parent decreases during (D)
before the swap-out as well.

> If answers to my two questions are yes, there is probably some bug with
> arm64 pmap handling of the dirty bit emulation.  ARMv8.0 does not provide
> hardware dirty bit, and pmap interprets an accessed writeable page as
> unconditionally dirty.  More, accessed bit is also not maintained by
> hardware, instead it should be set by pmap.  And arm64 pmap sets the
> AF bit unconditionally when creating valid pte.

fork-then-swap-out/in is required to see the problem. Neither fork
by itself nor swapping (zero RES as shown in top) by itself have
shown the problem so far.

> Hmm, could you try the following patch, I did not even compile it.

I'll try it later today.

> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
> index 3d5756ba891..55aa402eb1c 100644
> --- a/sys/arm64/arm64/pmap.c
> +++ b/sys/arm64/arm64/pmap.c
> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
> 		    sva += L3_SIZE) {
> 			l3 = pmap_load(l3p);
> 			if (pmap_l3_valid(l3)) {
> +				if ((l3 & ATTR_SW_MANAGED) &&
> +				    pmap_page_dirty(l3)) {
> +					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
> +					    ~ATTR_MASK));
> +				}
> 				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
> 				PTE_SYNC(l3p);
> 				/* XXX: Use pmap_invalidate_range */
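
To illustrate the reasoning behind the patch as I understand it: if
"dirty" is inferred from "accessed and writeable", then making a PTE
read-only discards that inference unless the dirtiness is first recorded
on the page. The following is a toy model only, not kernel code. The
sk_ names and struct sk_page are hypothetical; the bit positions (AF at
bit 10, AP[2] at bit 7 meaning read-only) follow the ARMv8 stage 1
descriptor layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for PTE attribute bits (names are hypothetical). */
#define SK_ATTR_AF    (UINT64_C(1) << 10) /* access flag */
#define SK_ATTR_AP_RO (UINT64_C(1) << 7)  /* AP[2]: set means read-only */

/* Stand-in for the per-page software state that the pager consults. */
struct sk_page {
    bool dirty;
};

/* Emulated dirty test: an accessed, writeable mapping counts as dirty. */
static bool sk_pte_dirty(uint64_t pte)
{
    return (pte & (SK_ATTR_AF | SK_ATTR_AP_RO)) == SK_ATTR_AF;
}

/*
 * Write-protect a PTE. With record_first set, the emulated dirty state
 * is copied to the page before the PTE stops looking dirty, which is
 * the step the patch adds. Without it, the information is lost.
 */
static uint64_t sk_protect_ro(uint64_t pte, struct sk_page *pg,
    bool record_first)
{
    if (record_first && sk_pte_dirty(pte))
        pg->dirty = true;
    return pte | SK_ATTR_AP_RO;
}
```

In this model a page whose dirtiness was never recorded looks clean after
the write-protect, which would be consistent with its content not being
saved at swap-out and coming back zero-filled. That is a hypothesis to be
tested, not a confirmed diagnosis.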


===
Mark Millard
markmi at dsl-only.net


