Help needed to identify golang fork / memory corruption issue on FreeBSD
Steven Hartland
killing at multiplay.co.uk
Mon Mar 27 23:16:40 UTC 2017
On 27/03/2017 17:49, Konstantin Belousov wrote:
> On Mon, Mar 27, 2017 at 05:33:49PM +0100, Steven Hartland wrote:
>> On 27/03/2017 17:18, Konstantin Belousov wrote:
>>> On Mon, Mar 27, 2017 at 12:47:11PM +0100, Steven Hartland wrote:
>>>> OK, now that the similar but unrelated issue with signal stacks is solved,
>>>> I've moved back to the initial issue.
>>>>
>>>> I've made some progress with a reproduction case as detailed here:
>>>> https://github.com/golang/go/issues/15658#issuecomment-288747812
>>>>
>>>> In short it seems that having a running child, while the parent runs GC,
>>>> is somehow responsible for memory corruption in the parent.
>>>>
>>>> The reason I believe this is that if I run the same GC in the parent after
>>>> the child exits, instead of while it's running, I'm unable to
>>>> reproduce the issue.
>>>>
>>>> As the memory segments are COW, the issue might be in the VM subsystem.
>>> Well, it might be, but it is a strange corruption mode to believe.
>> Indeed, but would you agree the evidence seems to indicate that this may
>> be the case, as otherwise I would have expected that running the GC
>> after the child process has exited would have zero impact on the issue.
>>>> In order to confirm / deny this I was wondering if there was a way to
>>>> force a full copy of all segments for the child instead of using the COW
>>>> optimisation.
>>> No, there is not. By design, copying only occurs on faults, when VM
>>> detects that the map entry needs copying. Doing the actual copy at fork
>>> time would require writing a lot of new code.
>> I noticed in vm_map_copy_entry the following:
>>     /*
>>      * We don't want to make writeable wired pages copy-on-write.
>>      * Immediately copy these pages into the new map by simulating
>>      * page faults. The new pages are pageable.
>>      */
>>     vm_fault_copy_entry(dst_map, src_map, dst_entry, src_entry,
>>         fork_charge);
>>
>> I wondered if I could use vm_fault_copy_entry to force the copy on fork?
> No, vm_fault_copy_entry() only works with wired entries, e.g. it cannot
> page in a not-yet-touched page, and the result is also wired.
>
>>> Does Go have a FreeBSD/i386 port? If yes, is the issue reproducible there?
>> Yes it does. I don't currently have an i386 machine to test with; I'm
>> assuming testing i386 on an amd64 kernel would likely not have any effect.
> Only if the bug is in the kernel and not in the Go runtime. I am still not
> convinced that the kernel is the culprit.
>
>>> Another blind experiment to try is to comment out the call to
>>> vm_object_collapse() in sys/vm/vm_map.c:vm_map_copy_entry() and see if
>>> it changes anything.
>> I'll do that shortly.
It still crashed with vm_object_collapse() commented out. Here's the parent's
procstat -v output:
  PID              START                END PRT  RES PRES REF SHD FLAG TP PATH
69713           0x400000           0x70e000 r-x  306  601   3   1 CN-- vn /root/golang/src/test5/test5
69713           0x70e000           0x951000 r--  263  601   3   1 CN-- vn /root/golang/src/test5/test5
69713           0x951000           0x988000 rw-   31    0   1   0 C--- vn /root/golang/src/test5/test5
69713           0x988000           0x9ab000 rw-   18   18   1   0 C--- df
69713        0x800951000        0x800b51000 rw-   41   41   1   0 C--- df
69713        0x800b51000        0x800c21000 rw-   27   27   1   0 C--- df
69713        0x800c21000        0x800c31000 rw-   16   16   1   0 C--- df
69713        0x800c31000        0x800c71000 rw-    1    1   1   0 C--- df
69713        0x800c71000        0x800cf1000 rw-    5    5   1   0 C--- df
69713        0x800cf1000        0x800d31000 rw-    1    1   1   0 CN-- df
69713        0x800d31000        0x800d71000 rw-    1    1   1   0 C--- df
69713        0x800d71000        0x800e31000 rw-    3    3   1   0 C--- df
69713        0x800e31000        0x800eb1000 rw-    3    3   1   0 C--- df
69713        0x800eb1000        0x800ef1000 rw-    2    2   1   0 C--- df
69713       0xc000000000       0xc000001000 rw-    1    1   1   0 CN-- df
69713       0xc41fff0000       0xc41fff8000 rw-    3    3   1   0 CN-- df
69713       0xc41fff8000       0xc420200000 rw-  267  267   1   0 C--- df
69713     0x7ffffffdf000     0x7ffffffff000 rwx    2    2   1   0 C--D df
69713     0x7ffffffff000     0x800000000000 r-x    1    1  27   0 ---- ph
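
For anyone following along, the pattern discussed earlier in the thread (the
parent running GC while a child is still alive) can be sketched roughly as
below. This is only a minimal illustration and not the actual test5 program:
the helper name gcWhileChildRuns, the "sleep 1" child command and the loop
counts are all assumptions picked for the example, and it is not known to
reproduce the crash.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// gcWhileChildRuns (hypothetical helper) starts a child process and forces
// garbage collection in the parent before waiting for the child, so the GC
// cycles overlap with the child's lifetime if the child runs long enough.
func gcWhileChildRuns(gcCycles int, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil { // fork/exec; returns without waiting
		return fmt.Errorf("start child: %w", err)
	}
	for i := 0; i < gcCycles; i++ {
		runtime.GC() // force a full collection in the parent
	}
	if err := cmd.Wait(); err != nil {
		return fmt.Errorf("wait for child: %w", err)
	}
	return nil
}

func main() {
	// Assumed child command and iteration counts, for illustration only.
	for i := 0; i < 20; i++ {
		if err := gcWhileChildRuns(10, "sleep", "1"); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Moving the runtime.GC() loop to after cmd.Wait() gives the variant where GC
only runs once the child has exited.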
Regards
Steve