Help needed to identify golang fork / memory corruption issue on FreeBSD
Steven Hartland
killing at multiplay.co.uk
Tue Dec 6 20:34:37 UTC 2016
On 06/12/2016 14:35, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote:
>> On 06/12/2016 12:59, Konstantin Belousov wrote:
>>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote:
>>>> Hi guys, I'm trying to help identify / fix an issue with golang whereby
>>>> fork results in memory corruption.
>>>>
>>>> Details of the issue can be found here:
>>>> https://github.com/golang/go/issues/15658
>>>>
>>>> In summary, when a fork is done in golang it has a chance of causing
>>>> memory corruption in the parent, resulting in a process crash once detected.
>>>>
>>>> It's believed that this only affects FreeBSD.
>>>>
>>>> This has similarities to other reported issues such as this one which
>>>> impacted perl during 10.x:
>>>> https://rt.perl.org/Public/Bug/Display.html?id=122199
>>> I cannot judge any similarities when all the description provided
>>> is 'memory corruption'. BTW, the perl issue described, where child segfaults
>>> after the fork, is more likely to be caused by the set of problems referenced
>>> in the FreeBSD-EN-16:17.vm.
>>>
>>>> And more recently the issue with nginx on 11.x:
>>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
>>> Which does not affect anything unless aio is used on Sandy/Ivy.
>>>
>>>> It's possible, some believe likely, that this is a kernel bug around fork
>>>> / vm that golang stresses, but I've not been able to confirm.
>>>>
>>>> I can reproduce the issue at will; it takes between 5 minutes and 1 hour
>>>> using 16 threads, and it definitely seems like an interaction between fork
>>>> and other memory operations.
>>> Which arch is the kernel and the process which demonstrates the behaviour ?
>>> I mean i386/amd64.
>> amd64
> How large is the machine, how many cores, what is the physical memory size ?
>
>>>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>>>
>>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kib's PCID fix
>>>> (#306350).
>>> Switch to HEAD kernel, for start.
>>> Show the memory map of the failed process.

No sign of zeroed memory that I can tell.

This error was caused by hitting the following validation in gc:

func (list *mSpanList) remove(span *mspan) {
        if span.prev == nil || span.list != list {
                println("runtime: failed MSpanList_Remove", span,
                        span.prev, span.list, list)
                throw("MSpanList_Remove")
        }

runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0
fatal error: MSpanList_Remove

(gdb) print list
$4 = (runtime.mSpanList *) 0x53e9b0 <runtime.mheap_+4944>
(gdb) print span.list
$5 = (runtime.mSpanList *) 0x53e9c0 <runtime.mheap_+4960>
(gdb) print span.prev
$6 = (struct runtime.mspan **) 0x80052e300
(gdb) print *list
$7 = {first = 0x80052e580, last = 0x8008aa180}
(gdb) print *span.list
$8 = {first = 0x8007ea7e0, last = 0x80052e580}
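
Reading the output above: span.list (0x53e9c0) and list (0x53e9b0) sit 16 bytes apart inside runtime.mheap_, list.first is the span itself (0x80052e580), and span.list.last holds that same address. To make the failed check concrete, here is a minimal sketch of the invariant. It assumes a simplified model of the span lists; the names span, spanList, insertBack and remove below are hypothetical stand-ins, not the runtime's own code, and it only shows how the check trips when a span's back-pointer names a different list than the one asked to remove it.

```go
package main

import "fmt"

// span is a hypothetical, simplified stand-in for runtime.mspan.
type span struct {
	next *span
	prev **span    // address of the pointer that points at this span
	list *spanList // the list this span believes it is on
}

// spanList is a hypothetical, simplified stand-in for runtime.mSpanList.
type spanList struct {
	first *span
	last  **span // address of the final next pointer (or of first when empty)
}

func (l *spanList) init() {
	l.first = nil
	l.last = &l.first
}

func (l *spanList) insertBack(s *span) {
	s.next = nil
	s.prev = l.last
	*l.last = s
	l.last = &s.next
	s.list = l
}

// remove applies the same condition as the quoted validation, but returns
// an error instead of throwing.
func (l *spanList) remove(s *span) error {
	if s.prev == nil || s.list != l {
		return fmt.Errorf("failed remove: span=%p prev=%p span.list=%p list=%p",
			s, s.prev, s.list, l)
	}
	if s.next != nil {
		s.next.prev = s.prev
	} else {
		l.last = s.prev
	}
	*s.prev = s.next
	s.next, s.prev, s.list = nil, nil, nil
	return nil
}

func main() {
	// Two neighbouring lists in one array, like the two addresses above.
	var lists [2]spanList
	lists[0].init()
	lists[1].init()

	s := &span{}
	lists[1].insertBack(s)

	// Asking the neighbouring list to remove the span trips the check.
	fmt.Println("wrong list:", lists[0].remove(s))
	// Asking the list the span is really on succeeds.
	fmt.Println("right list:", lists[1].remove(s))
}
```

In the sketch the mismatch is produced deliberately; in the crash it is the state found in the core, and the sketch says nothing about how it got that way.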

procstat -v test.core.1481054183
  PID          START            END PRT  RES PRES REF SHD FLAG TP PATH
 1178       0x400000       0x49b000 r-x  115  223   3   1 CN-- vn /root/test
 1178       0x49b000       0x528000 r--   97  223   3   1 CN-- vn /root/test
 1178       0x528000       0x539000 rw-   10    0   1   0 C--- vn /root/test
 1178       0x539000       0x55a000 rw-   16   16   1   0 C--- df
 1178    0x800528000    0x800a28000 rw-  118  118   1   0 C--- df
 1178    0x800a28000    0x800a68000 rw-    1    1   1   0 CN-- df
 1178    0x800a68000    0x800aa8000 rw-    2    2   1   0 CN-- df
 1178    0x800aa8000    0x800c08000 rw-   50   50   1   0 CN-- df
 1178    0x800c08000    0x800c48000 rw-    2    2   1   0 CN-- df
 1178    0x800c48000    0x800c88000 rw-    1    1   1   0 CN-- df
 1178    0x800c88000    0x800cc8000 rw-    1    1   1   0 CN-- df
 1178   0xc000000000   0xc000001000 rw-    1    1   1   0 CN-- df
 1178   0xc41ffe0000   0xc41ffe8000 rw-    8    8   1   0 CN-- df
 1178   0xc41ffe8000   0xc41fff0000 rw-    8    8   1   0 CN-- df
 1178   0xc41fff0000   0xc41fff8000 rw-    8    8   1   0 C--- df
 1178   0xc41fff8000   0xc420300000 rw-  553  553   1   0 C--- df
 1178   0xc420300000   0xc420400000 rw-  234  234   1   0 C--- df
 1178 0x7ffffffdf000 0x7ffffffff000 rwx    2    2   1   0 C--D df
 1178 0x7ffffffff000 0x800000000000 r-x    1    1  33   0 ---- ph

This is from FreeBSD 12.0-CURRENT #36 r309618M.

The ktrace on 11.0-RELEASE is still running, 6 hours so far.

Regards
Steve