Help needed to identify golang fork / memory corruption issue on FreeBSD
Konstantin Belousov
kostikbel at gmail.com
Fri Mar 17 12:44:47 UTC 2017
On Fri, Mar 17, 2017 at 11:27:52AM +0000, Steven Hartland wrote:
> On 17/03/2017 08:23, Konstantin Belousov wrote:
> > On Fri, Mar 17, 2017 at 06:30:49AM +0000, Steven Hartland wrote:
> >> Ok I think I've identified the cause.
> >>
> >> If an alternative signal stack is applied to a non-main thread and that
> >> thread calls execve then the signal stack is not cleared.
> >>
> >> This results in all sorts of badness.
> >>
> >> Full details, including a small C reproduction case can be found here:
> >> https://github.com/golang/go/issues/15658#issuecomment-287276856
> >>
> >> So looks like its kernel bug. If anyone has an ideas about that before I
> >> look tomorrow that would be appreciated.
> > Yes, there is definitely a kernel bug, which should be fixed by the patch
> > below.
> >
> > Still, what I saw when I looked at the issue, is not quite resembling
> > potential consequences of the bug. Using wrong memory for signal stack
> > would result either in much more significant memory corruption if the
> > alt stack range is mapped and used for something unrelated, or in killed
> > process on signal delivery, if the range is not mapped. While I saw a
> > systematic 'off by 0x10' in some gc structures.
> >
> > Anyway, patch for the issue you identified:
> >
> > diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
> > index 29d5dd4b132..9bf3ba66f5c 100644
> > --- a/sys/kern/kern_sig.c
> > +++ b/sys/kern/kern_sig.c
> > @@ -976,7 +976,6 @@ execsigs(struct proc *p)
> > * and are now ignored by default).
> > */
> > PROC_LOCK_ASSERT(p, MA_OWNED);
> > - td = FIRST_THREAD_IN_PROC(p);
> > ps = p->p_sigacts;
> > mtx_lock(&ps->ps_mtx);
> > while (SIGNOTEMPTY(ps->ps_sigcatch)) {
> > @@ -1007,6 +1006,8 @@ execsigs(struct proc *p)
> > * Reset stack state to the user stack.
> > * Clear set of signals caught on the signal stack.
> > */
> > + td = curthread;
> > + MPASS(td->td_proc == p);
> > td->td_sigstk.ss_flags = SS_DISABLE;
> > td->td_sigstk.ss_size = 0;
> > td->td_sigstk.ss_sp = 0;
> Thanks Kostik, pretty obvious now looking at :)
>
> Testing here we've seen all sorts of corruption looking things, mainly
> around random signals from SIGILL to SIGSEGV but also random kernel
> messages including:
> pid 4603 (test): sigreturn copying xfpustate failed
> pid 5013 (test): sigreturn xfpusave_len = 0x44d9bb
>
> I'm currently running a test, but its looking good as the test case
> usually crashes in a matter of seconds.
>
> Would you mind if I committed it?
I am capable of committing the patches.
>
> I'm guessing given its nature this is something we'd want MFC'ed and
> Errata's issued for all supported versions?
MFC will be done for sure. I am not so sure about EN, this is a routine
bugfix. For some reasons 10.3 errata might be indeed the only way to get
this for 10.x users, but I do not see why bother re/so with 11.0.
More information about the freebsd-hackers
mailing list