projects/clang380-import -r294962+ based powerpc (32-bit) buildworld -j 6: make gets SEGV, a partial smoking gun?

Mark Millard markmi at dsl-only.net
Fri Feb 5 07:46:54 UTC 2016


The problem:

On a TARGET_ARCH=powerpc installation whose world was built with clang 3.8.0, attempting "make -j 6 buildworld" (run on 4 powerpc cores) eventually gets a segmentation fault in a make instance. (More details later.) "make buildworld" does not fault.

I expect that the details that I describe below imply some form of intermittency, such as a race condition.

(This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 in place so that signal delivery maintains the modulo 16 byte stack/frame alignment status instead of changing the alignment.)

(clang 3.8.0 targeting powerpc (32-bit) is known to be able to introduce more stack alignment dependencies by sometimes "or-ing" in offset-bits into some aligned-address lower bits instead of using addition. But I do not know if that is involved here somehow.)
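
As a small illustration of why "or-ing" in offset bits depends on alignment, the following sketch compares the two address calculations. It is only arithmetic on made-up 32-bit values, not a reproduction of what clang actually emits; the function names and base addresses are hypothetical.

```python
# Illustration only: OR and ADD give the same address when the base has
# zero bits wherever the offset has one bits (e.g. a 16-byte aligned base
# with an offset below 16). The values used here are made up.

MASK32 = 0xFFFFFFFF  # 32-bit powerpc addresses


def addr_by_add(base, offset):
    """Address computed the conventional way, by addition."""
    return (base + offset) & MASK32


def addr_by_or(base, offset):
    """Address computed by or-ing the offset into the low bits."""
    return (base | offset) & MASK32


aligned_base = 0xFFFFBB50     # hypothetical, low 4 bits are zero
misaligned_base = 0xFFFFBB58  # hypothetical, off by 8 from 16-byte alignment

for base in (aligned_base, misaligned_base):
    for offset in (4, 8, 12):
        print(hex(base), offset,
              hex(addr_by_add(base, offset)),
              hex(addr_by_or(base, offset)))
```

With the aligned base every offset below 16 gives the same result either way. With the misaligned base the results differ whenever the offset shares a set bit with the base (offsets 8 and 12 here), so the failure would depend on both the alignment and the particular offset.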



What is always involved and what varies:

In all cases the failure involved r31 being used as a frame pointer while r31 held the value zero at the time of the address calculation, even when the dereference of the address came later. r1 still seemed to be a valid stack pointer in all cases.

In every case the faulting routine had already called one or more routines during its operation, and those calls had returned. In an example or two the faulting routine was self-contained but recursive.

In some cases prior calls in the faulting routine had non-zero r31 values when they returned. (There was later r31 usage that did not fault.)

Overall the call chains varied widely for various example faults, although some call context is more common as a failure point.

Using ktrace with "-di -t cs" and then kdump to extract the records for the failing process shows the same 5-line sequence before every example "PSIG SIGSEGV". What came before those 5 lines varied across the various kdump outputs.

I used ktrace/kdump commands of the structure:

ktrace -di -f /usr/obj/make.out -t cs -p ???
kdump -E -f /usr/obj/make.out -p ??? > /var/tmp/make_ktrace_sigsegv_??.txt

Example results (showing the 5 lines and PSIG SIGSEGV):

(3 prior "sigreturn JUSTRETURN" among what is not shown)
>  65158 make     0.205791 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
>  65158 make     0.205822 CALL  write(0x3,0x189e914,0x1)
>  65158 make     0.205847 RET   write 1
>  65158 make     0.205869 CALL  sigreturn(0xffffbb50)
>  65158 make     0.205923 RET   sigreturn JUSTRETURN
>  65158 make     0.205962 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

(365 prior "sigreturn JUSTRETURN" among what is not shown)
>    599 make     5.552305 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
>    599 make     5.552323 CALL  write(0x3,0x189e914,0x1)
>    599 make     5.552337 RET   write 1
>    599 make     5.552347 CALL  sigreturn(0xffffbb30)
>    599 make     5.552358 RET   sigreturn JUSTRETURN
>    599 make     5.552381 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

(287 prior "sigreturn JUSTRETURN" among what is not shown)
>  75728 make     4.141097 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
>  75728 make     4.141116 CALL  write(0x3,0x189e914,0x1)
>  75728 make     4.141154 RET   write 1
>  75728 make     4.141349 CALL  sigreturn(0xffffbaa0)
>  75728 make     4.141366 RET   sigreturn JUSTRETURN
>  75728 make     4.141404 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

(273 prior "sigreturn JUSTRETURN" among what is not shown)
>  12195 make     27.213277 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
>  12195 make     27.213322 CALL  write(0x3,0x189e914,0x1)
>  12195 make     27.213346 RET   write 1
>  12195 make     27.213361 CALL  sigreturn(0xffffb1e0)
>  12195 make     27.213383 RET   sigreturn JUSTRETURN
>  12195 make     27.213418 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

(789 prior "sigreturn JUSTRETURN" among what is not shown)
>  50545 make     80.255162 PSIG  SIGCHLD caught handler=0x180aae0 mask=0x0 code=CLD_EXITED
>  50545 make     80.255192 CALL  write(0x3,0x189e914,0x1)
>  50545 make     80.255219 RET   write 1
>  50545 make     80.255241 CALL  sigreturn(0xffffafa0)
>  50545 make     80.255265 RET   sigreturn JUSTRETURN
>  50545 make     80.255317 PSIG  SIGSEGV SIG_DFL code=SEGV_MAPERR

The 5-line sequence is not sufficient for the problem to occur but appears to be necessary: there were sometimes hundreds of prior "PSIG SIGCHLD" . . . "RET sigreturn JUSTRETURN" sequences that were not followed by "PSIG SIGSEGV". But every failure tested with ktrace has the 5 lines immediately before the "PSIG SIGSEGV" in the list for the process.

Which instance of make faulted varied, and so did where in make the fault happened. The "-E" elapsed times above and those JUSTRETURN counts are solid evidence of variability in when the fault happens.

I'll use some script log file sizes for the buildworld as another indication of variability. I've sorted them:
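
The check described above can be sketched as a small script over the kdump -E text. This is not the procedure I actually used, just one way to automate it. The function name summarize, the field names, and the regular expression are my own assumptions about the line layout shown in the examples. Note that the count it reports includes the final "RET sigreturn JUSTRETURN" that belongs to the 5 lines, so it is one higher than the "prior" counts quoted above.

```python
import re
from collections import defaultdict

# Assumed layout of one kdump -E record, optionally prefixed by the "> "
# quoting used in this message:
#   65158 make     0.205923 RET   sigreturn JUSTRETURN
LINE_RE = re.compile(
    r"^\s*(?:>\s*)?(\d+)\s+(\S+)\s+(\d+\.\d+)\s+([A-Z]+)\s+(.*)$")

# The five records seen before every fault: (record type, start of detail).
PREFIX = [
    ("PSIG", "SIGCHLD"),
    ("CALL", "write("),
    ("RET", "write "),
    ("CALL", "sigreturn("),
    ("RET", "sigreturn JUSTRETURN"),
]


def summarize(lines):
    """Return one dict per "PSIG SIGSEGV" record found in the given lines."""
    history = defaultdict(list)  # pid -> list of (type, detail) seen so far
    faults = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that is not a kdump record
        pid, _command, elapsed, kind, detail = m.groups()
        seen = history[pid]
        if kind == "PSIG" and detail.startswith("SIGSEGV"):
            tail = seen[-len(PREFIX):]
            prefix_matches = len(tail) == len(PREFIX) and all(
                k == want_k and d.startswith(want_d)
                for (k, d), (want_k, want_d) in zip(tail, PREFIX))
            justreturns = sum(
                1 for k, d in seen
                if k == "RET" and d.startswith("sigreturn JUSTRETURN"))
            faults.append({
                "pid": int(pid),
                "elapsed": float(elapsed),
                "justreturns_before": justreturns,
                "prefix_matches": prefix_matches,
            })
        seen.append((kind, detail))
    return faults
```

It would be fed the lines of one of the /var/tmp/make_ktrace_sigsegv_??.txt files (for example via open() on that file) and each returned entry printed.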

2942664
3304207
3342660
3474585
3941983

so spanning from 2.9 MBytes to 3.9 MBytes. I've since gotten a few with less and some with more.


Note: A couple of times with ktrace being involved it failed at an earlier stage than I've seen otherwise. It may be that ktrace being involved makes the problem more likely/frequent.



Context basics (quad core PowerMac running TARGET_ARCH=powerpc (32-bit)):

# freebsd-version -ku; uname -aKU
11.0-CURRENT
11.0-CURRENT
FreeBSD FBSDG4C1 11.0-CURRENT FreeBSD 11.0-CURRENT #2 r294962M: Mon Feb  1 00:31:03 PST 2016     markmi at FreeBSDx64:/usr/obj/clang_gcc421/powerpc.powerpc/usr/src/sys/GENERICvtsc-NODEBUG  powerpc 1100097 1100097

This is with the content of sys/powerpc/powerpc/sigcode32.S -r295186 in place so that signal delivery maintains the modulo 16 byte stack/frame alignment status instead of changing the alignment.

buildkernel was via gcc 4.2.1
buildworld was via clang 3.8.0



I'm not sure that I'm going to get much farther in tracking down the source of the race(?) that leads to the SEGV's.


===
Mark Millard
markmi at dsl-only.net


