Needle in the haystack: same Java code on two identical machines, one passes one fails

Osipov, Michael michael.osipov at siemens.com
Wed Mar 15 14:57:43 UTC 2017


Hi folks,

I am currently seeing a stdio issue on a FreeBSD 10.3-STABLE box at work. Another,
identical box works flawlessly, as do other test boxes, in VMs or on real
hardware, running anything from 9.3-STABLE to 11.0-STABLE.

Let's stick to the two boxes at work for now. Both are identical HPE
servers (Xeon CPUs, 4 GiB RAM) running 10.3-STABLE, and both base systems are
configured the same way. The software installed from ports differs slightly.

faulty box:
FreeBSD blnn719x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310805: Fri Dec 30 11:29:53 CET 2016     root at blnn719x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN719X  i386

working box:
FreeBSD blnn714x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310632: Tue Dec 27 18:58:32 CET 2016     root at blnn714x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN714X  i386

The code I run is publicly available: Maven Surefire, the testing framework
used throughout the entire Maven ecosystem (branch 2.19.2-experimental
contains extended log output):
https://git-wip-us.apache.org/repos/asf/maven-surefire.git

Run with:
mvn -B -V clean install -Drat.skip -Dcheckstyle.skip | tee ~/maven-surefire.log

Both machines have Maven 3.5.0-alpha-1 and OpenJDK 8 Update 121 from ports.
Tests run off local disks on a gvinum backend which runs on top of an HP RAID5
system.

A few specific tests fail, namely those where a parent Java process forks another
Java process (not a thread) and communicates with it bidirectionally through stdio.
The failure manifests as the parent process assuming the forked process is gone,
although the forked process has already notified it via stdio that it has completed
all tasks and is performing a clean exit. This does not happen on blnn714x, really weird!
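
To make the pattern concrete, here is a minimal sketch of what the parent side
does in principle. It is not Surefire's code: the class name, the BYE marker and
the line-based format are made up for illustration, and the real wire protocol
is different. The parent drains the child's stdout to EOF, remembers whether a
goodbye marker arrived, and only then collects the exit code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ForkStdioSketch {
    // Hypothetical goodbye marker; Surefire's actual protocol is not line-based like this.
    static final String BYE = "BYE";

    /** Reads the child's stdout to EOF and reports whether the goodbye marker was seen. */
    static boolean sawGoodbye(InputStream childStdout) throws IOException {
        boolean bye = false;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(childStdout, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (BYE.equals(line.trim())) {
                    bye = true;
                }
            }
        }
        return bye;
    }

    /** Starts a child command, drains its output, then waits for it to exit. */
    static String runChild(String... command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        child.getOutputStream().close(); // this sketch sends no commands to the child
        boolean bye = sawGoodbye(child.getInputStream());
        int exitCode = child.waitFor();
        return "bye=" + bye + " exit=" + exitCode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Treat the arguments as the child command line.
            System.out.println(runChild(args));
            return;
        }
        // Without arguments, exercise the reader on two in-memory streams.
        InputStream clean = new ByteArrayInputStream(
                "task done\nBYE\n".getBytes(StandardCharsets.UTF_8));
        InputStream truncated = new ByteArrayInputStream(
                "task done\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("clean stream: bye=" + sawGoodbye(clean));
        System.out.println("truncated stream: bye=" + sawGoodbye(truncated));
    }
}
```

On the faulty box the parent behaves as if it were in the second case (stream
ends without the goodbye), although the child's own log says the goodbye was sent.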

I don't expect anyone to solve my problem here, but merely to provide pointers
on where I can start looking for what is really wrong with blnn719x compared
to the other machine, because I am searching for a needle in a haystack.
I am also quite certain that this is not a bug in the client code: if it were,
it should fail on both machines as well as on the VMs I have at home, and
especially on my old Pentium 4 box running FreeBSD 11-STABLE with even less memory.
I am convinced that this is some kind of shared memory, buffer, or cache issue.

I have uploaded two tarballs for both machines:
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn714x.tar.gz
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn719x.tar.gz


Each tarball contains:

* Log output of Maven listing the failed tests (see the very end), e.g.:
Tests in error: 
  testForkCountTwoNoReuse(org.apache.maven.surefire.its.ForkModeIT): Exit code was non-zero: 1; command line and log = (..)
* surefire-integration-tests/target/<testName>/log.txt
  Contains verbose traces of the communication between parent and children

I'd appreciate any type of help!

Best regards,

Michael

PS: I have also tested the code successfully on Windows, Ubuntu, RHEL6, and Fedora 25.


More information about the freebsd-java mailing list