repeated segfault of daemon after ~245 minutes of OS uptime, or multiples thereof (12.0Rp1 amd64)
dch at skunkwerks.at
Thu Jan 10 19:31:20 UTC 2019
Does this rather unusual duration remind somebody of some periodic counter cycle? I'm stumped. AFAICT the failure is periodic based on the host OS boot time, and *not* the runtime, but I'm not 100% sure on that yet.
Per subject, round about 245 minutes after host boot, and repeatedly after that at the same interval (+- a couple of minutes), a jailed erlang runtime (databases/couchdb2) segfaults, on multiple systems. All are low end 8 core non-HT x86_64 arch atom CPUs C2750 @ with 8GB RAM, and are well within normal limits for cpu, ram, disk io.
I've looked at 245 minutes in hex & binary, as minutes, seconds, milli- and micro-, and none of these resemble some sort of nibble-aligned counter that might conceivably overflow.
All ports are built via a custom poudriere stack, albeit not very far off standard ports - we need a handful of custom settings and packages. URL below has details.
sysctls are quite a few but are largely identical to those I run elsewhere, admittedly on larger boxes.
networking is internet facing BGP and a private fc00/7 IPv6 vpn for the cluster nodes to communicate.
lldb stacktraces and further notes at https://hackmd.io/elgRy4IWSR-FhViJDXHbDA
Also, my lldb skills are limited, if somebody can recommend a resource to bone up that would be awesome.
More information about the freebsd-questions