[Bug 204048] stable/9: r289998: ntpd 4.2.8p4 DNS resolution misbehaves (occasional segfault)
bugzilla-noreply at freebsd.org
Mon Oct 26 23:06:44 UTC 2015
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204048
Bug ID: 204048
Summary: stable/9: r289998: ntpd 4.2.8p4 DNS resolution
misbehaves (occasional segfault)
Product: Base System
Version: 9.3-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: bin
Assignee: freebsd-bugs at FreeBSD.org
Reporter: jdc at koitsu.org
The recent upgrade of ntpd to 4.2.8p4 in stable/9 results in a daemon which
behaves very oddly. Said upgrade: http://www.freshbsd.org/commit/freebsd/r289998
My log after several manual troubleshooting attempts -- note the intermixed
segfaults:
Oct 26 15:38:05 icarus ntpd[1092]: giving up resolving host clock.isc.org: servname not supported for ai_socktype (9)
Oct 26 15:38:23 icarus ntpd[1116]: giving up resolving host clock.isc.org: servname not supported for ai_socktype (9)
pid 1139 (ntpd), uid 0: exited on signal 11 (core dumped)
Oct 26 15:39:07 icarus ntpd[1176]: giving up resolving host clock.isc.org: servname not supported for ai_socktype (9)
Oct 26 15:39:59 icarus ntpd[1209]: giving up resolving host ntp-1.cso.uiuc.edu: servname not supported for ai_socktype (9)
Oct 26 15:40:24 icarus ntpd[1268]: giving up resolving host clock.isc.org: servname not supported for ai_socktype (9)
pid 1294 (ntpd), uid 0: exited on signal 11 (core dumped)
pid 1312 (ntpd), uid 0: exited on signal 11 (core dumped)
Oct 26 15:44:09 icarus ntpd[1409]: giving up resolving host clock.isc.org: servname not supported for ai_socktype (9)
Oct 26 15:45:26 icarus ntpd[1490]: giving up resolving host 0.freebsd.pool.ntp.org: servname not supported for ai_socktype (9)
Oct 26 15:50:18 icarus ntpd[1656]: giving up resolving host tick.jrc.us: servname not supported for ai_socktype (9)
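For anyone collecting the same data on another machine, a rough sketch of how the two kinds of log entries above could be tallied. The function name tally and both patterns are made up for this example and are only derived from the lines quoted in this report; other syslog layouts may need different patterns.

```python
import re
from collections import Counter

# Resolver failure lines as quoted in this report; the host is captured.
# \s+ lets the pattern match even if a log line has been wrapped.
RESOLVE_RE = re.compile(
    r"ntpd\[(\d+)\]: giving up resolving host\s+(\S+):\s+"
    r"servname not supported for ai_socktype \(9\)"
)
# Kernel crash notices for ntpd as quoted in this report.
CRASH_RE = re.compile(r"pid (\d+) \(ntpd\), uid \d+: exited on signal (\d+)")


def tally(log_text):
    """Return (Counter of failures per host, list of (pid, signal) crashes)."""
    failures = Counter(m.group(2) for m in RESOLVE_RE.finditer(log_text))
    crashes = [(int(m.group(1)), int(m.group(2)))
               for m in CRASH_RE.finditer(log_text)]
    return failures, crashes
```

Feeding it the contents of the system log (for example open("/var/log/messages").read(), assuming that is where syslog writes on the machine in question) gives a per-host failure count and the pids that died on a signal.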
Segfaults are always here:
root@icarus:~ # gdb /usr/sbin/ntpd /ntpd.core
...
#0 0x000000080114d79d in _malloc_postfork () from /lib/libc.so.7
[New Thread 801807c00 (LWP 100797/ntpd)]
[New Thread 801807400 (LWP 100791/ntpd)]
(gdb) bt
#0 0x000000080114d79d in _malloc_postfork () from /lib/libc.so.7
#1 0x000000080114fb3e in _malloc_postfork () from /lib/libc.so.7
#2 0x00000008011523fe in _malloc_prefork () from /lib/libc.so.7
#3 0x0000000801154482 in calloc () from /lib/libc.so.7
#4 0x000000080117aba6 in __res_state () from /lib/libc.so.7
#5 0x000000080118698c in freeaddrinfo () from /lib/libc.so.7
#6 0x00000008011ab61a in nsdispatch () from /lib/libc.so.7
#7 0x0000000801187ffb in getaddrinfo () from /lib/libc.so.7
#8 0x0000000000474f04 in blocking_getaddrinfo ()
#9 0x0000000000473a43 in blocking_child_common ()
#10 0x00000000004737e9 in blocking_thread ()
#11 0x0000000800afee70 in pthread_getprio () from /lib/libthr.so.3
#12 0x0000000000000000 in ?? ()
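The backtrace shows getaddrinfo() being called from a worker thread. A minimal sketch for checking whether the system resolver alone returns the same error outside ntpd, calling getaddrinfo from a thread in a similar way. The arguments are assumptions: this report does not show what ntpd actually passes, so service "ntp", AF_INET (matching -4) and SOCK_DGRAM are guesses, and probe and probe_in_thread are names invented for this example. It does not attempt to reproduce the crash.

```python
import socket
import threading


def probe(host, service="ntp", flags=0):
    """Resolve host/service for UDP over IPv4; return (ok, detail)."""
    try:
        infos = socket.getaddrinfo(host, service, socket.AF_INET,
                                   socket.SOCK_DGRAM, 0, flags)
        # Each entry is (family, type, proto, canonname, sockaddr).
        return True, sorted({info[4] for info in infos})
    except socket.gaierror as exc:
        # exc.errno is the EAI_* code reported by the C library.
        return False, (exc.errno, exc.strerror)


def probe_in_thread(host, service="ntp", flags=0):
    """Run probe() in a separate thread and return its result."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(probe(host, service, flags)))
    worker.start()
    worker.join()
    return result[0]
```

Calling probe_in_thread("clock.isc.org") and comparing a failing errno against socket.EAI_SERVICE would show whether the resolver itself rejects the service name for this socket type. If the call succeeds, the problem is more likely specific to how ntpd invokes the resolver.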
Important:
The behaviour seen is very strange. Basically, the daemon starts, emits one of
the aforementioned DNS errors, then proceeds to either a) exit, b) crash, or c)
continue running.
Sometimes when the daemon exits (possibly when crashing too), it restarts
itself. There have been a couple of times where ps -auxwww | grep ntp returned
nothing, yet a few seconds later the daemon was found running.
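To pin down when the daemon disappears and comes back, something like the following could sample the process list at a fixed interval and record every change in the set of ntpd pids. This is only a sketch: ntpd_pids and watch are names invented here, and it assumes pgrep is available and that the process is named exactly ntpd.

```python
import subprocess
import time


def ntpd_pids():
    """Return the set of running ntpd pids via pgrep (no output = none)."""
    out = subprocess.run(["pgrep", "-x", "ntpd"],
                         capture_output=True, text=True)
    return {int(tok) for tok in out.stdout.split()}


def watch(probe=ntpd_pids, samples=60, interval=1.0):
    """Sample the pid set; return [(sample_index, pids)] for each change."""
    changes = []
    previous = None
    for i in range(samples):
        current = frozenset(probe())
        if current != previous:
            changes.append((i, sorted(current)))
            previous = current
        if i + 1 < samples:
            time.sleep(interval)
    return changes
```

An empty pid list followed by a new pid a few samples later would line up with the "gone, then running again" behaviour described above. The probe argument is there so a different way of listing processes can be substituted.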
Things I've tried which made no difference:
1. Removing -4 from $ntpd_flags (I set this because while my system has IPv6, I
prefer using IPv4 everywhere)
2. Using /etc/ntp.conf (r289998) instead of my own ntp.conf
There is no workaround for this other than to roll back to something prior to
r289998.
Googling turns up several reports of this problem, but all relate to people
trying to use chroot'ing with ntpd (I DO NOT use this feature).
https://mail-index.netbsd.org/current-users/2014/01/26/msg024169.html
https://mail-index.netbsd.org/current-users/2014/06/01/msg024998.html
One report says that use of -O1 (on ARM) relieves the problem, but crashing is
seen on VAX and other platforms. (My system uses gcc, not clang, just for the
record)
Footnote: upgrading to stable/10 is not an option until the load average bug
there is rectified (I am not the only one to report this problem). I can try
to test out this ntpd on a VM running stable/10 to see if the problem there is
reproducible or not.
My ntp.conf (w/ comments removed):
server clock.isc.org iburst
server ntp-1.cso.uiuc.edu iburst
server clock.psu.edu iburst
server tick.jrc.us iburst
server 0.us.pool.ntp.org iburst
restrict default limited kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict 192.168.1.0 mask 255.255.255.0
My rc.conf ntp-related flags:
# ntpd_flags: temporary workaround for https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199127
ntpd_enable="yes"
ntpd_config="/conf/ME/ntp.conf"
ntpd_sync_on_start="yes"
ntpd_flags="-4 ${ntpd_flags}"
--
You are receiving this mail because:
You are the assignee for the bug.