Segmentation fault running ntpd
Ian Lepore
ian at freebsd.org
Sun Jul 19 16:24:16 UTC 2015
On Sat, 2015-07-18 at 05:09 -0700, David Wolfskill wrote:
> Lousy timing (no pun intended -- it's early in the day for me),
> given the recent MFC, but as I was booting my laptop to yesterday's
> head:
>
> FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127 r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015 root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY amd64
>
> to build today's head (@r285670; still in progress as I type), I
> happened to note [Oh, great -- we can no longer copy/paste from
> console now??!? Fine, I'll transcribe by hand.... :-(]:
>
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...
>
> Trying to examine the /ntpd.core, I see:
> root@g1-245:/ # gdb `which ntpd` ntpd.core
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> Core was generated by `ntpd'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libcrypto.so.7
> Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libthr.so.3
> Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0 0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> [New Thread 801c07400 (LWP 100122/<unknown>)]
> [New Thread 801c06400 (LWP 100120/<unknown>)]
> (gdb) bt
> #0 0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> #1 0x00000008ccbd4f34 in ?? ()
> #2 0x0000000000000005 in ?? ()
> #3 0x0000000801800448 in ?? ()
> #4 0x00000008011ca888 in sbrk () from /lib/libc.so.7
> #5 0x00000008018000c8 in ?? ()
> #6 0x00000008018000c0 in ?? ()
> #7 0x0000000000000208 in ?? ()
> #8 0x0000000801c32fb0 in ?? ()
> #9 0x0000000000000001 in ?? ()
> #10 0x0000000801cc20c8 in ?? ()
> #11 0x0000000000000030 in ?? ()
> #12 0x0000000801cc20c8 in ?? ()
> #13 0x00007fffffffe480 in ?? ()
> #14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
> #15 0x0000000000000280 in ?? ()
> #16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
> #17 0x00000008018000c0 in ?? ()
> #18 0x0000000801800448 in ?? ()
> #19 0x0000000000000032 in ?? ()
> #20 0x0000000801800458 in ?? ()
> #21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
> #22 0x0000000801cc2000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
> #24 0x0000000801cc20d8 in ?? ()
> #25 0x00000000000000a0 in ?? ()
> #26 0x0000000000000208 in ?? ()
> #27 0x00007fffffffe4d0 in ?? ()
> #28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
> Previous frame inner to this frame (corrupt stack?)
> (gdb)
>
> which seems... well, not especially useful, as far as I can tell.
>
>
> This is (as mentioned above) on my laptop; as such, it is expected to
> "wander" from one network to another. Accordingly:
>
> * Since it could be connected to a network I do not control, I use a
> packet filter (IPFW, in my case) to reduce my exposure from a
> possibly-hostile network.
>
> * Rather than enabling ntpd in /etc/rc.conf, I use
> /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
> lease. (For networks I control, I also set up the DHCP server to
> advertise what NTP server the DHCP clients should use, but the code in
> dhclient-exit-hooks merely prefers that, rather than requiring it.)
>
> * In my world-view -- at least for networks I control -- DNS zone files
> are the Source of Truth with respect to hostname <-> IP address
> correspondence, and Dynamic DNS is Evil. I populate my zone files
> with appropriate A & PTR records so that every assignable DHCP
> address has a PTR record, and the hostname to which it points has
> an A record that points back to that IP address. Accordingly, I
> also use /etc/dhclient-exit-hooks so the laptop can find out what
> its hostname is, and set it accordingly.
>
> Mind, I've been doing the above for well over a decade, so that doesn't
> qualify as "new."
>
> And most of the time, it Just Works (which is a significant reason I
> keep doing it).
>
> A couple of other things that are more recent, and possibly of
> relevance:
>
> * As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up using
> Link Aggregation in "failover" mode. In practice, I rarely use
> the em0 (wired) NIC -- I had originally done that based on a
> misperception of how I thought things were set up at work, and
> then just left the configuration alone and relied on the wireless
> NIC. (At home, I have things set up so that the failover would
> work, but doing so would be a little awkward for reasons that
> aren't relevant here.)
>
> * I have the laptop configured to run xdm(1)... after the DHCP lease is
> acquired and the hostname is set. My ~/.xsession script is set
> up so it fires up ssh-agent, requests a passphrase, and then
> (among other things) establishes an SSH session to the "mail hub"
> at home and re-establishes a tmux session where I'm running mutt
> to handle my email. I've noticed that in head, these connections
> sometimes fail to get initialized, and sometimes will time out,
> while sessions started a few minutes later will have no problem.
> That seems peculiar, but was sufficiently ... well, "nebulous" that
> I didn't think it warranted a whine of its own here. But on the
> chance that it's related to ntpd giving up the ghost prematurely,
> it seemed but a reasonable exercise of "Full Disclosure" to mention
> it in this context -- even though it's also something I've been doing
> since the (late) 1990s.
>
> So: Any suggestions for either diagnosing what the root cause is or
> changing the configuration so that the failure no longer occurs?
>
> Thanks!
>
> Peace,
> david
Was there anything (at all) in /var/log/messages about ntpd? Even the
routine messages (such as what interfaces it binds to) might give a bit
of a clue about how far it got in its init before it died.
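
If it helps, something along these lines would pull out the relevant
lines. This is only a minimal sketch: the function name is made up for
illustration, and it assumes ntpd and kernel messages end up in
/var/log/messages, which depends on your syslog.conf.

```shell
# Hypothetical helper: print every line mentioning ntpd from a log file.
# Defaults to /var/log/messages; pass another path as the first argument.
ntpd_log_lines() {
    logfile="${1:-/var/log/messages}"
    # A plain substring match catches both ntpd's own messages and the
    # kernel's "exited on signal" line, which also names the process.
    grep -- 'ntpd' "$logfile"
}
```

Running `ntpd_log_lines` with no argument right after a failed boot
should show whatever ntpd managed to log before it died; an empty
result would suggest it crashed before logging anything.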
-- Ian