ntpd problems since upgrading to 5.3

Tue Jan 18 14:14:34 PST 2005

On Tue, Jan 18, 2005 at 04:04:30PM -0600, John wrote:
> On Tue, Jan 18, 2005 at 07:23:41AM -0600, John wrote:
> > On Tue, Jan 18, 2005 at 07:26:16AM +0100, Christian Hiris wrote:
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > > 
> > > On Tuesday 18 January 2005 01:09, John wrote:
> > > 
> > > > This is what goes into the log:
> > > > Jan 17 18:04:29 pearl ntpd[838]: ntpd 4.2.0-a Sun Jan  9 10:58:59 CST 2005
> > > > (1) Jan 17 18:04:29 pearl ntpd[838]: bind() fd 7, family 2, port 123, addr
> > > > 0.0.0.0,in_classd=0 flags=8 fails: Address already in use
> > > 
> > > I can reproduce this; it only happens if you try to start more than one
> > > ntp daemon on the same interfaces. Better to start it via rc.
> > > 
> > > # killall ntpd
> > > # /etc/rc.d/ntpd start
> > > Starting ntpd.
> > > # /etc/rc.d/ntpd start
> > > ntpd already running? (pid=68961).
> > > # /etc/rc.d/ntpd stop
> > > Stopping ntpd.
> > 
> > Thank you, Christian, but I have confirmed that ntp is not running
> > before the attempt that generates that message.
> > 
> > # ps ax | grep ntp
> > # killall ntpd
> > No matching processes were found
> > # ntpdc -c peers
> > ntpdc: read: Connection refused
> > 
> > So, I think we can be pretty sure at this point that ntpd is NOT
> > running.  Then..
> > 
> > I can't use the script to start ntp, because the config parameters
> > are set to not start it, so
> > 
> > # ntpd
> > 
> > Boom!  I immediately get the error message that I gave above!
> > 
> > If it were already running, I could understand, but my point is that
> > I've been pretty thorough in determining that it is my first attempt
> > to run it that gets this error message. 
> > 
> > I have also tried running "ntpdate" before starting ntpd, or not
> > doing it.  If I do it, it works correctly, indicating that ntpd
> > is not running, because ntpdate will fail if ntpd is running.  I
> > have also NOT run ntpdate first (after a reboot) just to prove
> > to myself that there's nothing "residual" it could leave that would
> > make ntpd complain about this.
> > 
> > It's very puzzling!
> 
> OK.  Get this.  I just generated a custom kernel to get rid of all
> the good stuff that this laptop will never support.  It just so happens
> to be a couple of days later (in CVS terms) than the one I was
> running.  I decided to take a chance and just do the installkernel
> rather than install the whole world.
> 
> Now ntpd works.  I didn't change any config files, DNS, or anything
> else - just installed my custom kernel.  I still get an error message,
> but now it simply says "no IPv6 interfaces found" and runs successfully.
> 
> Go figure.
> 
> My best guess is that my prior cvsup of 5-STABLE had something in
> the kernel environment and ntpd slightly out of sync, with ntpd
> being ahead of the kernel, and now, even though I didn't do an
> installworld, that skew was resolved.
> 
> While rare, it is the possibility of this skew that makes me
> uncomfortable with cvsup - but having no better plans, I'll keep
> using it!
> 
> I may have to figure out how to maintain a "local release" tree that
> is behind the -STABLE tree, or something.  I truly do not know what
> the right answer is.
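
For anyone who wants to test the "Address already in use" condition
from the quoted log independently of ntpd, here is a minimal sketch in
Python. It only tries to bind a UDP socket and reports whether bind()
fails with EADDRINUSE. The function name udp_port_in_use and the demo
are hypothetical, made up for illustration; nothing here comes from
ntpd itself. "family 2" in the log corresponds to AF_INET, which is
what the sketch uses.

```python
import errno
import socket


def udp_port_in_use(port, addr="0.0.0.0"):
    """Return True if binding a UDP socket to (addr, port) fails with
    EADDRINUSE, False if the bind succeeds. Other errors are re-raised."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((addr, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # for example, permission denied on a privileged port
    finally:
        sock.close()
    return False


def demo():
    # Hold an arbitrary free port on loopback, then probe that same port.
    holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    holder.bind(("127.0.0.1", 0))
    port = holder.getsockname()[1]
    print("while held:", udp_port_in_use(port, "127.0.0.1"))
    holder.close()


if __name__ == "__main__":
    demo()
```

To probe the real NTP port you would call udp_port_in_use(123) as
root, since ports below 1024 are normally privileged. The sketch only
says whether something holds the port, not what; a tool such as
sockstat is needed for that.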

Wow!  Now my mind is REALLY blown!

Look at the following consecutive runs of ntpdc just a few minutes
apart, with nothing else going on in between:

pearl# !!
ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=dexter.starfire 192.168.1.53     3  256   17 0.00026  0.023755 0.93869
=dauntless.starf 192.168.1.53     4  256   17 0.00053  0.016804 0.93942
pearl# pwd
/home/john
pearl# !nt
ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=dexter.starfire 192.168.1.53     3   64    1 0.00026  0.035822 7.93750
=dauntless.starf 192.168.1.53     4   64    1 0.00061  0.035934 7.93750
pearl# ps ax | grep ntp
  751  ??  Ss     0:00.05 ntpd
pearl#

That last line is me confirming that it's still the same PID for
ntpd.  What happened here?  The reachability mask went from 17 to
1, the dispersion popped WAY up, the offset increased, and the
polling time went down.  Maybe this is normal for ntpd in some set
of circumstances, but I've not seen it before.
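
To make the reach change concrete, here is a minimal sketch in Python
that decodes the column. It assumes ntpdc prints reach the way ntpq
does: as an 8-bit shift register shown in octal, where the low bit is
the most recent poll and a 1 means a reply was received. The function
name decode_reach is hypothetical, made up for this example.

```python
def decode_reach(octal_text):
    """Decode a reach value (octal text) into the last eight poll
    results, newest first. True means a reply was received."""
    register = int(octal_text, 8) & 0xFF
    return [bool((register >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    for value in ("17", "1", "377"):
        polls = decode_reach(value)
        bits = format(int(value, 8), "08b")
        print(value, bits, "replies in last 8 polls:", sum(polls))
```

Under that assumption, 17 means the last four polls were answered and
1 means only the most recent one is recorded, which looks like the
register was cleared and has seen a single reply since. A long-running,
healthy peer would show 377.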

The other odd thing, and I haven't shown you enough runs to
demonstrate it, is that the offset was INCREASING prior to
this apparent "reset."  Maybe it failed to converge and started
over?  But the polling interval kept increasing...

Anybody know what just happened?
-- 

John Lind
john at starfire.MN.ORG