ntpd problems since upgrading to 5.3

Tue Jan 18 15:33:08 PST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 18 January 2005 23:14, John wrote:
> On Tue, Jan 18, 2005 at 04:04:30PM -0600, John wrote:
> > On Tue, Jan 18, 2005 at 07:23:41AM -0600, John wrote:
> > > On Tue, Jan 18, 2005 at 07:26:16AM +0100, Christian Hiris wrote:
> > > > -----BEGIN PGP SIGNED MESSAGE-----
> > > > Hash: SHA1
> > > >
> > > > On Tuesday 18 January 2005 01:09, John wrote:
> > > > > This is what goes into the log:
> > > > > Jan 17 18:04:29 pearl ntpd[838]: ntpd 4.2.0-a Sun Jan  9 10:58:59 CST 2005 (1)
> > > > > Jan 17 18:04:29 pearl ntpd[838]: bind() fd 7, family 2, port 123, addr 0.0.0.0,in_classd=0 flags=8 fails: Address already in use
> > > >
> > > > I can reproduce this; it only happens if you try to start more than
> > > > one ntpd on the same interfaces. It is better to start it via rc.
> > > >
> > > > # killall ntpd
> > > > # /etc/rc.d/ntpd start
> > > > Starting ntpd.
> > > > # /etc/rc.d/ntpd start
> > > > ntpd already running? (pid=68961).
> > > > # /etc/rc.d/ntpd stop
> > > > Stopping ntpd.
> > >
> > > Thank you, Christian, but I have confirmed that ntp is not running
> > > before the attempt that generates that message.
> > >
> > > # ps ax | grep ntp
> > > # killall ntpd
> > > No matching processes were found
> > > # ntpdc -c peers
> > > ntpdc: read: Connection refused
> > >
> > > So, I think we can be pretty sure at this point that ntpd is NOT
> > > running.  Then..
> > >
> > > I can't use the script to start ntpd, because the configuration
> > > is set not to start it, so
> > >
> > > # ntpd
> > >
> > > Boom!  I immediately get the error message that I gave above!
> > >
> > > If it were already running, I could understand, but my point is that
> > > I've been pretty thorough in determining that it is my first attempt
> > > to run it that gets this error message.
> > >
> > > I have also tried running "ntpdate" before starting ntpd, or not
> > > doing it.  If I do it, it works correctly, indicating that ntpd
> > > is not running, because ntpdate will fail if ntpd is running.  I
> > > have also NOT run ntpdate first (after a reboot) just to prove
> > > to myself that there's nothing "residual" it could leave that would
> > > make ntpd complain about this.
> > >
> > > It's very puzzling!
> >
> > OK.  Get this.  I just generated a custom kernel to get rid of all
> > the good stuff that this laptop will never support.  It just so happens
> > to be a couple of days later (in CVS terms) than the one I was
> > running.  I decided to take a chance and just do the installkernel
> > rather than install the whole world.
> >
> > Now ntpd works.  I didn't change any config files, DNS, or anything
> > else - just installed my custom kernel.  I still get an error message,
> > but now it simply says "no IPv6 interfaces found" and runs successfully.
> >
> > Go figure.
> >
> > My best guess is that my prior cvsup of 5-STABLE had something in
> > the kernel environment and ntpd slightly out of sync, with ntpd
> > being ahead of the kernel, and now, even though I didn't do an
> > installworld, that skew was resolved.
> >
> > While rare, it is the possibility of this skew that makes me
> > uncomfortable with cvsup - but having no better plans, I'll keep using
> > it!
> >
> > I may have to figure out how to maintain a "local release" tree that
> > is behind the -STABLE tree, or something.  I truly do not know what
> > the right answer is.
>
> Wow!  Now my mind is REALLY blown!
>
> Look at the following consecutive runs of ntpdc just a few minutes
> apart, with nothing else going on in between:
>
> pearl# !!
> ntpdc -c peers
>      remote           local      st poll reach  delay   offset    disp
> =======================================================================
> =dexter.starfire 192.168.1.53     3  256   17 0.00026  0.023755 0.93869
> =dauntless.starf 192.168.1.53     4  256   17 0.00053  0.016804 0.93942
> pearl# pwd
> /home/john
> pearl# !nt
> ntpdc -c peers
>      remote           local      st poll reach  delay   offset    disp
> =======================================================================
> =dexter.starfire 192.168.1.53     3   64    1 0.00026  0.035822 7.93750
> =dauntless.starf 192.168.1.53     4   64    1 0.00061  0.035934 7.93750
> pearl# ps ax | grep ntp
>   751  ??  Ss     0:00.05 ntpd
> pearl#
>
> That last line is me confirming that it's still the same PID for
> ntpd.  What happened here?  The reachability mask went from 17 to
> 1, the dispersion popped WAY up, the offset increased, and the
> polling time went down.  Maybe this is normal for ntpd in some set
> of circumstances, but I've not seen it before.
>
> The other odd thing, and I haven't shown you enough runs to
> demonstrate it, is that the offset was INCREASING prior to
> this apparent "reset."  Maybe it failed to converge and started
> over?  But the polling interval kept increasing...
>
> Anybody know what just happened?

To me, this behaviour seems to be normal. 
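
As a reading aid for the "reach" column quoted above: ntpd keeps the
results of the last eight polls in an 8-bit shift register and ntpdc
prints it in octal. The following is only a minimal sketch of that
decoding; the function names are mine and are not part of ntpd or ntpdc.

```python
def decode_reach(reach_octal: str) -> list[bool]:
    """Decode the octal 'reach' value into eight poll results.

    Index 0 is the most recent poll, index 7 the oldest. True means
    a reply was received for that poll.
    """
    value = int(reach_octal, 8) & 0xFF  # the register is 8 bits wide
    return [bool((value >> i) & 1) for i in range(8)]


def reach_bits(reach_octal: str) -> str:
    """Render the register as a bit string, oldest poll on the left."""
    polls = decode_reach(reach_octal)
    return "".join("1" if ok else "0" for ok in reversed(polls))


if __name__ == "__main__":
    # Values taken from the two ntpdc runs quoted above, plus a full register.
    for reach in ("17", "1", "377"):
        print(reach.rjust(3), reach_bits(reach))
```

Read this way, 17 means the last four polls were answered and 1 means
only the most recent poll has been answered so far, which is what the
register looks like shortly after an association starts over.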

http://www.eecis.udel.edu/~mills/database/papers/trans.pdf
http://www.eecis.udel.edu/~mills/database/papers/allan.pdf
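
On the earlier bind() error: "Address already in use" only says that
something holds the UDP port, not necessarily a process named ntpd. A
rough way to test that independently of ps is to try the same bind
yourself. This is a sketch under my own assumptions, not how ntpd does
it; udp_port_in_use is a made-up helper name, and binding port 123
requires root (without it the bind fails with a permission error, which
the helper re-raises rather than guessing).

```python
import errno
import socket


def udp_port_in_use(port: int, addr: str = "0.0.0.0") -> bool:
    """Return True if binding a UDP socket to (addr, port) fails
    with EADDRINUSE, False if the bind succeeds.

    Any other error (for example EACCES on a privileged port) is
    re-raised, because it says nothing about whether the port is taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((addr, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    else:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # 123 is the NTP port from the log message; run as root.
    print("UDP port 123 in use:", udp_port_in_use(123))
```

If this reports the port as taken while no ntpd is running, sockstat
should show which process owns the socket.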

Cheers,
ch

- -- 
Christian Hiris <4711 at chello.at> | OpenPGP KeyID 0x3BCA53BE 
OpenPGP-Key at hkp://wwwkeys.eu.pgp.net and http://pgp.mit.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (FreeBSD)

iD8DBQFB7Zyx09WjGjvKU74RAqpMAJ9y96fpHI54YXenn7LfK4tfA3A1DwCffcRa
Q3wMz2dY3QQNjMEdrCmclNA=
=eT5Y
-----END PGP SIGNATURE-----