svn commit: r302252 - head/sys/kern

Konstantin Belousov kostikbel at gmail.com
Fri Jul 1 14:25:31 UTC 2016


On Fri, Jul 01, 2016 at 08:39:48PM +1000, Bruce Evans wrote:
> It seems simple and clean enough, but is too much during a re freeze.
> 
> I will only make some minor comments about style.
Well, it is not only about style.  If you have no more comments, I will
ask for testing.  The patch is about fixing bugs, although in somewhat
extended scope, so I think it is still fine as the things do not explode.
I added the stats to the patch, it is not that intrusive actually.

I still do not see why/do not want to use spinlock for the tc_windup()
exclusion.  Patch is at the end of the message.

> 
> > diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c
> > index 56b2ade..a0dce47 100644
> > --- a/sys/compat/linprocfs/linprocfs.c
> > +++ b/sys/compat/linprocfs/linprocfs.c
> > @@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS)
> > 	struct pcpu *pcpu;
> > 	long cp_time[CPUSTATES];
> > 	long *cp;
> > +	struct timeval boottime;
> > 	int i;
> >
> > 	read_cpu_time(cp_time);
> > +	getboottime(&boottime);
> 
> This is used surprisingly often by too many subsystems.  With the value still
> broken so that locking it doesn't help much, I would leave it as a global.
I prefer to keep the KPI consistent.

> > +	.th_generation = 0,
> > +	.th_next = &th0
> > +};
> 
> This shouldn't spell out all the 0 initializers.  That was only needed to
> reach the non-0 initializer at the end of the initializer.
I thought about it, but initially did not liked the implicit initialization.
Changed.

> > static int
> > sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
> > {
> > +	struct bintime boottimebin;
> > +	struct timeval boottime;
> > +
> > +	binuptime1(NULL, &boottimebin);
> > +	bintime2timeval(&boottimebin, &boottime);
> 
> Use the wrapper function getboottime() if you keep it.
Yes,  I wrote this before I added the helper.

> So maybe use a new general function that returns boottimebin for the
> non-uptime functions only.  Possibly it can add the offset directly.
I changed getbintime() and getboottimebin() to use the fenced magic and
fetch boottime inside the loop. This removed the need for binuptime1().

> 
> > @@ -1116,8 +1170,10 @@ sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt,
> > 		if (cs->delta > 0)
> > 			bintime_addx(bt, cs->fb_info.th_scale * cs->delta);
> >
> > -		if ((flags & FBCLOCK_UPTIME) == 0)
> > +		if ((flags & FBCLOCK_UPTIME) == 0) {
> > +			binuptime1(NULL, &boottimebin);
> 
> Perhaps use a more direct way to get boottimebin().  binuptime1() is
> pessimized by null pointer checks for both its args.  Use the new
> function getboottimebin() even if is not direct.
Used it, and now it is a direct interface into timehands as well.

> 
> > ...
> > diff --git a/sys/net/bpf.c b/sys/net/bpf.c
> > index 3b12cf4..4251f71 100644
> > --- a/sys/net/bpf.c
> > +++ b/sys/net/bpf.c
> > @@ -2328,12 +2328,13 @@ bpf_hdrlen(struct bpf_d *d)
> > static void
> > bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype)
> > {
> > -	struct bintime bt2;
> > +	struct bintime bt2, boottimebin;
> > 	struct timeval tsm;
> > 	struct timespec tsn;
> >
> > 	if ((tstype & BPF_T_MONOTONIC) == 0) {
> > 		bt2 = *bt;
> > +		getboottimebin(&boottimebin);
> > 		bintime_add(&bt2, &boottimebin);
> > 		bt = &bt2;
> > 	}
> 
> This is still too chummy with the (mis)implementation.  I think it is to
> convert a relative CLOCK_MONOTONIC time to an absolute CLOCK_REALTIME
> time.  That should be done in the time subsystem.  It is unclear if
> we care about strict real time with leap seconds adjustments.
> 
> BPF_T_MONOTONIC is not documented in any man page or used in any library
> or applicatiion in /usr/src.  BPF_T_BINTIME is documented bpf(4) but not
> used in any library or application in /usr/src.  Similarly for all of
> BPF_T_*.  contrib/libpcap has 47 lines matching timeval, 2 matching
> timespec and 0 matching bintime.
I will leave this to bpf maintainers.

> 
> > diff --git a/sys/netpfil/ipfw/ip_fw_sockopt.c b/sys/netpfil/ipfw/ip_fw_sockopt.c
> > index d186ba5..7faf99f 100644
> > --- a/sys/netpfil/ipfw/ip_fw_sockopt.c
> > +++ b/sys/netpfil/ipfw/ip_fw_sockopt.c
> > @@ -395,6 +395,7 @@ swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len)
> > static void
> > export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
> > {
> > +	struct timeval boottime;
> >
> > 	cntr->size = sizeof(*cntr);
> >
> > @@ -403,21 +404,26 @@ export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
> > 		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
> > 		cntr->timestamp = krule->timestamp;
> > 	}
> > -	if (cntr->timestamp > 0)
> > +	if (cntr->timestamp > 0) {
> > +		getboottime(&boottime);
> > 		cntr->timestamp += boottime.tv_sec;
> > +	}
> > }
> 
> Looks like another home made conversion from CLOCK_MONOTONIC to
> CLOCK_REALTIME.  Easier to understand since it is actually used.
> ipfw seems to use only getmicrouptime() for timestamping.  There are
> good reasons for using monotonic time for everything except presenting
> the time to the user and the conversion for that should be more careful
> than the above.
I leave this one to net- people as well.

> > diff --git a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
> > index 1d07943..0879299 100644
> > --- a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
> > +++ b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
> > @@ -504,11 +504,13 @@ svc_rpc_gss_find_client(struct svc_rpc_gss_clientid *id)
> > {
> > 	struct svc_rpc_gss_client *client;
> > 	struct svc_rpc_gss_client_list *list;
> > +	struct timeval boottime;
> > 	unsigned long hostid;
> >
> > 	rpc_gss_log_debug("in svc_rpc_gss_find_client(%d)", id->ci_id);
> >
> > 	getcredhostid(curthread->td_ucred, &hostid);
> > +	getboottime(&boottime);
> > 	if (id->ci_hostid != hostid || id->ci_boottime != boottime.tv_sec)
> > 		return (NULL);
> 
> Here it is hopefully just a magic id, with the user being a remote system.
> Any time that doesn't go backwards or forwards so far that it is in the
> lieftime of an old or new boot instance works well for identifying the
> boot instance.
It does not work for leap seconds in the same way as is does not work after
setclock().  So I just leave this conversion as is.

 sys/compat/linprocfs/linprocfs.c    |   4 +
 sys/fs/devfs/devfs_vnops.c          |   4 +-
 sys/fs/fdescfs/fdesc_vnops.c        |   2 +
 sys/fs/nfs/nfsport.h                |   2 +-
 sys/fs/procfs/procfs_status.c       |   2 +
 sys/kern/kern_acct.c                |   2 +-
 sys/kern/kern_ntptime.c             |  55 +++++-------
 sys/kern/kern_proc.c                |   2 +
 sys/kern/kern_tc.c                  | 161 +++++++++++++++++++++++++-----------
 sys/kern/kern_time.c                |   8 +-
 sys/kern/subr_rtc.c                 |   2 +-
 sys/kern/sys_procdesc.c             |   3 +-
 sys/net/bpf.c                       |   3 +-
 sys/netpfil/ipfw/ip_fw_sockopt.c    |  12 ++-
 sys/nfs/nfs_lock.c                  |   2 +
 sys/rpc/rpcsec_gss/svc_rpcsec_gss.c |   4 +
 sys/sys/time.h                      |   5 +-
 17 files changed, 174 insertions(+), 99 deletions(-)

diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c
index 56b2ade..a0dce47 100644
--- a/sys/compat/linprocfs/linprocfs.c
+++ b/sys/compat/linprocfs/linprocfs.c
@@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS)
 	struct pcpu *pcpu;
 	long cp_time[CPUSTATES];
 	long *cp;
+	struct timeval boottime;
 	int i;
 
 	read_cpu_time(cp_time);
+	getboottime(&boottime);
 	sbuf_printf(sb, "cpu %ld %ld %ld %ld\n",
 	    T2J(cp_time[CP_USER]),
 	    T2J(cp_time[CP_NICE]),
@@ -624,10 +626,12 @@ static int
 linprocfs_doprocstat(PFS_FILL_ARGS)
 {
 	struct kinfo_proc kp;
+	struct timeval boottime;
 	char state;
 	static int ratelimit = 0;
 	vm_offset_t startcode, startdata;
 
+	getboottime(&boottime);
 	sx_slock(&proctree_lock);
 	PROC_LOCK(p);
 	fill_kinfo_proc(p, &kp);
diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
index 7cc0f9e..afa3da4 100644
--- a/sys/fs/devfs/devfs_vnops.c
+++ b/sys/fs/devfs/devfs_vnops.c
@@ -707,10 +707,11 @@ devfs_getattr(struct vop_getattr_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	struct vattr *vap = ap->a_vap;
-	int error;
 	struct devfs_dirent *de;
 	struct devfs_mount *dmp;
 	struct cdev *dev;
+	struct timeval boottime;
+	int error;
 
 	error = devfs_populate_vp(vp);
 	if (error != 0)
@@ -740,6 +741,7 @@ devfs_getattr(struct vop_getattr_args *ap)
 	vap->va_blocksize = DEV_BSIZE;
 	vap->va_type = vp->v_type;
 
+	getboottime(&boottime);
 #define fix(aa)							\
 	do {							\
 		if ((aa).tv_sec <= 3600) {			\
diff --git a/sys/fs/fdescfs/fdesc_vnops.c b/sys/fs/fdescfs/fdesc_vnops.c
index 4f6e1b9..65b8a54 100644
--- a/sys/fs/fdescfs/fdesc_vnops.c
+++ b/sys/fs/fdescfs/fdesc_vnops.c
@@ -394,7 +394,9 @@ fdesc_getattr(struct vop_getattr_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	struct vattr *vap = ap->a_vap;
+	struct timeval boottime;
 
+	getboottime(&boottime);
 	vap->va_mode = S_IRUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH;
 	vap->va_fileid = VTOFDESC(vp)->fd_ix;
 	vap->va_uid = 0;
diff --git a/sys/fs/nfs/nfsport.h b/sys/fs/nfs/nfsport.h
index 921df2d..6b41e2f 100644
--- a/sys/fs/nfs/nfsport.h
+++ b/sys/fs/nfs/nfsport.h
@@ -872,7 +872,7 @@ int newnfs_realign(struct mbuf **, int);
 /*
  * Set boottime.
  */
-#define	NFSSETBOOTTIME(b)	((b) = boottime)
+#define	NFSSETBOOTTIME(b)	(getboottime(&b))
 
 /*
  * The size of directory blocks in the buffer cache.
diff --git a/sys/fs/procfs/procfs_status.c b/sys/fs/procfs/procfs_status.c
index 5a00ee1..defdec3 100644
--- a/sys/fs/procfs/procfs_status.c
+++ b/sys/fs/procfs/procfs_status.c
@@ -70,6 +70,7 @@ procfs_doprocstatus(PFS_FILL_ARGS)
 	const char *wmesg;
 	char *pc;
 	char *sep;
+	struct timeval boottime;
 	int pid, ppid, pgid, sid;
 	int i;
 
@@ -129,6 +130,7 @@ procfs_doprocstatus(PFS_FILL_ARGS)
 		calcru(p, &ut, &st);
 		PROC_STATUNLOCK(p);
 		start = p->p_stats->p_start;
+		getboottime(&boottime);
 		timevaladd(&start, &boottime);
 		sbuf_printf(sb, " %jd,%ld %jd,%ld %jd,%ld",
 		    (intmax_t)start.tv_sec, start.tv_usec,
diff --git a/sys/kern/kern_acct.c b/sys/kern/kern_acct.c
index ef3fd2e..46e6d9b 100644
--- a/sys/kern/kern_acct.c
+++ b/sys/kern/kern_acct.c
@@ -389,7 +389,7 @@ acct_process(struct thread *td)
 	acct.ac_stime = encode_timeval(st);
 
 	/* (4) The elapsed time the command ran (and its starting time) */
-	tmp = boottime;
+	getboottime(&tmp);
 	timevaladd(&tmp, &p->p_stats->p_start);
 	acct.ac_btime = tmp.tv_sec;
 	microuptime(&tmp);
diff --git a/sys/kern/kern_ntptime.c b/sys/kern/kern_ntptime.c
index d352ee7..efc3713 100644
--- a/sys/kern/kern_ntptime.c
+++ b/sys/kern/kern_ntptime.c
@@ -162,29 +162,12 @@ static l_fp time_adj;			/* tick adjust (ns/s) */
 
 static int64_t time_adjtime;		/* correction from adjtime(2) (usec) */
 
-static struct mtx ntpadj_lock;
-MTX_SYSINIT(ntpadj, &ntpadj_lock, "ntpadj",
-#ifdef PPS_SYNC
-    MTX_SPIN
-#else
-    MTX_DEF
-#endif
-);
+static struct mtx ntp_lock;
+MTX_SYSINIT(ntp, &ntp_lock, "ntp", MTX_SPIN);
 
-/*
- * When PPS_SYNC is defined, hardpps() function is provided which can
- * be legitimately called from interrupt filters.  Due to this, use
- * spinlock for ntptime state protection, otherwise sleepable mutex is
- * adequate.
- */
-#ifdef PPS_SYNC
-#define	NTPADJ_LOCK()		mtx_lock_spin(&ntpadj_lock)
-#define	NTPADJ_UNLOCK()		mtx_unlock_spin(&ntpadj_lock)
-#else
-#define	NTPADJ_LOCK()		mtx_lock(&ntpadj_lock)
-#define	NTPADJ_UNLOCK()		mtx_unlock(&ntpadj_lock)
-#endif
-#define	NTPADJ_ASSERT_LOCKED()	mtx_assert(&ntpadj_lock, MA_OWNED)
+#define	NTP_LOCK()		mtx_lock_spin(&ntp_lock)
+#define	NTP_UNLOCK()		mtx_unlock_spin(&ntp_lock)
+#define	NTP_ASSERT_LOCKED()	mtx_assert(&ntp_lock, MA_OWNED)
 
 #ifdef PPS_SYNC
 /*
@@ -271,7 +254,7 @@ ntp_gettime1(struct ntptimeval *ntvp)
 {
 	struct timespec atv;	/* nanosecond time */
 
-	NTPADJ_ASSERT_LOCKED();
+	NTP_ASSERT_LOCKED();
 
 	nanotime(&atv);
 	ntvp->time.tv_sec = atv.tv_sec;
@@ -302,9 +285,9 @@ sys_ntp_gettime(struct thread *td, struct ntp_gettime_args *uap)
 {	
 	struct ntptimeval ntv;
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ntp_gettime1(&ntv);
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	td->td_retval[0] = ntv.time_state;
 	return (copyout(&ntv, uap->ntvp, sizeof(ntv)));
@@ -315,9 +298,9 @@ ntp_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct ntptimeval ntv;	/* temporary structure */
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ntp_gettime1(&ntv);
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	return (sysctl_handle_opaque(oidp, &ntv, sizeof(ntv), req));
 }
@@ -382,7 +365,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
 		error = priv_check(td, PRIV_NTP_ADJTIME);
 	if (error != 0)
 		return (error);
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	if (modes & MOD_MAXERROR)
 		time_maxerror = ntv.maxerror;
 	if (modes & MOD_ESTERROR)
@@ -484,7 +467,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
 	ntv.stbcnt = pps_stbcnt;
 #endif /* PPS_SYNC */
 	retval = ntp_is_time_error(time_status) ? TIME_ERROR : time_state;
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	error = copyout((caddr_t)&ntv, (caddr_t)uap->tp, sizeof(ntv));
 	if (error == 0)
@@ -506,6 +489,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
 	int tickrate;
 	l_fp ftemp;		/* 32/64-bit temporary */
 
+	NTP_LOCK();
+
 	/*
 	 * On rollover of the second both the nanosecond and microsecond
 	 * clocks are updated and the state machine cranked as
@@ -627,6 +612,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
 	else
 		time_status &= ~STA_PPSSIGNAL;
 #endif /* PPS_SYNC */
+
+	NTP_UNLOCK();
 }
 
 /*
@@ -690,7 +677,7 @@ hardupdate(offset)
 	long mtemp;
 	l_fp ftemp;
 
-	NTPADJ_ASSERT_LOCKED();
+	NTP_ASSERT_LOCKED();
 
 	/*
 	 * Select how the phase is to be controlled and from which
@@ -772,7 +759,7 @@ hardpps(tsp, nsec)
 	long u_sec, u_nsec, v_nsec; /* temps */
 	l_fp ftemp;
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 
 	/*
 	 * The signal is first processed by a range gate and frequency
@@ -956,7 +943,7 @@ hardpps(tsp, nsec)
 		time_freq = pps_freq;
 
 out:
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 }
 #endif /* PPS_SYNC */
 
@@ -999,11 +986,11 @@ kern_adjtime(struct thread *td, struct timeval *delta, struct timeval *olddelta)
 			return (error);
 		ltw = (int64_t)delta->tv_sec * 1000000 + delta->tv_usec;
 	}
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ltr = time_adjtime;
 	if (delta != NULL)
 		time_adjtime = ltw;
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 	if (olddelta != NULL) {
 		atv.tv_sec = ltr / 1000000;
 		atv.tv_usec = ltr % 1000000;
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 2f1f620..892b23a 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -872,6 +872,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp)
 	struct session *sp;
 	struct ucred *cred;
 	struct sigacts *ps;
+	struct timeval boottime;
 
 	/* For proc_realparent. */
 	sx_assert(&proctree_lock, SX_LOCKED);
@@ -953,6 +954,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp)
 	kp->ki_nice = p->p_nice;
 	kp->ki_fibnum = p->p_fibnum;
 	kp->ki_start = p->p_stats->p_start;
+	getboottime(&boottime);
 	timevaladd(&kp->ki_start, &boottime);
 	PROC_STATLOCK(p);
 	rufetch(p, &kp->ki_rusage);
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0f015b3..c9676fc 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -70,31 +70,22 @@ struct timehands {
 	struct bintime		th_offset;
 	struct timeval		th_microtime;
 	struct timespec		th_nanotime;
+	struct bintime		th_boottime;
 	/* Fields not to be copied in tc_windup start with th_generation. */
 	u_int			th_generation;
 	struct timehands	*th_next;
 };
 
 static struct timehands th0;
-static struct timehands th9 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th0};
-static struct timehands th8 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th9};
-static struct timehands th7 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th8};
-static struct timehands th6 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th7};
-static struct timehands th5 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th6};
-static struct timehands th4 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th5};
-static struct timehands th3 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th4};
-static struct timehands th2 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th3};
-static struct timehands th1 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th2};
+static struct timehands th1 = {
+	.th_next = &th0
+};
 static struct timehands th0 = {
-	&dummy_timecounter,
-	0,
-	(uint64_t)-1 / 1000000,
-	0,
-	{1, 0},
-	{0, 0},
-	{0, 0},
-	1,
-	&th1
+	.th_counter = &dummy_timecounter,
+	.th_scale = (uint64_t)-1 / 1000000,
+	.th_offset = {1, 0},
+	.th_generation = 1,
+	.th_next = &th1
 };
 
 static struct timehands *volatile timehands = &th0;
@@ -106,8 +97,6 @@ int tc_min_ticktock_freq = 1;
 volatile time_t time_second = 1;
 volatile time_t time_uptime = 1;
 
-struct bintime boottimebin;
-struct timeval boottime;
 static int sysctl_kern_boottime(SYSCTL_HANDLER_ARGS);
 SYSCTL_PROC(_kern, KERN_BOOTTIME, boottime, CTLTYPE_STRUCT|CTLFLAG_RD,
     NULL, 0, sysctl_kern_boottime, "S,timeval", "System boottime");
@@ -135,7 +124,8 @@ SYSCTL_PROC(_kern_timecounter, OID_AUTO, alloweddeviation,
 
 static int tc_chosen;	/* Non-zero if a specific tc was chosen via sysctl. */
 
-static void tc_windup(void);
+static void tc_windup(bool top_call, struct bintime *new_boottimebin);
+static void tc_windup_locked(struct bintime *new_boottimebin);
 static void cpu_tick_calibrate(int);
 
 void dtrace_getnanotime(struct timespec *tsp);
@@ -143,6 +133,10 @@ void dtrace_getnanotime(struct timespec *tsp);
 static int
 sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
 {
+	struct timeval boottime;
+
+	getboottime(&boottime);
+
 #ifndef __mips__
 #ifdef SCTL_MASK32
 	int tv[2];
@@ -150,11 +144,11 @@ sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
 	if (req->flags & SCTL_MASK32) {
 		tv[0] = boottime.tv_sec;
 		tv[1] = boottime.tv_usec;
-		return SYSCTL_OUT(req, tv, sizeof(tv));
-	} else
+		return (SYSCTL_OUT(req, tv, sizeof(tv)));
+	}
 #endif
 #endif
-		return SYSCTL_OUT(req, &boottime, sizeof(boottime));
+	return (SYSCTL_OUT(req, &boottime, sizeof(boottime)));
 }
 
 static int
@@ -164,7 +158,7 @@ sysctl_kern_timecounter_get(SYSCTL_HANDLER_ARGS)
 	struct timecounter *tc = arg1;
 
 	ncount = tc->tc_get_timecount(tc);
-	return sysctl_handle_int(oidp, &ncount, 0, req);
+	return (sysctl_handle_int(oidp, &ncount, 0, req));
 }
 
 static int
@@ -174,7 +168,7 @@ sysctl_kern_timecounter_freq(SYSCTL_HANDLER_ARGS)
 	struct timecounter *tc = arg1;
 
 	freq = tc->tc_frequency;
-	return sysctl_handle_64(oidp, &freq, 0, req);
+	return (sysctl_handle_64(oidp, &freq, 0, req));
 }
 
 /*
@@ -198,7 +192,7 @@ tc_delta(struct timehands *th)
  */
 
 #ifdef FFCLOCK
-void
+static void
 fbclock_binuptime(struct bintime *bt)
 {
 	struct timehands *th;
@@ -234,8 +228,18 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
+	struct bintime boottimebin;
+	struct timehands *th;
+	unsigned int gen;
 
-	fbclock_binuptime(bt);
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*bt = th->th_offset;
+		bintime_addx(bt, th->th_scale * tc_delta(th));
+		boottimebin = th->th_boottime;
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
 	bintime_add(bt, &boottimebin);
 }
 
@@ -303,12 +307,14 @@ void
 fbclock_getbintime(struct bintime *bt)
 {
 	struct timehands *th;
+	struct bintime boottimebin;
 	unsigned int gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
+		boottimebin = th->th_boottime;
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 	bintime_add(bt, &boottimebin);
@@ -378,8 +384,18 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
+	struct bintime boottimebin;
+	struct timehands *th;
+	u_int gen;
 
-	binuptime(bt);
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*bt = th->th_offset;
+		bintime_addx(bt, th->th_scale * tc_delta(th));
+		boottimebin = th->th_boottime;
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
 	bintime_add(bt, &boottimebin);
 }
 
@@ -447,12 +463,14 @@ void
 getbintime(struct bintime *bt)
 {
 	struct timehands *th;
+	struct bintime boottimebin;
 	u_int gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
+		boottimebin = th->th_boottime;
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 	bintime_add(bt, &boottimebin);
@@ -487,6 +505,29 @@ getmicrotime(struct timeval *tvp)
 }
 #endif /* FFCLOCK */
 
+void
+getboottime(struct timeval *boottime)
+{
+	struct bintime boottimebin;
+
+	getboottimebin(&boottimebin);
+	bintime2timeval(&boottimebin, boottime);
+}
+
+void
+getboottimebin(struct bintime *boottimebin)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*boottimebin = th->th_boottime;
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
 #ifdef FFCLOCK
 /*
  * Support for feed-forward synchronization algorithms. This is heavily inspired
@@ -1103,6 +1144,7 @@ int
 sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt,
     int whichclock, uint32_t flags)
 {
+	struct bintime boottimebin;
 #ifdef FFCLOCK
 	struct bintime bt2;
 	uint64_t period;
@@ -1116,8 +1158,10 @@ sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt,
 		if (cs->delta > 0)
 			bintime_addx(bt, cs->fb_info.th_scale * cs->delta);
 
-		if ((flags & FBCLOCK_UPTIME) == 0)
+		if ((flags & FBCLOCK_UPTIME) == 0) {
+			getboottimebin(&boottimebin);
 			bintime_add(bt, &boottimebin);
+		}
 		break;
 #ifdef FFCLOCK
 	case SYSCLOCK_FFWD:
@@ -1226,10 +1270,12 @@ tc_getfrequency(void)
 	return (timehands->th_counter->tc_frequency);
 }
 
+static struct mtx tc_setclock_mtx;
+MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF);
+
 /*
  * Step our concept of UTC.  This is done by modifying our estimate of
  * when we booted.
- * XXX: not locked.
  */
 void
 tc_setclock(struct timespec *ts)
@@ -1237,26 +1283,43 @@ tc_setclock(struct timespec *ts)
 	struct timespec tbef, taft;
 	struct bintime bt, bt2;
 
-	cpu_tick_calibrate(1);
-	nanotime(&tbef);
 	timespec2bintime(ts, &bt);
+	nanotime(&tbef);
+	mtx_lock(&tc_setclock_mtx);
+	critical_enter();
+	cpu_tick_calibrate(1);
 	binuptime(&bt2);
 	bintime_sub(&bt, &bt2);
-	bintime_add(&bt2, &boottimebin);
-	boottimebin = bt;
-	bintime2timeval(&bt, &boottime);
 
 	/* XXX fiddle all the little crinkly bits around the fiords... */
-	tc_windup();
-	nanotime(&taft);
+	tc_windup(true, &bt);
+	critical_exit();
+	mtx_unlock(&tc_setclock_mtx);
 	if (timestepwarnings) {
+		nanotime(&taft);
 		log(LOG_INFO,
 		    "Time stepped from %jd.%09ld to %jd.%09ld (%jd.%09ld)\n",
 		    (intmax_t)tbef.tv_sec, tbef.tv_nsec,
 		    (intmax_t)taft.tv_sec, taft.tv_nsec,
 		    (intmax_t)ts->tv_sec, ts->tv_nsec);
 	}
-	cpu_tick_calibrate(1);
+}
+
+static volatile int tc_windup_lock;
+static void
+tc_windup(bool top_call, struct bintime *new_boottimebin)
+{
+
+	for (;;) {
+		if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
+			atomic_thread_fence_acq();
+			tc_windup_locked(new_boottimebin);
+			atomic_store_rel_int(&tc_windup_lock, 0);
+			break;
+		} else if (!top_call) {
+			break;
+		}
+	}
 }
 
 /*
@@ -1265,7 +1328,7 @@ tc_setclock(struct timespec *ts)
  * timecounter and/or do seconds processing in NTP.  Slightly magic.
  */
 static void
-tc_windup(void)
+tc_windup_locked(struct bintime *new_boottimebin)
 {
 	struct bintime bt;
 	struct timehands *th, *tho;
@@ -1289,6 +1352,8 @@ tc_windup(void)
 	th->th_generation = 0;
 	atomic_thread_fence_rel();
 	bcopy(tho, th, offsetof(struct timehands, th_generation));
+	if (new_boottimebin != NULL)
+		th->th_boottime = *new_boottimebin;
 
 	/*
 	 * Capture a timecounter delta on the current timecounter and if
@@ -1338,7 +1403,7 @@ tc_windup(void)
 	 * case we missed a leap second.
 	 */
 	bt = th->th_offset;
-	bintime_add(&bt, &boottimebin);
+	bintime_add(&bt, &th->th_boottime);
 	i = bt.sec - tho->th_microtime.tv_sec;
 	if (i > LARGE_STEP)
 		i = 2;
@@ -1346,7 +1411,7 @@ tc_windup(void)
 		t = bt.sec;
 		ntp_update_second(&th->th_adjustment, &bt.sec);
 		if (bt.sec != t)
-			boottimebin.sec += bt.sec - t;
+			th->th_boottime.sec += bt.sec - t;
 	}
 	/* Update the UTC timestamps used by the get*() functions. */
 	/* XXX shouldn't do this here.  Should force non-`get' versions. */
@@ -1769,7 +1834,7 @@ pps_event(struct pps_state *pps, int event)
 	tcount &= pps->capth->th_counter->tc_counter_mask;
 	bt = pps->capth->th_offset;
 	bintime_addx(&bt, pps->capth->th_scale * tcount);
-	bintime_add(&bt, &boottimebin);
+	bintime_add(&bt, &pps->capth->th_boottime);
 	bintime2timespec(&bt, &ts);
 
 	/* If the timecounter was wound up underneath us, bail out. */
@@ -1846,7 +1911,7 @@ tc_ticktock(int cnt)
 	if (count < tc_tick)
 		return;
 	count = 0;
-	tc_windup();
+	tc_windup(false, NULL);
 }
 
 static void __inline
@@ -1921,7 +1986,7 @@ inittimecounter(void *dummy)
 	/* warm up new timecounter (again) and get rolling. */
 	(void)timecounter->tc_get_timecount(timecounter);
 	(void)timecounter->tc_get_timecount(timecounter);
-	tc_windup();
+	tc_windup(true, NULL);
 }
 
 SYSINIT(timecounter, SI_SUB_CLOCKS, SI_ORDER_SECOND, inittimecounter, NULL);
@@ -2095,7 +2160,7 @@ tc_fill_vdso_timehands(struct vdso_timehands *vdso_th)
 	vdso_th->th_offset_count = th->th_offset_count;
 	vdso_th->th_counter_mask = th->th_counter->tc_counter_mask;
 	vdso_th->th_offset = th->th_offset;
-	vdso_th->th_boottime = boottimebin;
+	vdso_th->th_boottime = th->th_boottime;
 	enabled = cpu_fill_vdso_timehands(vdso_th, th->th_counter);
 	if (!vdso_th_enable)
 		enabled = 0;
@@ -2116,8 +2181,8 @@ tc_fill_vdso_timehands32(struct vdso_timehands32 *vdso_th32)
 	vdso_th32->th_counter_mask = th->th_counter->tc_counter_mask;
 	vdso_th32->th_offset.sec = th->th_offset.sec;
 	*(uint64_t *)&vdso_th32->th_offset.frac[0] = th->th_offset.frac;
-	vdso_th32->th_boottime.sec = boottimebin.sec;
-	*(uint64_t *)&vdso_th32->th_boottime.frac[0] = boottimebin.frac;
+	vdso_th32->th_boottime.sec = th->th_boottime.sec;
+	*(uint64_t *)&vdso_th32->th_boottime.frac[0] = th->th_boottime.frac;
 	enabled = cpu_fill_vdso_timehands32(vdso_th32, th->th_counter);
 	if (!vdso_th_enable)
 		enabled = 0;
diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c
index 148da2b..82710f7 100644
--- a/sys/kern/kern_time.c
+++ b/sys/kern/kern_time.c
@@ -115,9 +115,7 @@ settime(struct thread *td, struct timeval *tv)
 	struct timeval delta, tv1, tv2;
 	static struct timeval maxtime, laststep;
 	struct timespec ts;
-	int s;
 
-	s = splclock();
 	microtime(&tv1);
 	delta = *tv;
 	timevalsub(&delta, &tv1);
@@ -147,10 +145,8 @@ settime(struct thread *td, struct timeval *tv)
 				printf("Time adjustment clamped to -1 second\n");
 			}
 		} else {
-			if (tv1.tv_sec == laststep.tv_sec) {
-				splx(s);
+			if (tv1.tv_sec == laststep.tv_sec)
 				return (EPERM);
-			}
 			if (delta.tv_sec > 1) {
 				tv->tv_sec = tv1.tv_sec + 1;
 				printf("Time adjustment clamped to +1 second\n");
@@ -161,10 +157,8 @@ settime(struct thread *td, struct timeval *tv)
 
 	ts.tv_sec = tv->tv_sec;
 	ts.tv_nsec = tv->tv_usec * 1000;
-	mtx_lock(&Giant);
 	tc_setclock(&ts);
 	resettodr();
-	mtx_unlock(&Giant);
 	return (0);
 }
 
diff --git a/sys/kern/subr_rtc.c b/sys/kern/subr_rtc.c
index dbad36d..4bac324 100644
--- a/sys/kern/subr_rtc.c
+++ b/sys/kern/subr_rtc.c
@@ -172,11 +172,11 @@ resettodr(void)
 	if (disable_rtc_set || clock_dev == NULL)
 		return;
 
-	mtx_lock(&resettodr_lock);
 	getnanotime(&ts);
 	timespecadd(&ts, &clock_adj);
 	ts.tv_sec -= utc_offset();
 	/* XXX: We should really set all registered RTCs */
+	mtx_lock(&resettodr_lock);
 	error = CLOCK_SETTIME(clock_dev, &ts);
 	mtx_unlock(&resettodr_lock);
 	if (error != 0)
diff --git a/sys/kern/sys_procdesc.c b/sys/kern/sys_procdesc.c
index 37139c1..f47ae7c 100644
--- a/sys/kern/sys_procdesc.c
+++ b/sys/kern/sys_procdesc.c
@@ -517,7 +517,7 @@ procdesc_stat(struct file *fp, struct stat *sb, struct ucred *active_cred,
     struct thread *td)
 {
 	struct procdesc *pd;
-	struct timeval pstart;
+	struct timeval pstart, boottime;
 
 	/*
 	 * XXXRW: Perhaps we should cache some more information from the
@@ -532,6 +532,7 @@ procdesc_stat(struct file *fp, struct stat *sb, struct ucred *active_cred,
 
 		/* Set birth and [acm] times to process start time. */
 		pstart = pd->pd_proc->p_stats->p_start;
+		getboottime(&boottime);
 		timevaladd(&pstart, &boottime);
 		TIMEVAL_TO_TIMESPEC(&pstart, &sb->st_birthtim);
 		sb->st_atim = sb->st_birthtim;
diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index 3b12cf4..4251f71 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2328,12 +2328,13 @@ bpf_hdrlen(struct bpf_d *d)
 static void
 bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype)
 {
-	struct bintime bt2;
+	struct bintime bt2, boottimebin;
 	struct timeval tsm;
 	struct timespec tsn;
 
 	if ((tstype & BPF_T_MONOTONIC) == 0) {
 		bt2 = *bt;
+		getboottimebin(&boottimebin);
 		bintime_add(&bt2, &boottimebin);
 		bt = &bt2;
 	}
diff --git a/sys/netpfil/ipfw/ip_fw_sockopt.c b/sys/netpfil/ipfw/ip_fw_sockopt.c
index d186ba5..7faf99f 100644
--- a/sys/netpfil/ipfw/ip_fw_sockopt.c
+++ b/sys/netpfil/ipfw/ip_fw_sockopt.c
@@ -395,6 +395,7 @@ swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len)
 static void
 export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
 {
+	struct timeval boottime;
 
 	cntr->size = sizeof(*cntr);
 
@@ -403,21 +404,26 @@ export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
 		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
 		cntr->timestamp = krule->timestamp;
 	}
-	if (cntr->timestamp > 0)
+	if (cntr->timestamp > 0) {
+		getboottime(&boottime);
 		cntr->timestamp += boottime.tv_sec;
+	}
 }
 
 static void
 export_cntr0_base(struct ip_fw *krule, struct ip_fw_bcounter0 *cntr)
 {
+	struct timeval boottime;
 
 	if (krule->cntr != NULL) {
 		cntr->pcnt = counter_u64_fetch(krule->cntr);
 		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
 		cntr->timestamp = krule->timestamp;
 	}
-	if (cntr->timestamp > 0)
+	if (cntr->timestamp > 0) {
+		getboottime(&boottime);
 		cntr->timestamp += boottime.tv_sec;
+	}
 }
 
 /*
@@ -2048,11 +2054,13 @@ ipfw_getrules(struct ip_fw_chain *chain, void *buf, size_t space)
 	char *ep = bp + space;
 	struct ip_fw *rule;
 	struct ip_fw_rule0 *dst;
+	struct timeval boottime;
 	int error, i, l, warnflag;
 	time_t	boot_seconds;
 
 	warnflag = 0;
 
+	getboottime(&boottime);
         boot_seconds = boottime.tv_sec;
 	for (i = 0; i < chain->n_rules; i++) {
 		rule = chain->map[i];
diff --git a/sys/nfs/nfs_lock.c b/sys/nfs/nfs_lock.c
index 7d11672..c84413e 100644
--- a/sys/nfs/nfs_lock.c
+++ b/sys/nfs/nfs_lock.c
@@ -241,6 +241,7 @@ nfs_dolock(struct vop_advlock_args *ap)
 	struct flock *fl;
 	struct proc *p;
 	struct nfsmount *nmp;
+	struct timeval boottime;
 
 	td = curthread;
 	p = td->td_proc;
@@ -284,6 +285,7 @@ nfs_dolock(struct vop_advlock_args *ap)
 		p->p_nlminfo = malloc(sizeof(struct nlminfo),
 		    M_NLMINFO, M_WAITOK | M_ZERO);
 		p->p_nlminfo->pid_start = p->p_stats->p_start;
+		getboottime(&boottime);
 		timevaladd(&p->p_nlminfo->pid_start, &boottime);
 	}
 	msg.lm_msg_ident.pid_start = p->p_nlminfo->pid_start;
diff --git a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
index 1d07943..0879299 100644
--- a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
+++ b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
@@ -504,11 +504,13 @@ svc_rpc_gss_find_client(struct svc_rpc_gss_clientid *id)
 {
 	struct svc_rpc_gss_client *client;
 	struct svc_rpc_gss_client_list *list;
+	struct timeval boottime;
 	unsigned long hostid;
 
 	rpc_gss_log_debug("in svc_rpc_gss_find_client(%d)", id->ci_id);
 
 	getcredhostid(curthread->td_ucred, &hostid);
+	getboottime(&boottime);
 	if (id->ci_hostid != hostid || id->ci_boottime != boottime.tv_sec)
 		return (NULL);
 
@@ -537,6 +539,7 @@ svc_rpc_gss_create_client(void)
 {
 	struct svc_rpc_gss_client *client;
 	struct svc_rpc_gss_client_list *list;
+	struct timeval boottime;
 	unsigned long hostid;
 
 	rpc_gss_log_debug("in svc_rpc_gss_create_client()");
@@ -547,6 +550,7 @@ svc_rpc_gss_create_client(void)
 	sx_init(&client->cl_lock, "GSS-client");
 	getcredhostid(curthread->td_ucred, &hostid);
 	client->cl_id.ci_hostid = hostid;
+	getboottime(&boottime);
 	client->cl_id.ci_boottime = boottime.tv_sec;
 	client->cl_id.ci_id = svc_rpc_gss_next_clientid++;
 	list = &svc_rpc_gss_client_hash[client->cl_id.ci_id % CLIENT_HASH_SIZE];
diff --git a/sys/sys/time.h b/sys/sys/time.h
index 395e888..659f8e0 100644
--- a/sys/sys/time.h
+++ b/sys/sys/time.h
@@ -372,8 +372,6 @@ void	resettodr(void);
 
 extern volatile time_t	time_second;
 extern volatile time_t	time_uptime;
-extern struct bintime boottimebin;
-extern struct timeval boottime;
 extern struct bintime tc_tick_bt;
 extern sbintime_t tc_tick_sbt;
 extern struct bintime tick_bt;
@@ -440,6 +438,9 @@ void	getbintime(struct bintime *bt);
 void	getnanotime(struct timespec *tsp);
 void	getmicrotime(struct timeval *tvp);
 
+void	getboottime(struct timeval *boottime);
+void	getboottimebin(struct bintime *boottimebin);
+
 /* Other functions */
 int	itimerdecr(struct itimerval *itp, int usec);
 int	itimerfix(struct timeval *tv);


More information about the svn-src-head mailing list