svn commit: r302252 - head/sys/kern
Konstantin Belousov
kostikbel at gmail.com
Thu Jun 30 18:01:16 UTC 2016
On Fri, Jul 01, 2016 at 03:30:41AM +1000, Bruce Evans wrote:
> On Thu, 30 Jun 2016, Konstantin Belousov wrote:
>
> > On Thu, Jun 30, 2016 at 05:47:39AM +1000, Bruce Evans wrote:
> >> On Wed, 29 Jun 2016, Konstantin Belousov wrote:
> >>
> >>> On Wed, Jun 29, 2016 at 05:54:43PM +0300, Konstantin Belousov wrote:
> >>>> As an additional, but probably excessive measure, invocation of
> >>>> tc_windup() from hardclock could ask the setclock() to re-execute
> >>>> tc_windup() after its call is done.
> >>
> >> I don't quite understand this. hardclock -> tc_windup() normally doesn't
> >> have interference from tc_setclock().
> > In what sense does it not have interference? tc_windup() may be executed
> > from hardclock/hardclock_cnt (low half) and concurrently from setclock()
> > (top half). And this is exactly what we want to avoid, isn't it?
>
> Yes, but you seemed to be saying that hardclock should schedule or call
> tc_setclock(). That makes no sense. So I thought that you meant
> tc_setclock() calling or scheduling tc_windup() in a different way than
> it does now (it just calls it now).
More precisely, I mean that hardclock, when it is not able to execute
tc_windup() because the top half is currently executing tc_windup(), may ask
the top half to re-execute tc_windup() (and not tc_setclock()) once the
current invocation finishes.
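For illustration, the scheme can be modeled in user space with C11 atomics
roughly like this (hypothetical names, not the actual kern_tc.c code; the
kernel would use the atomic(9) primitives instead of stdatomic):

```c
/*
 * Sketch: the bottom half (hardclock) never blocks on tc_windup(). If it
 * loses the race against the top half, it sets a flag asking the top half
 * to run tc_windup() once more after the current invocation finishes.
 */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int windup_lock;		/* 0 = free, 1 = held by top half */
static atomic_bool windup_rerun;	/* set when hardclock lost the race */
static int windup_runs;			/* stands in for the timehands update */

static void
tc_windup_locked(void)
{
	windup_runs++;
}

/* bottom half: try once, never wait */
static void
hardclock_windup(void)
{
	int expected = 0;

	if (atomic_compare_exchange_strong(&windup_lock, &expected, 1)) {
		tc_windup_locked();
		atomic_store(&windup_lock, 0);
	} else {
		/* top half is in tc_windup(); ask it to redo the work */
		atomic_store(&windup_rerun, true);
	}
}

/* top half: re-runs tc_windup() while hardclock keeps asking for it */
static void
setclock_windup(void)
{
	int expected;

	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_strong(&windup_lock, &expected, 1)) {
			atomic_store(&windup_rerun, false);
			tc_windup_locked();
			atomic_store(&windup_lock, 0);
			if (!atomic_load(&windup_rerun))
				break;
			/* hardclock raced with us; run once more */
		}
	}
}
```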
>
> >>> ...
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 0f015b3..a5776d2 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> ...
> >>> @@ -1237,6 +1238,7 @@ tc_setclock(struct timespec *ts)
> >>> struct timespec tbef, taft;
> >>> struct bintime bt, bt2;
> >>>
> >>> + critical_enter();
> >>> cpu_tick_calibrate(1);
> >>> nanotime(&tbef);
> >>> timespec2bintime(ts, &bt);
> >>> @@ -1247,8 +1249,10 @@ tc_setclock(struct timespec *ts)
> >>> bintime2timeval(&bt, &boottime);
> >>
> >> boottime is one of the critical variables. Before this, there are 2
> >> accesses to boottimebin. The Giant locking that you just removed gave
> >> only some protection here. It prevented concurrent access by
> >> malicious/stress-testing code doing settime(). It didn't prevent
> >> nanotime() and friends seeing boottime in an in-between state. Oops,
> >> mutex locking won't fix that either. I think the update of these
> >> variables needs to be moved into tc_windup() where it will hopefully
> >> automatically be protected by the generation count, like the timehands
> >> offsets already are.
> > I do not understand how that would help. Do you mean to move the vars
> > into the current timehands ?
>
> I hoped that it would just work without putting it in timehands, but
> now I don't see a better way than putting it in timehands. The space
> wastage is not too bad, and we might need it in timehands for other
> reasons. One is that the actual boot time is invariant, but FreeBSD
> varies it; a global is needed for the actual boot time and then it is
> natural to put the fudged boot time in timehands (it is not a boot
> time, but an offset which is sometimes close to the boot time).
>
> Note that boottimebin is also updated in tc_windup() (for leap seconds).
> The new mutex doesn't protect it there. This is the next of other
> reasons to put it in timehands.
Yes.
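For reference, the generation-count protection Bruce refers to can be
sketched in user space like this (hypothetical struct and names; the real
pattern lives in the timehands handling in sys/kern/kern_tc.c):

```c
/*
 * Sketch: a reader retries until it observes the same nonzero generation
 * before and after copying the fields, so it never uses a half-updated
 * boottimebin. The writer zeroes the generation while updating.
 */
#include <stdatomic.h>

struct my_timehands {
	atomic_uint	gen;		/* 0 while an update is in progress */
	long long	boottimebin;	/* stands in for struct bintime */
};

static struct my_timehands th;

/* writer side: tc_windup()-style update */
static void
th_update(long long newboot)
{
	atomic_store_explicit(&th.gen, 0, memory_order_release);
	th.boottimebin = newboot;
	atomic_store_explicit(&th.gen, 1, memory_order_release);
}

/* reader side: nanotime()-style stable snapshot */
static long long
th_read(void)
{
	unsigned gen;
	long long bt;

	do {
		gen = atomic_load_explicit(&th.gen, memory_order_acquire);
		bt = th.boottimebin;
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th.gen, memory_order_acquire));
	return (bt);
}
```

The real code additionally cycles through several timehands and increments
the generation rather than pinning it at 1, but the read-retry idea is the
same.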
>
> > ...
> >>> + for (;;) {
> >>> + if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
> >>> + atomic_thread_fence_acq();
> >>> + tc_windup_locked();
> >>> + atomic_store_rel_int(&tc_windup_lock, 0);
> >>> + break;
> >>> + } else if (!top_call) {
> >>> + break;
> >>> + }
> >>> + }
> >>> }
> >>
> >> I like to write optimized locking like that, but don't see why you
> >> want it here.
> Because I do not want the fast clock interrupt handler to wait on a lock,
> which necessarily becomes a spinlock. The ntp_lock is bad enough already.
> What I coded above is not a real lock. In the contention case it bails out,
> so there is no blocking. In particular, witness is not relevant to it.
>
> I think this is too complicated. Spinlocks aren't very expensive, and
> hardclock() already has some. I think it has several. It has at least
> one silly one -- in hardclock_cpu(), a thread locking to lock a single
> flags update. hardclock() and tc_windup() just aren't called enough
> for the locking overhead or contention to matter, even at too-large hz.
Spinlocks are quite expensive. They delay interrupt delivery.
I do not want the fast interrupt handler to be delayed due to the top half
of the kernel executing settimeofday(2) in a loop.
>
> > ...
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 0f015b3..7bb767c 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> >...
> > +static struct mtx tc_setclock_mtx;
> > +MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF);
>
> I thought it would be a spinlock. The sleep lock is better if you can
> make it work, but this is complicated.
For mutual exclusion of invocations of tc_setclock(), I do not see why a
sleepable lock would not suffice.
>
> > +
> > /*
> > * Step our concept of UTC. This is done by modifying our estimate of
> > * when we booted.
> > - * XXX: not locked.
> > */
>
> Now "not properly locked" :-).
>
> > void
> > tc_setclock(struct timespec *ts)
> > @@ -1237,6 +1240,8 @@ tc_setclock(struct timespec *ts)
> > struct timespec tbef, taft;
> > struct bintime bt, bt2;
> >
> > + mtx_lock(&tc_setclock_mtx);
> > + critical_enter();
>
> I thought at first that this critical section was redundant, since a
> spinlock would already have it.
It is not a spinlock.
>
> > cpu_tick_calibrate(1);
> > nanotime(&tbef);
> > timespec2bintime(ts, &bt);
> > @@ -1247,8 +1252,10 @@ tc_setclock(struct timespec *ts)
> > bintime2timeval(&bt, &boottime);
>
> The leap second update to boottimebin in tc_windup() races with the
> update to boottimebin here. This can be fixed partially by using the
> same locking there. Then the locking would have to be a spin mutex
> (or special). This still doesn't fix non-atomic accesses to boottimebin
> in nanotime() and friends. Putting it in timehands seems to be best for
> fixing that.
Yes, timehands for boottimebin should be the solution, but it is not in the
scope of this patch. I will work on this right after the current changeset
lands in svn.
> boottime is easier to fix (and doesn't need to be in timehands). It
> is only used to copy it out in a historical sysctl. The locking for
> that is buggy. We convert boottimebin to boottime here (now with locking)
> but in the sysctl we copy out boottime non-atomically.
Do we need boottime at all? Can't it be calculated from boottimebin
on the sysctl invocation?
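Something along these lines, perhaps (a user-space sketch with hypothetical
names; the conversion mirrors the bintime2timeval() macro from sys/time.h,
and the handler would need a stable snapshot of boottimebin):

```c
/*
 * Sketch: drop the boottime global and derive the timeval from
 * boottimebin at sysctl read time, so there is no second copy of the
 * boot time to keep coherent.
 */
#include <stdint.h>

struct bintime {
	int64_t		sec;
	uint64_t	frac;	/* fraction of a second, 2^-64 units */
};

struct timeval {
	int64_t		tv_sec;
	int64_t		tv_usec;
};

/* example value: frac = 2^63 is half a second */
static struct bintime boottimebin = { 1467309676, (uint64_t)1 << 63 };

/* same arithmetic as the bintime2timeval() macro in sys/time.h */
static void
my_bintime2timeval(const struct bintime *bt, struct timeval *tv)
{
	tv->tv_sec = bt->sec;
	tv->tv_usec = (int64_t)(((uint64_t)1000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}

/* sysctl handler computes the value at read time, no stale global */
static struct timeval
sysctl_boottime(void)
{
	struct timeval tv;

	/* a stable snapshot of boottimebin would be taken here */
	my_bintime2timeval(&boottimebin, &tv);
	return (tv);
}
```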