svn commit: r320901 - in head/sys: amd64/amd64 isa x86/isa

Mon Jul 17 16:30:50 UTC 2017

On Thu, 2017-07-13 at 01:42 +1000, Bruce Evans wrote:
> On Wed, 12 Jul 2017, Ian Lepore wrote:
> 
> > Log:
> >  Protect access to the AT realtime clock with its own mutex.
> >
> >  The mutex protecting access to the registered realtime clock should not be
> >  overloaded to protect access to the atrtc hardware, which might not even be
> >  the registered rtc. More importantly, the resettodr mutex needs to be
> >  eliminated to remove locking/sleeping restrictions on clock drivers, and
> >  that can't happen if MD code for amd64 depends on it. This change moves the
> >  protection into what's really being protected: access to the atrtc date and
> >  time registers.
> 
> The spin mutex provided some protection against timing bugs caused by
> preemption, but it is impossible to hold the mutex around the user code
> that is setting the time, so both the kernel and user code should check
> if the operation took too long and fail/retry if it did.
> 

I can't figure out what you're talking about here.  There is no
userland code involved in any of this stuff.

> With correct code like that, spin mutexes are still good for limiting
> retries.  I think it is best to try to hold one around the entire kernel
> part of the operation, and release it when sleeping at lower levels.
> 

I'm not sure what this is about either.  The current code for getting
the atrtc has no retries.

> > Modified: head/sys/x86/isa/atrtc.c
> > ==============================================================================
> > --- head/sys/x86/isa/atrtc.c  Tue Jul 11 21:55:20 2017        (r320900)
> > +++ head/sys/x86/isa/atrtc.c  Wed Jul 12 02:42:57 2017        (r320901)
> > @@ -53,9 +53,17 @@ __FBSDID("$FreeBSD$");
> > #include 
> > #include "clock_if.h"
> >
> > +/*
> > + * clock_lock protects low-level access to individual hardware registers.
> > + * atrtc_time_lock protects the entire sequence of accessing multiple registers
> > + * to read or write the date and time.
> > + */
> > #define       RTC_LOCK        do { if (!kdb_active) mtx_lock_spin(&clock_lock); } while (0)
> > #define       RTC_UNLOCK      do { if (!kdb_active) mtx_unlock_spin(&clock_lock); } while (0)
> 
> This has a lot of old design and implementation bugs:
> - it abuses the i8254 mutex for the rtc
> - it obfuscates this using macros
> - locking individual register accesses is buggy.  Register accesses need to
>    be locked in groups like your new code does
> - the kdb_active hack is buggy (it can't use mutexes and is unnecessarily
>    non-reentrant)
> The locks should be separate and statically allocated.
> 

I've always assumed the locking of individual register access was
because rtcin() and writertc() are global functions used by other parts
of the kernel to access individual bytes of cmos nvram, and that
locking the whole device for the duration of the larger operations
involved might cause problems. 

For example, the nvram driver does byte at a time accesses in a loop
that includes uimove() calls, so holding a lock on all atrtc hardware
access for the duration of that isn't a good option.

It does seem like the atrtc code should have its own register-access
mutex rather than using clock_lock.  As you mentioned, the fact that
it's borrowing a mutex is pretty well obscured; I hadn't noticed it.

> >
> > +struct mtx atrtc_time_lock;
> > +MTX_SYSINIT(atrtc_lock_init, &atrtc_time_lock, "atrtc", MTX_DEF);
> > +
> 
> I think the old mutex should be used here, after fixing it.  Locking
> individual register access is a special case of locking groups of
> register acceses.  It must be a spin mutex for low-level use, and that
> is good for high-level use too, to complete the high-level operations
> as soon as possible.
> 

I'm not sure what you're referring to when you say "the old mutex".

> You used this mutex for efirt.c, but I think that is another error like
> sharing the i8254 mutex (or the resettodr mutex).   Different clock
> drivers should just use different mutexes.  Perhaps some drivers actually
> need sleep mutexes.
> 

I assumed the reason the efirt code was using the same mutex that
protected access to the at rtc was because under the hood, the efi rt
clock is really the same hardware too, accessed via bios code.

> > ...
> > @@ -352,6 +364,7 @@ atrtc_gettime(device_t dev, struct timespec *ts)
> >        * to make sure that no more than 240us pass after we start reading,
> >        * and try again if so.
> >        */
> > +     mtx_lock(&atrtc_time_lock);
> >       while (rtcin(RTC_STATUSA) & RTCSA_TUP)
> >               continue;
> >       critical_enter();
> 
> Note the comment about the 240us window.  On i386, this was broken by
> SMPng, and almost fixed by starting the locking outside of the loop
> (a spin mutex would fix it).
> 
> On i386, before SMPng, there was an splhigh() starting before the loop.
> rtcin() also used splhigh().  This was only shortly before SMPng --
> FreeBSD-4 had both splhigh()'s, but FreeBSD-3 had neither.
> 
> With no locking, an interrupt after the read of the status register can
> occur, and interrupt handling can easily take longer than the 240us window
> even if it doesn't cause preemption.
> 
> critical_enter() in the above gives poor man's locking which blocks
> preemption but not-so-fast interrupt handlers.
> 
> The race is not very important since this code only runs at boot time.
> 

And at resume from sleep/lowpower modes, I think.  This is not
something I can test easily, but I see calls to inittodr() from apm and
pmtimer code.  Even so, I think it qualifies as called very
infrequently.

That's why I think the right way to fix the coherent reading problem is
not trying harder to stay within a 244us window, but rather to stop
even trying to.  Instead, look at that flag only to prevent starting
the operation while an update is in progress, and use the more usual
"loop until you read the same values twice" logic to get a coherent
read.  That's especially true considering that taking longer than 244us
to do the reads is only a problem once per second; during the other
999756us you'll still get coherent values.

-- Ian

> The old splhigh() locking and a [spin] mutex around the loop lock out
> interrupts and/or preemption for too long (0-1 seconds), but again this
> is not very important since this code only runs at boot time.
> 
> This is fixed in my version, but only for spl locking.  The lock is
> acquired before each test in the loop and dropped after each failing
> test in the loop.  Then operation is completed as far as necessary
> while the lock is still held.
> 
> > @@ -369,6 +382,7 @@ atrtc_gettime(device_t dev, struct timespec *ts)
> >       ct.year += (ct.year < 80 ? 2000 : 1900);
> > #endif
> >       critical_exit();
> > +     mtx_unlock(&atrtc_time_lock);
> >       /* Set dow = -1 because some clocks don't set it correctly. */
> >       ct.dow = -1;
> >       return (clock_ct_to_ts(&ct, ts));
> 
> The new mtx_lock() should be placed next to the old critical_enter() too.
> Put these before or after the loop depending on whether races or high
> latency are preferred.
> 
> Bruce
>