svn commit: r247068 - head/sys/x86/isa

Fri Feb 22 06:26:18 UTC 2013

On Thu, 21 Feb 2013, Warner Losh wrote:

> On Feb 21, 2013, at 5:31 AM, Bruce Evans wrote:
>
>> On Thu, 21 Feb 2013, Warner Losh wrote:
>>
>>> Log:
>>> Fix broken usage of splhigh() by removing it.
>>
>> This is more broken than before.  The splhigh() served to indicate
>> missing locking.
>
> Depends on what you mean by more :)  It is less depessimized after the nopification of splhigh().

The null spl didn't cost anything.

>>> Modified: head/sys/x86/isa/atrtc.c
>>> ==============================================================================
>>> --- head/sys/x86/isa/atrtc.c	Thu Feb 21 00:36:12 2013	(r247067)
>>> +++ head/sys/x86/isa/atrtc.c	Thu Feb 21 00:40:08 2013	(r247068)
>>> @@ -328,7 +328,6 @@ static int
>>> atrtc_gettime(device_t dev, struct timespec *ts)
>>> {
>>> 	struct clocktime ct;
>>> -	int s;
>>>
>>> 	/* Look if we have a RTC present and the time is valid */
>>> 	if (!(rtcin(RTC_STATUSD) & RTCSD_PWR)) {
>>> @@ -338,11 +337,8 @@ atrtc_gettime(device_t dev, struct times
>>>
>>> 	/* wait for time update to complete */
>>> 	/* If RTCSA_TUP is zero, we have at least 244us before next update */
>>
>> As the comment says, this is time-critical code.  It needs to do something
>> to prevent it being preempted for more than 244 usec
>
> That's a long time...

No, preemptions can easily be for as long as the quantum (default 100
milliseconds).

>
>>> -	s = splhigh();
>>
>> It used to do something...
>>
>>> -	while (rtcin(RTC_STATUSA) & RTCSA_TUP) {
>>> -		splx(s);
>>> -		s = splhigh();
>>> -	}
>>> +	while (rtcin(RTC_STATUSA) & RTCSA_TUP)
>>> +		continue;
>>
>> You should probably have changed this to a critical section like you did
>> in ppc.  Disabling hardware interrupts would be even better.
>
> I'll replace with a critical section.
>
>> There is a problem with the "show rtc" command in ddb.  It was born
>> broken (racy), and the races were turned into deadlocks by adding
>> locking in rtcin().  So if you trace through this code, then
>> "show rtc" while holding the lock in rtcin() will deadlock.  It is a
>> bug for ddb to call any code that might try to acquire a mutex, but
>> "show rtc" always calls rtcin() and rtcin() always tries to aquire a
>> mutex.  Similar deadlocks on the i8254 lock in DELAY() are worked
>> around by not trying to acquire the lock in kdb mode.
>
> kbd_active is what I need to check, right? I'll fix that while here.

It is logically wrong, since only ddb has a command for accessing the
RTC.

>>> 	ct.nsec = 0;
>>> 	ct.sec = readrtc(RTC_SEC);
>>> 	ct.min = readrtc(RTC_MIN);
>>>
>>
>> There are 8 or 9 readrtc()'s altogether.  These must be atomic, and all
>> within the 244 usec limit.  There is considerable danger of exceeding the
>> limit without even being preempted.  Each readrtc() does 1 or 4 isa bus
>> accesses.  I've seen a single isa bus access taking 139 usec (delayed by
>> PCI DMA).
>>
>> Another way of doing this without any locking against preemption or
>> timing magic is to read the time before and after.  If it took more than
>> 244 usec from before seeing RTCSA_TUP deasserted to after the last
>> readrtc(), then retry.
>
> By computing a time difference, or by checking RTCSA_TUP?

The former.  Needs a working timecounter or just some other timer with
enough precision.  RTCSA_TUP is volatile sand only gives the state at
the time you read it, but you need to know if it became set while you
were reading the registers.

>> My version still depends on splhigh() working, but it does more
>> sophisticated waiting that gives a full second to read the registers
>> without getting more than a stale time.  The above reads an RTC value
>> that may be stale by almost 1 second.  My version busy-waits until
>> just after after the counter rolls over.  This is only suitable for
>> boot=time initialization.  Debugging printfs show that this never got
>> preempted over the last few years -- the whole operation always
>> completed within about 40 usec of seeing the rollover.
>
> So this isn't a problem on fast machines?

Fast machines shrink the race windows.

I forgot that inittodr() is almost a non-problem for another reason:
it is, or should, only be called at boot time and resume times.  At
these times, user threads shouldn't be running so we don't have to
worry about preemption for a full quantum.

resettodr() is more interesting.  Privileged applications can try
racing and/or mistiming it it using settime(2) in loops from multiple
threads.

Locking in the kernel settime() is quite broken too.  It was broken even
in FreeBSD-4 where the splclock() part of it worked, since it was:

@	s = splclock();
@	...
@	set_timecounter(&ts);		/* OK so far. */
@	(void) splsoftclock();		/* Wrong (1). */
@	lease_updatetime(delta.tv_sec);
@	splx(s);			/* Wronger (2). */
@	resettodr();
@	return (0);

(1) We want to continue with lower priority for lease_updatetime(), but
     we haven't done resettodr() yet.  OTOH, resettodr() might take too
     long to be done all at splclock().
(2) We even completely lower the priority before resettodr().

In -current, this has rotted to:

@	s = splclock();			/* Wrong (3). */
@	...
@	mtx_lock(&Giant);		/* Bogus (4). */
@	tc_setclock(&ts);
@	// lease_updatetime() was removed, including its splsoftclock()
@	// and also the splx(s) which doesn't belong to it.  So now there
@	// is an splclock() with no corresponding splx().  Compilers
@	// should complain that s is unused.
@	resettodr();
@	mtx_unlock(&Giant);
@	return (0);

After splclock() became null, anti-premption became broken.  This isn't
even Giant-locked.  It accesses some globals without locking:
- boottime, via microtime().  Locking for boottime is quite broken.
   Here we can cause boottime to be volatile by racing with another
   thread doing the same micro-adjustments as us.  It is a bug for
   boottime to be volatile at all.  It shouldn't be changed by
   micro-adjustments.
- securelevel, via securelevel_gt().
Giant locking wouldn't help here.

(4) Giant locking is fairly bogus for tc_setclock(), but better than
     nothing.  It prevents contention from here.  Similarly for
     resettodr().  tc_setclock() is called from elsewhere mainly
     from inittodr().  I think it has no Giant locking then.

>> Linux does even more sophisticated waiting for writing the registers,
>> so as not to be off by another second in the write operation.
>
> Yea, that would be nice, I may have to look into that, once I'm done fetching these stones.

The above resettodr() call is the main thing that writes the registers.
I think settime() should block for a second or 2 as necessary to sync
with the hardware.  It must release all locks while waiting for the
hardware.  Then when it writes, it updates the hardware with the
current time, which is normally a fractional second after settime()
starts.  The splx(s) before the resettodr() was actually correct for
this.  Now resettodr() can be Giant locked, but if its callers acquire
Giant as above, then it must release Giant while sleeping.

Bruce