the PS/2 mouse problem

Sat Nov 8 05:13:58 PST 2003

Scott Long wrote:
> Bruce Evans wrote:
> 
>> On Fri, 7 Nov 2003, Morten Johansen wrote:
>>
>>
>>> Morten Johansen wrote:
>>>
>>>> Scott Long wrote:
>>>>
>>>>
>>>>> One thought that I had was to make psmintr() be INTR_FAST.  I need to
>>>>> stare at the code some more to fully understand it, but it looks 
>>>>> like it
>>>>> wouldn't be all that hard to do.  Basically just use the interrupt
>>>>> handler
>>>>> to pull all of the data out of the hardware and into a ring buffer in
>>>>> memory, and then a fast taskqueue to process that ring buffer.  It 
>>>>> would
>>>>> at least answer the question of whether the observed problems are 
>>>>> due to
>>>>> ithread latency.  And if done right, no locks would be needed in
>>>>> psmintr().
>>
>>
>>
>> However, it is usually easier to use a lock even if not strictly 
>> necessary.
>> psm as currently structured uses the technique of calling psmintr() from
>> the timeout handler.  This requires a lock.  If this were not done, then
>> the timeout routine would probably need to access hardware using 
>> scattered
>> i/o instructions, and these would need locks (to prevent them competing
>> with i/o instructions in psmintr()).  Putting all the hardware accesses
>> in the fast interrupt handler is simpler.  The sio driver uses this 
>> technique
>> but doesn't manage to put _all_ the i/o's in the interrupt handler, so it
>> ends up having to lock out the interrupt handler all over the place.
>> Ring buffers can be self-locking using delicate atomic instructions, but
>> they are easier to implement using locks.
>>
>>
>>>> I can reproduce the problem consistently on my machine, by moving the
>>>> mouse around, while executing e.g this command in a xterm:
>>>>
>>>> dd if=/dev/zero of=test bs=32768 count=4000; sync; sync; sync
>>>>
>>>> when the sync'ing sets in the mouse attacks.
>>>> It is very likely due to interrupt latency.
>>>>
>>>> I'd be happy to test any clever patches.
>>>
>>>
>>> Wow. You are completly right!
>>> By using a MTX_SPIN mutex instead, and marking the interrupt handler
>>> INTR_MPSAFE | INTR_FAST, my problem goes away.
>>> I am no longer able to reproduce the mouse attack.
>>> I have not noticed any side-effects of this. Could there be any?
>>> I will file a PR with an updated patch, unless you think it's a better
>>> idea to rearrange the driver.
>>> Probably the locking could be done better anyway.
>>
>>
>>
>> Er, psmintr() needs large changes to become a fast interrupt handler.  it
>> does many things that may not be done by a fast interrupt handler, 
>> starting
>> with the first statement in it:
>>
>>     /* read until there is nothing to read */
>>     while((c = read_aux_data_no_wait(sc->kbdc)) != -1) {
>>
>> This calls into the keyboard driver, which is not written to support any
>> fast interrupt handlers.
> 
> 
> Actually, it calls the keyboard controller driver, not the keyboard
> driver.
> 
>> In general, fast interrupt handlers may not call
>> any functions, since the "any" function doesn't know that it is called in
>> fast interrupt handler context and may do things that may not be done in
>> fast interrupt handler context.  As it happens, read_aux_data_no_wait()
>> does the following bad things:
>> - it accesses private keyboard data.  All data that is accessed by a fast
>>   interrupt handler must be locked by a common lock or use self-locking
>>   accesses.  Data in another subsystem can't reasonably be locked by this
>>   (although the keyboard subsystem is close to psm, you don't want to
>>   export the complexities of psmintr()'s locking to the keyboard 
>> subsystem).
>> - it calls other functions.  The closure of all these calls must be 
>> examined
>>   and made fast-interrupt-handler safe before this is safe.  The 
>> lowest level
>>   will resolve to something like inb(PSMPORT) and this alone is obviously
>>   safe provided PSMPORT is only accessed in the interrupt handler or is
>>   otherwise locked.  (Perhaps the private keyboard data is actually 
>> private
>>   psm data that mainly points to PSMPORT.  Then there is no problem 
>> with the
>>   data accesses.  But the function calls make it unclear who owns the 
>> data.)
> 
> 
> The problem here is that the keyboard controller driver tries to be too
> smart. If it detects that the hardware FIFO is full, it'll drain it into
> a per-softc, per-function ring buffer.  So having psm(4) just directly
> read the hardware is insufficient in this scheme.
> 
>> - it sometimes calls the DELAY() function, which is not permitted in fast
>>   interrupt handlers since apart from locking issues, fast interrupt 
>> handlers
>>   are not permitted to busy-wait.
> 
> 
> Again, the keyboard controller driver is too smart for its own good.  To
> summarize:
> 
> read_aux_data_no_wait()
> {
>     Does softc->aux ring buffer contain data?
>         return ring buffer data
> 
>     Check the status register
>     Is the keyboard fifo full?
>         DELAY(7us)
>         read keyboard fifo into softc->kbd ring buffer
>         Check the status register
> 
>     Is the aux fifo full?
>         DELAY(7us)
>         return aux fifo data
> }
> 
> So you can wind up stalling for 14us in there, presumably because you
> cannot read the status and data registers back-to-back without a delay.
> I don't have the atkbd spec handy so I'm not sure how to optimize this.
> Do you really need to check the status register before reading the data
> register?
> 
>>
>> Many of the complications for fast interrupt handlers shouldn't be needed
>> in psm.  Just make psmintr() INTR_MPSAFE.
> 
> 
> I believe that the previous poster actually tried making it INTR_MPSAFE, 
> but didn't see a measurable benefit because the latency of scheduling
> the ithread is still unacceptable.

That is 100% correct.
In the meantime I have taken your's and Bruce's advice and rearranged 
the interrupt handler to look like this:

mtx_lock(&sc->input_mtx);
while((c = read_aux_data_no_wait(sc->kbdc)) != -1) {
     sc->input_queue.buf[sc->input_queue.tail] = c;
     if ((++ sc->input_queue.tail) >= PSM_BUFSIZE)
         sc->input_queue.tail = 0;
     count = (++ sc->input_queue.count);
}
mtx_unlock(&sc->input_mtx);
if (count >= sc->mode.packetsize)
     taskqueue_enqueue(taskqueue_swi_giant, &sc->psm_task);

And it works, but having it INTR_MPSAFE still does NOT help my problem.
It looks to me like data is getting lost because the interrupt handler
is unable to read it before it's gone, and the driver gets out of sync,
and has to reset itself.
However it now takes a few more tries to provoke the problem, so 
something seems to have improved somewhere.

   Morten

> 
>> This is nontrival, however.
>> Fine grained locking gaves many of the complications that were only
>> in fast interrupt handlers in RELENG_4.  E.g., for psmintr() to be 
>> MPSAFE,
>> all of its calls into the keyboard subsystem need to be MPSAFE, and they
>> are unlikely to be so unless the keyboard subsystem is made MPSAFE.
>>
>> The following method can be used to avoid some of the complications:
>> make the interrupt handler not touch much data, so that it can be
>> locked easily.  The data should be little more than a ring buffer.
>> Make the handler either INTR_MPSAFE or INTR_FAST (it doesn't matter
>> for slow devices like psm).  Put all the rest of what was in the
>> interrupt handler in non-MPSAFE code (except where it accesses data
>> shared with the interrupt handler) so that all of this code and its
>> closure doesn't need to be made MPSAFE.  This method is what the sio
>> driver uses in -current, sort of accidentally.  sio's SWI handler and
>> all of the tty subsystem are not MPSAFE, but this has almost no visible
>> effect because very low latency is only needed at the lowest level.
> 
> 
> This strategy is exactly what I had in mind, but getting the data out
> of the hardware is the tricky part right now.  Maybe if we make the
> keyboard and psm drivers be INTR_FAST, there will be no need for the
> keyboard controller to manage local ring buffers.  If so, it _vastly_
> simplifies the locking in there.
> 
>>
>> BTW, the flags combination (INTR_MPSAFE | INTR_FAST) makes no sense.
>> INTR_MPSAFE only applies to non-INTR_FAST handlers and INTR_FAST is
>> currently non-advisory.
>>
>> Bruce
>>
> 
> 
> Scott
>