Interesting anomaly with a 2940UW

Tue Sep 9 15:43:12 PDT 1997

>Wouldn't matter.  If we did pause things here, then when we unpaused them, 
>the QOUTCNT register would get incremented as we are writing CLRCMDINT to 
>CLRINT, then we would check QOUTCNT again, it would be non-zero, so we would 
>re-run the loop, and we would re-write the CMDOUTCNT variable again.

Sure, but what causes high interrupt latency, Doug?  Other interrupt
handlers running, or your own interrupt handler taking a long time.  In
FreeBSD, your interrupt handler can be interrupted by any non-masked
interrupt, which means that during that window you could easily be
diverted from your loop long enough for multiple commands to pile up,
perhaps even long enough to overflow the QOUTFIFO before your interrupt
handler resumes and completes its work.  And, since the sequencer might
have stuffed multiple commands into the QOUTFIFO since you last read it,
the difference between what you write to CMDOUTCNT and how full the FIFO
actually is could be quite large.  For example:

qoutcnt <- QOUTCNT == 5
Process 5 commands in interrupt handler     10 commands complete in sequencer
Set CMDOUTCNT to 0 (should be 10)

qoutcnt <- QOUTCNT == 10
Process 10 commands in interrupt handler    8 commands complete in sequencer
        OVERFLOW!!!!
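To make that timeline concrete, here is a toy C model of it.  It is a
sketch, not driver code: struct qout_model, seq_post() and handler_pass()
are hypothetical names, the 16-entry FIFO depth is assumed, and the
concurrency is flattened so that every completion landing during a pass is
applied before the handler finishes draining its QOUTCNT snapshot.  The
sequencer trusts only CMDOUTCNT, so a 0 written while 10 entries are still
queued lets it post past the end of the FIFO.

```c
#include <stdbool.h>

#define QOUTFIFO_DEPTH 16  /* assumed QOUTFIFO depth for this model */

/* Hypothetical model: `fifo` is what is really queued in the QOUTFIFO,
 * `cmdoutcnt` is what the sequencer believes is queued. */
struct qout_model {
    int fifo;
    int cmdoutcnt;
    bool overflow;  /* set if a post found the real FIFO already full */
};

/* The sequencer posts one completion, checking only CMDOUTCNT. */
static void seq_post(struct qout_model *m)
{
    if (m->cmdoutcnt >= QOUTFIFO_DEPTH)
        return;                 /* would spin on CMDOUTCNT; post deferred */
    m->cmdoutcnt++;
    if (m->fifo >= QOUTFIFO_DEPTH)
        m->overflow = true;     /* counter said "room", but FIFO was full */
    else
        m->fifo++;
}

/* One handler pass: snapshot QOUTCNT, get delayed long enough for
 * `arrivals` more completions, drain only the snapshot, then write 0 to
 * CMDOUTCNT even though entries are still sitting in the FIFO. */
static void handler_pass(struct qout_model *m, int arrivals)
{
    int qoutcnt = m->fifo;      /* qoutcnt <- QOUTCNT */

    for (int i = 0; i < arrivals; i++)
        seq_post(m);            /* sequencer keeps completing commands */
    m->fifo -= qoutcnt;         /* process the snapshot only */
    m->cmdoutcnt = 0;           /* stale: the true value is m->fifo */
}
```

Starting from `struct qout_model m = {5, 5, false}`, the first
`handler_pass(&m, 10)` leaves 10 entries queued with CMDOUTCNT at 0, and the
second `handler_pass(&m, 8)` sets `m.overflow`.  Had the first pass written
10 instead of 0, the sequencer would have posted six more entries and then
waited on CMDOUTCNT instead of overflowing.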

>> Above and beyond this, the code you wrote is inefficient.  If you have
>> good interrupt latency, you will pause the sequencer on every command
>> completion.  If you use the algorithm I mentioned initially, pause and
>> clear the CMDOUTCNT value every fifodepth completions, you remove this
>> race and also pause the sequencer as little as possible.
>
>OK...look at this:
>
>news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 

repeats 56 times...

>Now, tell me that we don't have high interrupt latency and that the 
>efficiency of that code is as bad as one might think.

Okay.  I'll tell you again: it's inefficient code.  In the example you
cite, the QOUTFIFO filled up only 56 times over how many transactions???
Probably a few hundred thousand on a busy news server, if not more.

I never said that you don't have high interrupt latency.  What I said was
that I don't have high interrupt latency, but of course, I don't run
Linux.  In my system, the hardware interrupt handler for the aic7xxx card
simply removes the entry from the QOUTFIFO, sets a few status bits in
the generic SCSI structure associated with this transaction and queues it
to a software interrupt handler.
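A rough sketch of that split, assuming nothing about FreeBSD's actual
interfaces: struct xfer, struct swi_queue, hw_intr_one() and sw_intr() are
hypothetical names, and the locking that would keep the hardware interrupt
from touching the queue while the software interrupt walks it is omitted.
The point it shows is that the hardware-interrupt side does a small,
constant amount of work per QOUTFIFO entry and defers everything slow.

```c
#include <stddef.h>

#define XFER_DONE     0x1u  /* set in hardware interrupt context */
#define XFER_FINISHED 0x2u  /* set once generic completion has run */

/* Hypothetical stand-in for the generic SCSI structure of one transaction. */
struct xfer {
    int tag;            /* SCB tag taken from the QOUTFIFO */
    unsigned flags;     /* status bits */
    struct xfer *next;  /* link on the deferred-completion queue */
};

/* FIFO of transactions waiting for the software interrupt handler. */
struct swi_queue {
    struct xfer *head;
    struct xfer **tailp;
};

static void swi_queue_init(struct swi_queue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/* Hardware interrupt, per QOUTFIFO entry: mark the transaction and queue
 * it.  Constant work; nothing here touches the generic SCSI layer. */
static void hw_intr_one(struct swi_queue *q, struct xfer *table, int tag)
{
    struct xfer *x = &table[tag];

    x->tag = tag;
    x->flags |= XFER_DONE;
    x->next = NULL;
    *q->tailp = x;
    q->tailp = &x->next;
}

/* Software interrupt: runs later with hardware interrupts enabled and does
 * the slow completion work.  Real code would block the hardware interrupt
 * while unlinking entries; that locking is omitted from this sketch. */
static int sw_intr(struct swi_queue *q)
{
    int n = 0;

    while (q->head != NULL) {
        struct xfer *x = q->head;

        q->head = x->next;
        x->flags |= XFER_FINISHED;  /* placeholder for generic completion */
        n++;
    }
    q->tailp = &q->head;
    return n;
}
```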

>As I explained to Dan a few days ago in a private email, when I was messing 
>around with using a bottom half completion routine, I ran into two problems. 
> All of the bottom half and task queues are either run based upon the 
>scheduler, which we *can't* base our completion upon or we risk a deadlock 
>when the scheduler is blocked for a swap operation, or they are based on the 
>timer interrupt.  The timer interrupt based completion routine had horrible 
>performance for char reads, namely because each and every read is small and 
>done sequentially, so the added overhead of waiting for a timer interrupt to 
>do completion processing was a killer.  Now, in the standard isr routine, we 
>leave interrupts disabled the whole time, including during our completion 
>processing.

It's a shame that Linux doesn't offer a decent software interrupt
strategy, but that's not my problem.  You should still be able to get
decent latency for setting CMDOUTCNT back to 0 if you clear the QOUTFIFO
first: put its entries onto a list, set CMDOUTCNT to zero, then process
the entries on the list.  You are probably getting into your interrupt
handler plenty fast, but getting crushed by the overhead of generic SCSI
processing at interrupt time.
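The ordering suggested above, sketched against a hypothetical in-memory
model of the card (struct hba, its ring-buffer qoutfifo, and
isr_drain_then_complete() are illustrative names, not the aic7xxx
driver's): the cheap FIFO reads come first, CMDOUTCNT is reset as soon as
the FIFO is empty, and the generic completion work comes last.  On real
hardware the sequencer can still post between the final QOUTCNT check and
the write, so this narrows the window rather than closing it.

```c
#include <stddef.h>

#define QOUTFIFO_DEPTH 16  /* assumed QOUTFIFO depth */

/* Hypothetical in-memory stand-in for the card's completion registers. */
struct hba {
    unsigned char qoutfifo[QOUTFIFO_DEPTH]; /* ring of completed SCB tags */
    unsigned head;                          /* next ring slot to read */
    unsigned qoutcnt;                       /* entries waiting (QOUTCNT) */
    unsigned cmdoutcnt;                     /* sequencer's CMDOUTCNT */
};

/* Drain the FIFO into a private list, reset CMDOUTCNT, and only then run
 * the (here trivial) completion work.  Completed tags are copied to `out`,
 * which must hold QOUTFIFO_DEPTH entries.  Returns the number completed. */
static size_t isr_drain_then_complete(struct hba *hba, unsigned char *out)
{
    unsigned char list[QOUTFIFO_DEPTH];
    size_t n = 0;

    /* 1. Cheap part: pull every queued tag out of the FIFO. */
    while (hba->qoutcnt > 0 && n < QOUTFIFO_DEPTH) {
        list[n++] = hba->qoutfifo[hba->head];
        hba->head = (hba->head + 1) % QOUTFIFO_DEPTH;
        hba->qoutcnt--;
    }

    /* 2. The FIFO is now empty, so 0 matches its real state. */
    hba->cmdoutcnt = 0;

    /* 3. Slow part: generic SCSI completion would go here.  It runs after
     *    the reset, so its duration no longer holds up the sequencer. */
    for (size_t i = 0; i < n; i++)
        out[i] = list[i];

    return n;
}
```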

>The interrupt routine that produced the messages above was modified, it 
>enables interrupts during the completion processing.  Our isr won't get 
>called re-entrantly due to the kernel irq mechanism, but it does allow other 
>interrupts to run during completion processing (so things like mouse 
>movement in X won't be so jerky during heavy load).  The result of that is 
>that our interrupt latency can actually get worse, as our completion 
>processing may suffer intermittent interrupts, but generally speaking we are 
>being friendlier to the system.  It's a tradeoff: with the spin lock in 
>place, we give ourselves a little more latency, since we already happen to 
>have a lot, and in exchange we reduce the amount of time we run with 
>interrupts off.

Wow.  I never knew that you used to run your interrupt handler with all 
other interrupts disabled.  Don't your network servers drop packets like
crazy when you do this?

>The second reason I wrote it that way is this.  Let's say your code 
>answers an interrupt with two commands on the QOUTFIFO and 
>p->cmdoutcnt == 12; then cmdoutcnt will get incremented to 14 while the 
>QOUTFIFO goes to zero.  Now, if the next interrupt has a high latency, then 
>you may end up hitting that spin lock long before you ever reach the 
>QOUTFIFO depth, since you didn't update the CMDOUTCNT variable during the 
>last isr.  So, which is more inefficient: allowing a high-latency interrupt 
>to block with only a command or two complete, or writing out the actual 
>CMDOUTCNT on each interrupt routine when we are already writing to the 
>card?  Keep in mind the interrupt latency that we see sometimes.

I'm fully aware that CMDOUTCNT does not directly track the current state
of the FIFO.  I wanted a lazy update because it means I only have to do a
single write, which can be done with AAP.  For your algorithm to work, you
have to perform a read and a write with the sequencer paused, and having
looked at what this does with a PCI bus analyzer, I can tell you it's
simply not worth it.
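The lazy scheme can be modeled the same way.  This is a sketch under
stated assumptions, not the driver's code: the 16-entry depth is assumed,
simulate_lazy() and struct lazy_stats are hypothetical, `burst` new
completions become ready just before each interrupt, and the sequencer
never posts while CMDOUTCNT equals the depth.  Under those assumptions,
once the host has consumed a full FIFO depth of entries since the last
reset, every entry the sequencer could have posted has been consumed, so
the FIFO is empty and a single write of 0 is exact, with no pause and no
read.

```c
#include <assert.h>

#define QOUTFIFO_DEPTH 16u  /* assumed QOUTFIFO depth */

struct lazy_stats {
    unsigned writes;   /* CMDOUTCNT writes issued by the host */
    unsigned stalls;   /* interrupts in which the sequencer had to wait */
};

/* Model `interrupts` interrupts.  Before each one, `burst` completions are
 * ready in the sequencer, which posts them while CMDOUTCNT < depth and
 * holds the rest (it would spin on CMDOUTCNT).  The host drains the FIFO
 * and writes CMDOUTCNT = 0 only after each full depth of entries. */
static struct lazy_stats simulate_lazy(unsigned interrupts, unsigned burst)
{
    struct lazy_stats st = {0, 0};
    unsigned pending = 0, fifo = 0, cmdoutcnt = 0, consumed = 0;

    for (unsigned i = 0; i < interrupts; i++) {
        pending += burst;
        while (pending > 0 && cmdoutcnt < QOUTFIFO_DEPTH) {
            pending--;
            cmdoutcnt++;
            fifo++;
        }
        if (pending > 0)
            st.stalls++;               /* sequencer waiting on the host */

        while (fifo > 0) {             /* host drains the QOUTFIFO */
            fifo--;
            if (++consumed == QOUTFIFO_DEPTH) {
                /* Everything posted since the last reset is consumed,
                 * so the FIFO is empty and 0 is the exact value. */
                assert(fifo == 0);
                cmdoutcnt = 0;         /* the single lazy write */
                consumed = 0;
                st.writes++;
            }
        }
    }
    return st;
}
```

In this model `simulate_lazy(160, 1)` issues 10 CMDOUTCNT writes with no
sequencer waits, where updating on every interrupt would issue 160, each
needing a paused read and write.  `simulate_lazy(100, 3)` issues 18 writes
but records 12 interrupts in which the sequencer waited at a 16-entry
boundary with only a few entries in the FIFO, which resembles the bursty
case described in the quoted message above.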

>Also, who's to say the 
>reason you don't see messages about the QOUTCNT isn't due to this very 
>condition instead of interrupt latency?  A better test to see if this 
>algorithm does what you want would be not to check and print a message about 
>the QOUTFIFO depth, but check to see if your sequencer is spin locking on 
>CMDOUTCNT and holding up the bus.

Actually, I incremented a counter in sequencer scratch RAM every time I
hit the lock.  Every time I went to look, either it had wrapped to 0 or
my lock was never hit.  As I said before, you are probably getting into
your interrupt handler plenty fast; it's just that your interrupt handler
runs for a long time before you go back and clean out the queue.

>*****************************************************************************
>* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
>* dledford at dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
>*   PPP access $14.95/month         *****************************************
>*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
>*   communities.  Sign-up online at * Web page creation and hosting, other  *
>*   873-9000 V.34                   * services available, call for info.    *
>*****************************************************************************

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================