Interesting anomaly with a 2940UW

Doug Ledford dledford at dialnet.net
Tue Sep 9 15:03:58 PDT 1997


> >Actually, the way I have the locking code in the kernel, there is a small 
> >chance of an overflow, although I haven't yet worked out exactly how it 
> >would occur.  Instead of specifically setting CMDOUTCNT to 0, I do the 
> >following:
> >
> >while ( qoutcnt > 0 )
> >{
> >  for(i=0; i<qoutcnt; i++)
> >  {
> >    do our stuff here
> >  }
> >  if (p->flags & PAGE_ENABLED)
> >  {
> >    pause_sequencer(p);  // In case we ever have paging on 2{7,8}4x class cards
> >    outb(qoutcnt - i, p->base + CMDOUTCNT);  // In most cases, this should be 0
> >    unpause_sequencer(p, FALSE);  // Do this before checking qoutcnt again,
> >  }                               // so we might catch a blocked command
> >  outb(CLRCMDINT, p->base + CLRINT);
> >  interrupt_cleared++;
> >  qoutcnt = inb(p->base + QOUTCNT) & p->qcntmask;
> >}
> 
> This won't work.  qoutcnt may have stale information which then gets 
> transferred to CMDOUTCNT.  This is because you don't know where the
> sequencer is at the time you pause it.  Using the sequencer spin lock
> code I wrote as an example (did you incorporate this exact bit of code???):
> 
> complete:
>         /* Post the SCB and issue an interrupt */
> .if ( SCB_PAGING )
>         /*
>          * Spin loop until there is space
>          * in the QOUTFIFO.
>          */
>         mov     A, FIFODEPTH;
>         cmp     CMDOUTCNT, A    je .;
>         inc     CMDOUTCNT;
> .endif
>         mov     QOUTFIFO,SCB_TAG; <----
>         mvi     INTSTAT,CMDCMPLT;

It wouldn't matter.  If we paused things at that point, then when we 
unpaused them, the QOUTCNT register would get incremented while we are 
writing CLRCMDINT to CLRINT.  We would then check QOUTCNT again, find it 
non-zero, re-run the loop, and re-write the CMDOUTCNT variable.


> Above and beyond this, the code you wrote is inefficient.  If you have
> good interrupt latency, you will pause the sequencer on every command
> completion.  If you use the algorithm I mentioned initially (pause and
> clear the CMDOUTCNT value every fifodepth completions), you remove this
> race and also pause the sequencer as little as possible.

OK...look at this:

news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15.
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 17. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 15. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 
news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16. 


Now, tell me that we don't have high interrupt latency, and that the 
efficiency of that code is as bad as one might think.  That code would be 
horrible if we ran the loop only one iteration at a time, but to be quite 
honest, I don't think we run that loop in single-iteration fashion very 
often at all.  These messages are from one 24-hour period on one machine, 
and for about 2 hours of that period the news server (innd) was not 
running, so that's two hours for free.

As I explained to Dan a few days ago in a private email, when I was 
experimenting with a bottom half completion routine, I ran into two 
problems.  The bottom half and task queues are run either from the 
scheduler, which we *can't* base our completion upon without risking a 
deadlock when the scheduler is blocked on a swap operation, or from the 
timer interrupt.  The timer interrupt based completion routine had 
horrible performance for char reads, mainly because each and every read 
is small and done sequentially, so the added overhead of waiting for a 
timer interrupt to do completion processing was a killer.  Now, in the 
standard isr routine, we leave interrupts disabled the whole time, 
including during our completion processing.
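
Just to illustrate the shape of that timer-based approach, here is a rough 
sketch.  This is not the code I actually had; I'm assuming the stock task 
queue interface (struct tq_struct, queue_task(), tq_timer), and the helper 
name aic7xxx_run_completions is made up for the example.

#include <linux/tqueue.h>

/* Illustrative only: defer completion processing out of the isr and onto
 * the timer task queue.  aic7xxx_run_completions() is a made-up helper
 * that would walk the host's list of finished SCBs and call the mid-level
 * done() routine on each. */
static void aic7xxx_run_completions(void *data);

static struct tq_struct aic7xxx_completion_tq = {
  NULL, 0, aic7xxx_run_completions, NULL    /* next, sync, routine, data */
};

static void aic7xxx_isr(int irq, void *dev_id, struct pt_regs *regs)
{
  /* ... drain the QOUTFIFO and stash the finished SCBs on a list ... */

  /* Completions now wait for the next timer tick.  That keeps the isr
   * short, but it adds up to a full tick of latency to every small,
   * sequential char-device read, which is why this approach was dropped. */
  aic7xxx_completion_tq.data = dev_id;
  queue_task(&aic7xxx_completion_tq, &tq_timer);
}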

The interrupt routine that produced the messages above was modified: it 
enables interrupts during completion processing.  Our isr won't get called 
re-entrantly, thanks to the kernel irq mechanism, but other interrupts can 
run while we do completion processing (so things like mouse movement in X 
won't be so jerky under heavy load).  The result is that our interrupt 
latency can actually get worse, since our completion processing may be 
interrupted intermittently, but generally speaking we are being friendlier 
to the system.  It's a tradeoff: with the spin lock in place we accept a 
little more latency, since we already happen to have a lot, and in 
exchange we reduce the amount of time we run with interrupts off.
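
In rough outline the change looks like this -- a simplified sketch of the 
structure, not a copy of the real isr, and assuming the usual sti()/cli() 
calls for re-enabling and disabling interrupts inside the handler:

static void aic7xxx_isr(int irq, void *dev_id, struct pt_regs *regs)
{
  struct aic7xxx_host *p = (struct aic7xxx_host *) dev_id;

  /* Drain the QOUTFIFO, do the spin lock bookkeeping, and ack the
   * interrupt with interrupts still disabled -- this part has to stay
   * atomic with respect to the card. */

  /* The kernel's irq mechanism keeps this handler from being entered
   * re-entrantly, so it is safe to let other devices' interrupts in
   * while we do the comparatively slow completion processing. */
  sti();

  /* ... call the SCSI mid-level done() routine for each finished SCB ... */

  cli();
}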

The second reason I wrote it that way is this.  Let's say your code 
answers an interrupt with two commands on the QOUTFIFO and p->cmdoutcnt == 
12; cmdoutcnt then gets incremented to 14 while the QOUTFIFO drains to 
zero.  Now, if the next interrupt has a high latency, you may end up 
hitting that spin lock well before the QOUTFIFO is ever actually full, 
since you didn't update the CMDOUTCNT variable during the last isr.  So, 
which is more inefficient: letting a high latency interrupt make the 
sequencer block with only a command or two complete, or writing out the 
actual CMDOUTCNT on each interrupt when we are already writing to the card 
anyway?  Keep in mind the interrupt latency that we see sometimes.  Also, 
who's to say the reason you don't see messages about QOUTCNT isn't due to 
this very condition instead of interrupt latency?  A better test of 
whether this algorithm does what you want would be not to check and print 
a message about the QOUTFIFO depth, but to check whether your sequencer is 
spin locking on CMDOUTCNT and holding up the bus.
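
For what it's worth, here is one hypothetical way to instrument that.  The 
SEQADDR0/SEQADDR1 registers hold the sequencer's program counter, but 
SPIN_LOOP_ADDR below is only a placeholder -- the real address of the spin 
loop would have to come from your sequencer listing:

/* Hypothetical instrumentation: after pausing the sequencer, peek at its
 * program counter and see whether it is sitting in the CMDOUTCNT spin
 * loop.  SPIN_LOOP_ADDR is a placeholder, not a real constant from the
 * driver. */
unsigned int seqaddr;

pause_sequencer(p);
seqaddr = inb(p->base + SEQADDR0) | (inb(p->base + SEQADDR1) << 8);
if (seqaddr == SPIN_LOOP_ADDR)
  printk("aic7xxx: sequencer is spinning on CMDOUTCNT, holding the bus\n");
unpause_sequencer(p, FALSE);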

>  The race
> is removed as you know that the sequencer is either spinning waiting
> for the count to drop or executing code before the test.  In general,
> your interrupt latency should be low, and you have the time between two
> command completions to clear the count and avoid the spin lock.  In my
> tests here under FreeBSD, I've never hit the spin lock, but I've also
> done quite a bit with the CAM SCSI code to ensure low interrupt latency.
> 
> Here's the FreeBSD code:
> 
>         while ((qoutcnt = (ahc_inb(ahc, QOUTCNT) & ahc->qcntmask)) != 0) {
>                 ahc->cmdoutcnt += qoutcnt;
>                 for (; qoutcnt > 0; qoutcnt--) {
>                         scb_index = ahc_inb(ahc, QOUTFIFO);
>                         scb = ahc->scb_data->scbarray[scb_index];
>                         if (!scb || !(scb->flags & SCB_ACTIVE)) {
>                                 printf("%s: WARNING "
>                                        "no command for scb %d "
>                                        "(cmdcmplt)\nQOUTCNT == %d\n",
>                                         ahc_name(ahc), scb_index,
>                                         qoutcnt);
>                                 continue;
>                         }
>                         STAILQ_INSERT_TAIL(&ahc->cmplete_scbs, scb,
>                                            links);
>                 }
>                 if ((ahc->flags & AHC_PAGESCBS) != 0) {
>                         if (ahc->cmdoutcnt >= ahc->qfullcount) {
>                                 /*
>                                  * Since paging only occurs on
>                                  * aic78X0 chips, we can use
>                                  * Auto Access Pause to clear
>                                  * the command count.
>                                  */
>                                 ahc_outb(ahc, CMDOUTCNT, 0);
>                                 ahc->cmdoutcnt = 0;
>                         }
>                 }
>                 while((scb = ahc->cmplete_scbs.stqh_first) != NULL) {
> 			/* Process each command */
>                        ....
> 		}
> 	}
> 
> --
> Justin T. Gibbs
> ===========================================
>   FreeBSD: Turning PCs into workstations
> ===========================================
> 

-- 
*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford at dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************




