Command queuing in Rev 7.0?

Steve Schlosser swschlosser at gmail.com
Wed Aug 15 14:53:27 UTC 2007


Thanks for the sanity checks.  Unfortunately, it seems that I'm still
stuck.  Please find point-by-point responses embedded below.

I'm going to try to rule out the benchmark.  I've got another one
that works using SG rather than file IO.
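
(Roughly what I mean by "SG rather than file IO": issuing READ(10) commands
through the sg driver's SG_IO ioctl, along the lines of the sketch below.
This is a minimal illustration rather than the actual benchmark code; the
sg device path, LBA, and timeout are placeholders.)

/* Sketch: one synchronous 4KB READ(10) through the sg driver's SG_IO
 * ioctl.  Device path and LBA are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/sg0";  /* placeholder */
    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    unsigned int lba = 12345;            /* arbitrary example LBA */
    unsigned char buf[4096];             /* 8 x 512-byte blocks = 4KB */
    unsigned char sense[32];
    unsigned char cdb[10];

    memset(cdb, 0, sizeof(cdb));
    cdb[0] = 0x28;                       /* READ(10) */
    cdb[2] = (lba >> 24) & 0xff;
    cdb[3] = (lba >> 16) & 0xff;
    cdb[4] = (lba >> 8) & 0xff;
    cdb[5] = lba & 0xff;
    cdb[8] = 8;                          /* transfer length, in blocks */

    sg_io_hdr_t io;
    memset(&io, 0, sizeof(io));
    io.interface_id = 'S';
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.cmd_len = sizeof(cdb);
    io.cmdp = cdb;
    io.dxfer_len = sizeof(buf);
    io.dxferp = buf;
    io.mx_sb_len = sizeof(sense);
    io.sbp = sense;
    io.timeout = 20000;                  /* milliseconds */

    if (ioctl(fd, SG_IO, &io) < 0) { perror("SG_IO"); return 1; }
    printf("status=0x%x host=0x%x driver=0x%x\n",
           io.status, io.host_status, io.driver_status);
    close(fd);
    return 0;
}

Since a single synchronous SG_IO only ever has one command outstanding,
several of these have to be in flight at once (e.g., from multiple threads
or processes) before tagged queuing at the drive comes into play.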

Thanks again.

-steve

On 8/15/07, Todd Denniston <Todd.Denniston at ssa.crane.navy.mil> wrote:
> I don't have any bright light to shed, but I think it would be good to make
> sure that some assumptions I would make are met.
>
> 1) User, Goal and Curr lines[1] match between the two machines for the desired
> drives while the benchmark is running.

Yes, these match on both machines.

> 2) the "Serial EEPROM:" data[1] matches between the two machines (mine differ,
> I believe, because on one machine the bus is locked at 33MHz and the other is
> at 8MHz).  Probably best to visually diff the settings of the machines after
> doing the Ctrl-A to get into the card BIOS at boot.

Again, these match between the two machines.

> 3) while the benchmark is running do you ever see the "Commands Active"
> line[1] go above 1?

Aha!  While the benchmark is running on the machine with the 2.4
kernel, "Commands Active" is always equal to the max queue depth I
set.  However, on the 2.6 kernel, it is always equal to 1, regardless
of the max queue depth value (i.e., "Max Tagged Openings").  Again, it
looks like the 2.6 machine is never queuing multiple requests to the
disk.
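
(For anyone who wants to repeat that check, a loop along these lines is
enough to sample the counter while the benchmark runs; the proc path is
taken from the command line, and the match string assumes the "Commands
Active" wording in the proc output.)

/* Sketch: periodically print the "Commands Active" lines from the
 * aic7xxx proc file given on the command line. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /proc/scsi/aic7xxx/<n>\n", argv[0]);
        return 1;
    }
    for (;;) {
        FILE *f = fopen(argv[1], "r");
        if (!f) { perror("fopen"); return 1; }
        char line[256];
        while (fgets(line, sizeof(line), f))
            if (strstr(line, "Commands Active"))
                fputs(line, stdout);
        fclose(f);
        sleep(1);   /* sample once per second */
    }
}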

> 4) both machines are running uniprocessor, or both smp?

The 2.4 machine is uniprocessor and the 2.6 machine is SMP.  I haven't
had a chance to match up the machines yet, but I can.

> 5) during boot/insmod, do dmesg & syslog on both systems show similar SCSI
> messages for how fast they are going to run the bus and how both bus and
> device were detected?

Yes, they both report 160MB/s.  The other dmesg entries look the same as well.

> 6) either during boot or while the benchmark is running, you do not see SCSI
> kernel errors/warnings?

Nope.  No error messages while benchmarks are running, either.

> 7) can you or have you swapped cards & drives between machines to make sure
> the problem does not follow hardware[2]?
>
I have swapped drives and cards around and have seen consistent
behavior.  I'm confident that the difference is software, not
hardware.

>
>
> [1] from /proc/scsi/aic7xxx/<n>
> [2] it happens with 'identical' hardware. The reason my buses are set
> differently is that with 'identical' hardware on both, one can be driven for
> months at 33MHz, while the other locks up the system in under 3 days if it is
> running faster than 8MHz.  From swapping, I know it to be a drive problem.
>
> Steve Schlosser wrote, On 08/14/2007 08:41 PM:
> > Can anyone shed some light on our command queuing problems, described
> > below?  I posted this a week or so ago and haven't heard anything.
> > Thanks!
> >
> > -steve
> >
> > ---------- Forwarded message ----------
> > From: Steve Schlosser <swschlosser at gmail.com>
> > Date: Aug 3, 2007 12:35 AM
> > Subject: Command queuing in Rev 7.0?
> > To: aic7xxx at freebsd.org
> >
> >
> > Hello
> >
> > I have been doing some experiments with command queuing, and I'm
> > having trouble confirming that my system is actually queuing requests
> > at the disk.
> >
> > Here is my setup.  I have two machines, an "old" one and a "new" one,
> > each with an Adaptec 29160 hooked up to identical Seagate Cheetah10k7
> > disks.  The old system is running Debian, kernel version 2.4.27, and
> > dmesg reports that the aic7xxx driver Rev 6.2.36 is running.  The new
> > system is running Ubuntu 7.04, kernel version 2.6.20.3, and aic7xxx
> > Rev 7.0.
> >
> > I control the queue depth by setting global_tag_depth when I load the
> > module.  I'm running a simple microbenchmark which issues random 4KB
> > reads to the disk, varying the number of concurrent requests
> > outstanding at the disk from 1 (no queuing) to 253 (the maximum value
> > allowed for global_tag_depth).  On both systems, dmesg and
> > /proc/scsi/aic7xxx/<n> report the queue depth that I set when I
> > load the module.
> >
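For reference, a minimal sketch of this kind of workload (random 4KB reads
with a configurable number of concurrent requests) is below.  It is a
simplified stand-in rather than the actual benchmark; the device path,
thread count, and per-thread I/O count are placeholders.

/* Simplified stand-in for a random-read microbenchmark: N threads each
 * issue 4KB O_DIRECT preads at random 4KB-aligned offsets on the device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define IO_SIZE 4096
#define IOS_PER_THREAD 10000            /* placeholder run length */

static const char *dev;
static off_t dev_blocks;                /* device size in 4KB units */

static void *worker(void *arg)
{
    long id = (long)arg;
    unsigned int seed = (unsigned int)time(NULL) + (unsigned int)id;
    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return NULL; }
    void *buf;
    if (posix_memalign(&buf, IO_SIZE, IO_SIZE)) { close(fd); return NULL; }
    for (int i = 0; i < IOS_PER_THREAD; i++) {
        off_t off = (off_t)(rand_r(&seed) % dev_blocks) * IO_SIZE;
        if (pread(fd, buf, IO_SIZE, off) != IO_SIZE)
            perror("pread");
    }
    free(buf);
    close(fd);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <device> <num_threads>\n", argv[0]);
        return 1;
    }
    dev = argv[1];
    int nthreads = atoi(argv[2]);       /* ~ outstanding requests */

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    dev_blocks = lseek(fd, 0, SEEK_END) / IO_SIZE;
    close(fd);

    pthread_t *tids = calloc(nthreads, sizeof(*tids));
    for (long i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return 0;
}

The queue depth actually reached at the drive is then bounded both by the
number of threads and by the driver's cap (global_tag_depth / "Max Tagged
Openings"), which is why the "Commands Active" line discussed above is the
thing to watch.
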
> > On the old system, bandwidth increases as I increase queue depth,
> > presumably because the disk has more scheduling choices.  Bandwidth
> > scales from 0.7MB/s for one outstanding request to 2.0MB/s for 128
> > outstanding requests.
> >
> > However, with the new system, I don't get the same increase in
> > bandwidth - it stays at 0.7MB/s regardless of the queue depth setting.
> > This suggests to me that requests are not getting queued at the disk.
> >
> > Any ideas why the newer driver might not be queuing requests?  Is
> > there another layer in the driver stack that I should be checking on?
> >
> > Thanks.
> >
> > -steve
>
>
> --
> Todd Denniston
> Crane Division, Naval Surface Warfare Center (NSWC Crane)
> Harnessing the Power of Technology for the Warfighter
>

