Command queuing in Rev 7.0?
Todd.Denniston at ssa.crane.navy.mil
Wed Aug 15 08:54:17 PDT 2007
Steve Schlosser wrote, On 08/15/2007 09:53 AM:
> Thanks for the sanity checks. Unfortunately, it seems that I'm still
> stuck. Please find point-by-point responses embedded below.
> I'm going to try and rule out the benchmark. I've got another one
> that works using SG rather than file IO.
> Thanks again.
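For reference, a minimal sketch of what "using SG rather than file IO" can look
like: a single 4KB READ(10) sent through the sg driver with the SG_IO ioctl.
The /dev/sg0 node and the LBA are placeholders, not details from this thread.

/* One 4KB READ(10) via the sg driver's SG_IO ioctl. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

int main(void)
{
    /* READ(10): opcode 0x28, LBA 0, transfer length 8 blocks (4KB at 512B/block) */
    unsigned char cdb[10] = { 0x28, 0, 0, 0, 0, 0, 0, 0, 8, 0 };
    unsigned char buf[4096];
    unsigned char sense[32];
    sg_io_hdr_t io;

    int fd = open("/dev/sg0", O_RDWR);          /* placeholder sg node */
    if (fd < 0) { perror("open"); return 1; }

    memset(&io, 0, sizeof(io));
    io.interface_id    = 'S';
    io.cmd_len         = sizeof(cdb);
    io.cmdp            = cdb;
    io.dxfer_direction = SG_DXFER_FROM_DEV;     /* data in from the device */
    io.dxfer_len       = sizeof(buf);
    io.dxferp          = buf;
    io.mx_sb_len       = sizeof(sense);
    io.sbp             = sense;
    io.timeout         = 5000;                  /* milliseconds */

    if (ioctl(fd, SG_IO, &io) < 0)
        perror("SG_IO");
    else
        printf("scsi status 0x%x\n", io.status);

    close(fd);
    return 0;
}

Note that SG_IO is synchronous, so a benchmark built on it needs multiple
threads (or the sg driver's asynchronous write()/read() interface) to keep
more than one command outstanding at the disk.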
> On 8/15/07, Todd Denniston <Todd.Denniston at ssa.crane.navy.mil> wrote:
>> I don't have any bright light to shed, but I think it would be good to make
>> sure that some assumptions I would make are met:
>> 1) User, Goal and Curr lines match between the two machines for the desired
>> drives while the benchmark is running.
> Yes, these match on both machines.
>> 2) the "Serial EEPROM:" data matches between the two machines (mine differ,
>> I believe, because on one machine the bus is locked at 33MHz and the other is
>> at 8MHz). It's probably best to visually diff the settings of the two machines
>> after pressing Ctrl-A at boot to get into the card BIOS.
> Again, these match on each machine.
>> 3) while the benchmark is running do you ever see the "Commands Active"
>> line go above 1?
> Aha! While the benchmark is running on the machine with the 2.4
> kernel, "Commands Active" is always equal to the max queue depth I
> set. However, on the 2.6 kernel, it is always equal to 1, regardless
> of the max queue depth value (i.e., "Max Tagged Openings"). Again, it
> looks like the 2.6 machine is never queuing multiple requests to the disk.
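For what it's worth, a minimal sketch of a watcher for that counter, assuming
host number 0: it re-reads /proc/scsi/aic7xxx/0 once a second and prints any
"Commands Active" lines while the benchmark runs.

/* Poll /proc/scsi/aic7xxx/<n> and print the "Commands Active" lines. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[256];
    for (;;) {
        FILE *f = fopen("/proc/scsi/aic7xxx/0", "r"); /* host number 0 is an assumption */
        if (!f) { perror("fopen"); return 1; }
        while (fgets(line, sizeof(line), f))
            if (strstr(line, "Commands Active"))
                fputs(line, stdout);
        fclose(f);
        sleep(1);
    }
}

If it never prints a value above 1 on the 2.6 box while the load clearly has
many requests outstanding, that points at the driver (or something above it)
serializing the commands.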
>> 4) both machines are running uniprocessor, or both smp?
> The 2.4 machine is uniprocessor and the 2.6 machine is smp. I haven't
> had a chance to match up the machines yet, but I can.
I would suggest, just to rule out some weird SMP bug/BKL leftover, either booting
the smp machine with a uniprocessor kernel or passing a boot parameter (e.g.
maxcpus=1) to take the extra CPUs out of play.
>> 5) during boot|insmod, do dmesg & syslog for both systems show similar scsi
>> messages for how fast they are going to run the bus and for how both bus and
>> device were negotiated?
> Yes, they both report 160MB/s. The other dmesg entries look the same as well.
>> 6) either during boot or while the benchmark is running you do not see scsi
>> kernel errors/warnings?
> Nope. No error messages while benchmarks are running, either.
>> 7) can you or have you swapped cards & drives between machines to make sure
>> the problem does not follow hardware?
> I have swapped drives and cards around and have seen consistent
> behavior. I'm confident that the difference is software, not hardware.
>> (Footnote to items 1 and 3: the User/Goal/Curr and "Commands Active" lines
>> come from /proc/scsi/aic7xxx/<n>.)
>> (Footnote to item 2: it happens with 'identical' hardware. The reason my buses
>> are set differently is that, with 'identical' hardware on both machines, one
>> can be driven for months at 33MHz, while the other locks up the system in
>> under 3 days if it runs faster than 8MHz. From swapping, I know it to be a
>> drive problem.)
>> Steve Schlosser wrote, On 08/14/2007 08:41 PM:
>>> Can anyone shed some light on our command queuing problems, described
>>> below? I posted this a week or so ago and haven't heard anything.
>>> ---------- Forwarded message ----------
>>> From: Steve Schlosser <swschlosser at gmail.com>
>>> Date: Aug 3, 2007 12:35 AM
>>> Subject: Command queuing in Rev 7.0?
>>> To: aic7xxx at freebsd.org
>>> I have been doing some experiments with command queuing, and I'm
>>> having trouble confirming that my system is actually queuing requests
>>> at the disk.
>>> Here is my setup. I have two machines, an "old" one and a "new" one,
>>> each with an Adaptec 29160 hooked up to identical Seagate Cheetah10k7
>>> disks. The old system is running Debian, kernel version 2.4.27, and
>>> dmesg reports that the aic7xxx driver Rev 6.2.36 is running. The new
>>> system is running Ubuntu 7.04, kernel version 2.6.20, and aic7xxx
>>> Rev 7.0.
>>> I control the queue depth by setting global_tag_depth when I load the
>>> module. I'm running a simple microbenchmark which issues random 4KB
>>> reads to the disk, varying the number of concurrent requests
>>> outstanding at the disk from 1 (no queuing) to 253 (the maximum value
>>> allowed for global_tag_depth). On both machines, dmesg and
>>> /proc/scsi/aic7xxx/<n> report the queue depth that I set when I
>>> load the module.
>>> On the old system, bandwidth increases as I increase queue depth,
>>> presumably because the disk has more scheduling choices. Bandwidth
>>> scales from 0.7MB/s for one outstanding request to 2.0MB/s for 128
>>> outstanding requests.
>>> However, with the new system, I don't get the same increase in
>>> bandwidth - it stays at 0.7MB/s regardless of the queue depth setting.
>>> This suggests to me that requests are not getting queued at the disk.
>>> Any ideas why the newer driver might not be queuing requests? Is
>>> there another layer in the driver stack that I should be checking on?
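For reference, a minimal sketch of the kind of microbenchmark described above:
N threads each keep one synchronous O_DIRECT 4KB read outstanding against the
raw device, so up to N tagged commands can be queued at the disk (subject to
the global_tag_depth limit). The device path, test span, and thread count
below are placeholders.

/* Random 4KB O_DIRECT reads with NTHREAD requests outstanding.
 * Build: gcc -O2 -pthread bench.c -o bench */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLKSZ   4096
#define SPAN    (1024LL * 1024 * 1024)  /* read within the first 1GB, placeholder */
#define NTHREAD 32                      /* outstanding requests, placeholder */
#define NREADS  4096                    /* reads per thread */

static const char *dev = "/dev/sdb";    /* placeholder raw device */

static void *worker(void *arg)
{
    unsigned int seed = (unsigned int)(long)arg;
    void *buf;
    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return NULL; }
    if (posix_memalign(&buf, BLKSZ, BLKSZ)) { close(fd); return NULL; }

    for (int i = 0; i < NREADS; i++) {
        /* random block-aligned offset within SPAN */
        off_t off = ((off_t)(rand_r(&seed) % (SPAN / BLKSZ))) * BLKSZ;
        if (pread(fd, buf, BLKSZ, off) != BLKSZ)
            perror("pread");
    }
    free(buf);
    close(fd);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREAD];
    for (long i = 0; i < NTHREAD; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREAD; i++)
        pthread_join(t[i], NULL);
    return 0;
}

Timing the run and dividing bytes read by elapsed time gives the bandwidth
figure; whether the requests actually reach the drive as multiple tagged
commands is exactly what the "Commands Active" line in /proc/scsi/aic7xxx/<n>
shows.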
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter