From swschlosser at gmail.com Fri Aug 3 05:00:55 2007
From: swschlosser at gmail.com (Steve Schlosser)
Date: Fri Aug 3 05:00:58 2007
Subject: Command queuing in Rev 7.0?
Message-ID: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com>

Hello

I have been doing some experiments with command queuing, and I'm
having trouble confirming that my system is actually queuing requests
at the disk.

Here is my setup. I have two machines, an "old" one and a "new" one,
each with an Adaptec 29160 hooked up to identical Seagate Cheetah10k7
disks. The old system is running Debian, kernel version 2.4.27, and
dmesg reports that the aic7xxx driver Rev 6.2.36 is running. The new
system is running Ubuntu 7.04, kernel version 2.6.20.3, and aic7xxx
Rev 7.0.

I control the queue depth by setting global_tag_depth when I load the
module. I'm running a simple microbenchmark which issues random 4KB
reads to the disk, varying the number of concurrent requests
outstanding at the disk from 1 (no queuing) to 253 (the maximum value
allowed for global_tag_depth). In both cases, dmesg and
/proc/scsi/aic7xxx/ both report the queue depth that I set when I
load the module.

On the old system, bandwidth increases as I increase queue depth,
presumably because the disk has more scheduling choices. Bandwidth
scales from 0.7MB/s for one outstanding request to 2.0MB/s for 128
outstanding requests.

However, with the new system, I don't get the same increase in
bandwidth - it stays at 0.7MB/s regardless of the queue depth setting.
This suggests to me that requests are not getting queued at the disk.

Any ideas why the newer driver might not be queuing requests? Is
there another layer in the driver stack that I should be checking on?

Thanks.

-steve

From swschlosser at gmail.com Wed Aug 15 01:41:37 2007
From: swschlosser at gmail.com (Steve Schlosser)
Date: Wed Aug 15 01:41:39 2007
Subject: Command queuing in Rev 7.0?
In-Reply-To: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com>
References: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com>
Message-ID: <4d362c350708141841w44ed1e2bo27e5adf13b62e422@mail.gmail.com>

Can anyone shed some light on our command queuing problems, described
below? I posted this a week or so ago and haven't heard anything.
Thanks!

-steve

---------- Forwarded message ----------
From: Steve Schlosser
Date: Aug 3, 2007 12:35 AM
Subject: Command queuing in Rev 7.0?
To: aic7xxx@freebsd.org

Hello

I have been doing some experiments with command queuing, and I'm
having trouble confirming that my system is actually queuing requests
at the disk.

Here is my setup. I have two machines, an "old" one and a "new" one,
each with an Adaptec 29160 hooked up to identical Seagate Cheetah10k7
disks. The old system is running Debian, kernel version 2.4.27, and
dmesg reports that the aic7xxx driver Rev 6.2.36 is running. The new
system is running Ubuntu 7.04, kernel version 2.6.20.3, and aic7xxx
Rev 7.0.

I control the queue depth by setting global_tag_depth when I load the
module. I'm running a simple microbenchmark which issues random 4KB
reads to the disk, varying the number of concurrent requests
outstanding at the disk from 1 (no queuing) to 253 (the maximum value
allowed for global_tag_depth). In both cases, dmesg and
/proc/scsi/aic7xxx/ both report the queue depth that I set when I
load the module.

On the old system, bandwidth increases as I increase queue depth,
presumably because the disk has more scheduling choices. Bandwidth
scales from 0.7MB/s for one outstanding request to 2.0MB/s for 128
outstanding requests.

However, with the new system, I don't get the same increase in
bandwidth - it stays at 0.7MB/s regardless of the queue depth setting.
This suggests to me that requests are not getting queued at the disk.

Any ideas why the newer driver might not be queuing requests? Is
there another layer in the driver stack that I should be checking on?
Thanks.

-steve

From Todd.Denniston at ssa.crane.navy.mil Wed Aug 15 13:41:12 2007
From: Todd.Denniston at ssa.crane.navy.mil (Todd Denniston)
Date: Wed Aug 15 13:41:15 2007
Subject: Command queuing in Rev 7.0?
In-Reply-To: <4d362c350708141841w44ed1e2bo27e5adf13b62e422@mail.gmail.com>
References: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com> <4d362c350708141841w44ed1e2bo27e5adf13b62e422@mail.gmail.com>
Message-ID: <46C2FFC7.4060403@ssa.crane.navy.mil>

I don't have any bright light I can shed but, I think it would be good to make
sure that some assumptions I would make are met.

1) User, Goal and Curr lines[1] match between the two machines for the desired
drives while the benchmark is running.
2) the "Serial EEPROM:" data[1] matches between the two machines (mine differ,
I believe, because on one machine the bus is locked at 33MHz and the other is
at 8MHz). Probably best to visual diff the settings of the machines after
doing the Ctrl-A to get the card bios at boot.
3) while the benchmark is running do you ever see the "Commands Active"
line[1] go above 1?
4) both machines are running uniprocessor, or both smp?
5) during boot|insmod dmesg&syslog for both systems show similar scsi messages
for how fast they are going to run the bus and how both bus and device were
detected?
6) either during boot or while the benchmark is running you do not see scsi
kernel errors/warnings?
7) can you or have you swapped cards & drives between machines to make sure
the problem does not follow hardware[2]?

[1] from /proc/scsi/aic7xxx/
[2] it happens with 'identical' hardware. The reason my buses are set
different is that with 'identical' hardware on both, one can be driven for
months at 33MHz, while the other locks up the system in under 3 days if it is
running faster than 8MHz. From swapping, I know it to be a drive problem.

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter

From swschlosser at gmail.com Wed Aug 15 14:53:27 2007
From: swschlosser at gmail.com (Steve Schlosser)
Date: Wed Aug 15 14:53:30 2007
Subject: Command queuing in Rev 7.0?
In-Reply-To: <46C2FFC7.4060403@ssa.crane.navy.mil>
References: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com> <4d362c350708141841w44ed1e2bo27e5adf13b62e422@mail.gmail.com> <46C2FFC7.4060403@ssa.crane.navy.mil>
Message-ID: <4d362c350708150753j68fa3755p9efa857e3b000a85@mail.gmail.com>

Thanks for the sanity checks. Unfortunately, it seems that I'm still
stuck. Please find point-by-point responses embedded below.

I'm going to try and rule out the benchmark. I've got another one
that works using SG rather than file IO.

Thanks again.

-steve

On 8/15/07, Todd Denniston wrote:
> I don't have any bright light I can shed but, I think it would be good to make
> sure that some assumptions I would make are met.
>
> 1) User, Goal and Curr lines[1] match between the two machines for the desired
> drives while the benchmark is running.

Yes, these match on both machines.

> 2) the "Serial EEPROM:" data[1] matches between the two machines (mine differ,
> I believe, because on one machine the bus is locked at 33MHz and the other is
> at 8MHz). Probably best to visual diff the settings of the machines after
> doing the Ctrl-A to get the card bios at boot.

Again, these match on each machine.

> 3) while the benchmark is running do you ever see the "Commands Active"
> line[1] go above 1?

Aha! While the benchmark is running on the machine with the 2.4
kernel, "Commands Active" is always equal to the max queue depth I
set. However, on the 2.6 kernel, it is always equal to 1, regardless
of the max queue depth value (i.e., "Max Tagged Openings"). Again, it
looks like the 2.6 machine is never queuing multiple requests to the
disk.
> 4) both machines are running uniprocessor, or both smp?

The 2.4 machine is uniprocessor and the 2.6 machine is smp. I haven't
had a chance to match up the machines yet, but I can.

> 5) during boot|insmod dmesg&syslog for both systems show similar scsi messages
> for how fast they are going to run the bus and how both bus and device were
> detected?

Yes, they both report 160MB/s. The other dmesg entries look the same as well.

> 6) either during boot or while the benchmark is running you do not see scsi
> kernel errors/warnings?

Nope. No error messages while benchmarks are running, either.

> 7) can you or have you swapped cards & drives between machines to make sure
> the problem does not follow hardware[2]?

I have swapped drives and cards around and have seen consistent
behavior. I'm confident that the difference is software, not
hardware.

> [1] from /proc/scsi/aic7xxx/
> [2] it happens with 'identical' hardware. The reason my buses are set
> different is that with 'identical' hardware on both, one can be driven for
> months at 33MHz, while the other locks up the system in under 3 days if it is
> running faster than 8MHz. From swapping, I know it to be a drive problem.

From Todd.Denniston at ssa.crane.navy.mil Wed Aug 15 15:54:17 2007
From: Todd.Denniston at ssa.crane.navy.mil (Todd Denniston)
Date: Wed Aug 15 15:54:21 2007
Subject: Command queuing in Rev 7.0?
In-Reply-To: <4d362c350708150753j68fa3755p9efa857e3b000a85@mail.gmail.com>
References: <4d362c350708022135y1f7e9abbx428ae12785cfbe45@mail.gmail.com> <4d362c350708141841w44ed1e2bo27e5adf13b62e422@mail.gmail.com> <46C2FFC7.4060403@ssa.crane.navy.mil> <4d362c350708150753j68fa3755p9efa857e3b000a85@mail.gmail.com>
Message-ID: <46C3219B.40103@ssa.crane.navy.mil>

Steve Schlosser wrote, On 08/15/2007 09:53 AM:
>> 4) both machines are running uniprocessor, or both smp?
>
> The 2.4 machine is uniprocessor and the 2.6 machine is smp.
> I haven't had a chance to match up the machines yet, but I can.

I would suggest, just to rule out some weird SMP bug/BKL leftover, either boot
the smp machine with a uniprocessor kernel or pass the bootparam
nosmp
or
maxcpus=0

http://kerneltrap.org/man/linux/man7/bootparam.7
http://linux.about.com/library/cmd/blcmdl7_bootparam.htm

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter

From rien at rename-it.nl Thu Aug 30 02:37:49 2007
From: rien at rename-it.nl (Rien Broekstra)
Date: Thu Aug 30 02:37:52 2007
Subject: Error when using a tapestreamer: data overrun detected in Data-out phase.
Message-ID: <20070830091129.GC880@sinas.rename-it.nl>

Hello everyone,

I'm mailing this list because I'm quite out of options (Google and IRC
help channels haven't gotten me very far).
One of my customer's machines has recently started to act weird during
the nightly backup cycle. About half of the time the backup cycle
terminates abnormally, and I find the following lines in dmesg:

---8<----
(scsi12:A:15:0): data overrun detected in Data-out phase. Tag == 0x3.
(scsi12:A:15:0): Have seen Data Phase. Length = 49152. NumSGs = 12.
sg[0] - Addr 0x0220b5000 : Length 4096
sg[1] - Addr 0x0379e6000 : Length 4096
sg[2] - Addr 0x02765c000 : Length 4096
sg[3] - Addr 0x0da05000 : Length 4096
sg[4] - Addr 0x0aecd000 : Length 4096
sg[5] - Addr 0x01c61f000 : Length 4096
sg[6] - Addr 0x02c27000 : Length 4096
sg[7] - Addr 0x0773a000 : Length 4096
sg[8] - Addr 0x0162c4000 : Length 4096
sg[9] - Addr 0x0371a7000 : Length 4096
sg[10] - Addr 0x018f1d000 : Length 4096
sg[11] - Addr 0x020f67000 : Length 4096
st0: Error 70000 (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
st0: Error on write filemark.
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
---8<---

After this happens, the tapestreamer keeps blinking that it's active
forever. The fix is to power the streamer off and on again, and reload
the st module.

Details of the system this happens on:

It's a Sony AIT-2 tapestreamer connected to an Adaptec aic-7892a host
controller. The host is running Linux 2.6.17-14mdv (Mandriva Spring
2007.1).

Is this a driver issue? Do I have broken hardware or cabling
somewhere? Is there something else I can do to isolate the problem?

Thanks in advance for any insights.

Cheers,
--
Rien Broekstra
Mail: rien@rename-it.nl
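
One cheap check on the dmesg excerpt above is whether the logged scatter/gather list agrees with the reported transfer length. The following is only a sketch in Python, assuming the log lines have exactly the format shown in the message; the function name check_sg_list and the variable DMESG_EXCERPT are made up for illustration and are not part of any driver or tool.

```python
import re

# dmesg excerpt copied from the message above (assumed format).
DMESG_EXCERPT = """\
(scsi12:A:15:0): data overrun detected in Data-out phase. Tag == 0x3.
(scsi12:A:15:0): Have seen Data Phase. Length = 49152. NumSGs = 12.
sg[0] - Addr 0x0220b5000 : Length 4096
sg[1] - Addr 0x0379e6000 : Length 4096
sg[2] - Addr 0x02765c000 : Length 4096
sg[3] - Addr 0x0da05000 : Length 4096
sg[4] - Addr 0x0aecd000 : Length 4096
sg[5] - Addr 0x01c61f000 : Length 4096
sg[6] - Addr 0x02c27000 : Length 4096
sg[7] - Addr 0x0773a000 : Length 4096
sg[8] - Addr 0x0162c4000 : Length 4096
sg[9] - Addr 0x0371a7000 : Length 4096
sg[10] - Addr 0x018f1d000 : Length 4096
sg[11] - Addr 0x020f67000 : Length 4096
"""


def check_sg_list(log_text):
    """Compare the reported length and segment count with the sg[] entries."""
    # The summary line looks like "Length = 49152. NumSGs = 12."
    header = re.search(r"Length = (\d+)\. NumSGs = (\d+)\.", log_text)
    if header is None:
        raise ValueError("no 'Length = N. NumSGs = M.' line found")
    reported_length = int(header.group(1))
    reported_count = int(header.group(2))

    # Each segment line looks like "sg[3] - Addr 0x0da05000 : Length 4096".
    sg_lengths = [
        int(n)
        for n in re.findall(
            r"sg\[\d+\] - Addr 0x[0-9a-fA-F]+ : Length (\d+)", log_text
        )
    ]

    return {
        "reported_length": reported_length,
        "reported_count": reported_count,
        "sg_count": len(sg_lengths),
        "sg_total": sum(sg_lengths),
        "consistent": (
            len(sg_lengths) == reported_count
            and sum(sg_lengths) == reported_length
        ),
    }


if __name__ == "__main__":
    print(check_sg_list(DMESG_EXCERPT))
```

For the excerpt above there are 12 segments of 4096 bytes each, which adds up to the reported 49152 bytes, so the logged list is internally consistent. That says nothing about where the overrun comes from (driver, drive, or cabling); it only rules out a mismatch between the summary line and the segment list in this particular log.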