Deadlock on linux 2.6.0-test10

Richard Bass rbass at netraverse.com
Mon Nov 24 15:08:07 PST 2003


I am not sure whether this is the right mailing list for this report;
please excuse me if it is not.

A deadlock was introduced in going from linux-2.6.0-test9 to linux-2.6.0-test10.
The hardware info is as follows (when printed out by the test9 kernel):
-------------------------
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.35
         <Adaptec 19160B Ultra160 SCSI adapter>
         aic7892: Ultra160 Wide Channel A, SCSI Id=8, 32/253 SCBs

(scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
(scsi0:A:4): 10.000MB/s transfers (10.000MHz, offset 16)
   Vendor: QUANTUM   Model: ATLAS_V_18_WLS    Rev: 0230
   Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:1:0: Tagged Queuing enabled.  Depth 253
   Vendor: TOSHIBA   Model: CD-ROM XM-6401TA  Rev: 1009
   Type:   CD-ROM                             ANSI SCSI revision: 02
SCSI device sda: 35861388 512-byte hdwr sectors (18361 MB)
SCSI device sda: drive cache: write back
  sda: sda1 sda2
--------------------------

The change that exposed this was made in the generic Linux SCSI code, but
after a little hunting around, it looks like the problem may be in the
Adaptec driver.  In any event, here is the traceback:

ahc_linux_register_host

   ahc_lock(ahc, &s)
      spin_lock_irqsave(&ahc->platform_data->spin_lock, *flags);
        (obtains the spinlock)

   scsi_assign_lock(host, &ahc->platform_data->spin_lock);
      (assigns ahc->platform_data->spin_lock to shost->host_lock)

   ahc_linux_initialize_scsi_bus
       ahc_reset_channel(ahc, 'A', /*initiate_reset*/TRUE);
           ahc_send_async
             scsi_report_device_reset
                 shost_for_each_device(sdev, shost)
                     __scsi_iterate_devices
                     spin_lock_irqsave(shost->host_lock, flags);
                         ^^^
                        DEADLOCK


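The change was that shost_for_each_device() now uses __scsi_iterate_devices(),
which takes the host_lock spinlock.  Since ahc_linux_register_host() already
holds ahc->platform_data->spin_lock, and that same lock has just been assigned
to shost->host_lock, the second spin_lock_irqsave() never returns.  It looks
like ahc_send_async() should not be called with the lock held, but I could be
wrong here.  The problem may instead be in the generic code.  If so, I am sure
the aic7xxx maintainer can better explain what is going wrong there.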
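
To make the pattern concrete, here is a minimal userspace sketch of the same
self-deadlock.  It is an analogy only, not kernel or aic7xxx code: all toy_*
names are hypothetical, and an error-checking pthread mutex stands in for the
spinlock so that the second acquisition reports EDEADLK instead of hanging
(a real kernel spinlock would simply spin forever).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the SCSI host: it only borrows a lock pointer. */
struct toy_host {
    pthread_mutex_t *host_lock;
};

/* Hypothetical stand-in for the driver's private data, which owns the lock. */
struct toy_driver {
    pthread_mutex_t lock;
    struct toy_host host;
};

/* Create an error-checking mutex and hand it to the host, in the spirit of
 * scsi_assign_lock() in the trace above.  Returns 0 or an errno value. */
static int toy_driver_init(struct toy_driver *drv)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&drv->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    drv->host.host_lock = &drv->lock;
    return rc;
}

/* Plays the role of the device iterator: it takes host_lock internally.
 * Returns EDEADLK if the calling thread already holds that lock. */
static int toy_iterate_devices(struct toy_host *host)
{
    int rc = pthread_mutex_lock(host->host_lock);

    if (rc != 0)
        return rc;
    /* ... walk the device list here ... */
    pthread_mutex_unlock(host->host_lock);
    return 0;
}

/* The problematic ordering: the driver lock is held across a call that
 * re-acquires the very same lock through host->host_lock. */
static int toy_register_locked(struct toy_driver *drv)
{
    int rc;

    rc = pthread_mutex_lock(&drv->lock);
    assert(rc == 0);
    rc = toy_iterate_devices(&drv->host);
    pthread_mutex_unlock(&drv->lock);
    return rc;
}

/* One possible ordering that avoids it: drop the lock before iterating. */
static int toy_register_unlocked(struct toy_driver *drv)
{
    int rc;

    rc = pthread_mutex_lock(&drv->lock);
    assert(rc == 0);
    /* ... setup that genuinely needs the lock ... */
    pthread_mutex_unlock(&drv->lock);
    return toy_iterate_devices(&drv->host);
}
```

After toy_driver_init() succeeds, toy_register_locked() returns EDEADLK while
toy_register_unlocked() returns 0.  Whether the real fix belongs in the driver
(dropping the lock first) or in the generic iterator is exactly the open
question above; the sketch only shows why the current ordering cannot work.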

Hope this helps,

Richard <rwb>

-- 
Richard W. Bass
Systems Software Architect
NeTraverse, Inc.


