Hang on boot in isp with QLA2342 after upgrading to 6.3

Graham Allan allan at physics.umn.edu
Mon May 12 17:14:06 UTC 2008


On Mon, May 12, 2008 at 12:19:49PM -0400, Alexander Sack wrote:
> 
> Graham, from the driver error messages it seems that the card believes
> you are on a switched fabric and that it most likely is logging into
> the SNS server to lookup names/addresses for your devices.  Are you
> sure that your switched fabric is setup correctly?  I missed part of
> this thread so I apologize if this topic has already been hashed out.
> If for some reason the host can not log into the SNS server and
> retrieve entries from the database, then you are going to be hosed (I
> agree the OS shouldn't be hung unless you are booting off the disk
> connected to the failed controller, etc.).
> 
> I am very familiar with the ISP23/4xx chipset and I go digging more
> but I was wondering if you have verified that your topology is valid.

I'm happy to confess to being a SAN novice, so I'm not quite sure how I
would verify that, other than that it "seems to work" OK on the older
OS release, and also in specific circumstances on the current one. For
example, if one port of the HBA is connected directly to a device and
the other to the fabric, there is no problem, so in that situation it
is able to log in to the fabric and retrieve database information.

Even when it does hang, it does appear to have logged in to the fabric
ok, according to my interpretation of the switch output:

fcswitch_s43_2:admin> portshow 8
portName:
portHealth: No License
Authentication: None
portFlags:  0x223805b   portLbMod:  0x0  PRESENT ACTIVE F_PORT G_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT
portType:   4.1
portState:  1   Online
portPhys:   6   In_Sync
portScn:    6   F_Port
portRegs:   0x81100000
portData:   0x11deb230
portId:     031800
portWwn:    20:08:00:60:69:51:4a:20
portWwn of device(s) connected:         21:00:00:e0:8b:08:06:d2
Distance:   normal
Speed:      N2Gbps

Interrupts:        20487      Link_failure: 18         Frjt:         0
Unknown:           404        Loss_of_sync: 12295      Fbsy:         0
Lli:               13715      Loss_of_sig:  93
Proc_rqrd:         6646       Protocol_err: 0
Timed_out:         0          Invalid_word: 0
Rx_flushed:        0          Invalid_crc:  0
Tx_unavail:        0          Delim_err:    0
Free_buffer:       0          Address_err:  0
Overrun:           0          Lr_in:        36
Suspended:         0          Lr_out:       73
Parity_err:        0          Ols_in:       73

and it's listed in the switch name server (third entry down, 031800):

fcswitch_s43_2:admin> nsshow
{
 Type Pid    COS     PortName                NodeName                 TTL(sec)
 N    031300;      3;21:00:00:04:d9:60:17:6e;20:00:00:04:d9:60:17:6d; na
    FC4s: FCP
    PortSymb: [39] "UNKNOWN A.0 UNKNOWN FW:01.02 Port 1    "
    Fabric Port Name: 20:03:00:60:69:51:4a:20
 N    031500;      3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na
    FC4s: FCP [JetStor FreeBSD mark R4 R001]
    Fabric Port Name: 20:05:00:60:69:51:4a:20
 N    031800;      3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na
    FC4s: FCP
    Fabric Port Name: 20:08:00:60:69:51:4a:20
 N    031900;      3;10:00:00:06:2b:09:4f:d8;20:00:00:06:2b:09:4f:d8; na
    FC4s: FCIP FCP
    PortSymb: [47] "LSI7202P B.0 03-01001-02A FW:1.00.06 Port 0    "
    Fabric Port Name: 20:09:00:60:69:51:4a:20
 N    031a00;    2,3;10:00:00:00:c9:24:5b:04;20:00:00:00:c9:24:5b:04; na
    FC4s: FCP
    PortSymb: [49] "UNIX (emx2) KGPSA-CA S/W Rev 2.25: F/W Rev 3.93a0"
    Fabric Port Name: 20:0a:00:60:69:51:4a:20
The Local Name Server has 5 entries }
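
For anyone who wants to repeat that cross-check without eyeballing the
output, here is a minimal Python sketch that looks up the HBA's port WWN
(taken from the "portWwn of device(s) connected" line of portshow) in
saved nsshow output. The regular expression is only a guess at the
layout based on the output above, and parse_nsshow is a hypothetical
helper name, not part of any switch tooling.

```python
import re

# Abbreviated copy of the nsshow output quoted above (two of the five entries).
NSSHOW = """\
 N    031500;      3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na
    FC4s: FCP [JetStor FreeBSD mark R4 R001]
    Fabric Port Name: 20:05:00:60:69:51:4a:20
 N    031800;      3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na
    FC4s: FCP
    Fabric Port Name: 20:08:00:60:69:51:4a:20
"""

# Assumed layout of an entry line: Type, Pid; COS; PortName; NodeName; TTL
ENTRY = re.compile(
    r"^\s*(?P<type>\w+)\s+(?P<pid>[0-9a-f]{6});\s*(?P<cos>[\d,]+);"
    r"(?P<port_wwn>[0-9a-f:]{23});(?P<node_wwn>[0-9a-f:]{23});"
)


def parse_nsshow(text):
    """Return one dict per name server entry (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            # Start of a new entry.
            entries.append(m.groupdict())
        elif entries and ":" in line:
            # Indented "Key: value" attribute of the current entry.
            key, _, value = line.strip().partition(":")
            entries[-1][key.strip()] = value.strip()
    return entries


# WWN reported by portshow for the device on port 8.
hba_wwn = "21:00:00:e0:8b:08:06:d2"
match = next(e for e in parse_nsshow(NSSHOW) if e["port_wwn"] == hba_wwn)
print(match["pid"], match["Fabric Port Name"])
```

If the WWN were missing from the name server, next() would raise
StopIteration, which is a reasonable signal that the fabric login or
registration did not complete.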

It has been pointed out to me that this kind of weird interaction isn't
exactly unknown in the SAN world, and that setting up zoning on the
switch would probably make it go away. So I will also try that (it's
probably a giveaway of a SAN novice that I hadn't already done so - it
certainly does sound like it would help). But if the hang does point to
a problem in the driver, I'm also happy to keep trying different things
in the hope of revealing where the problem actually lies.

Graham


More information about the freebsd-scsi mailing list