Hang on boot in isp with QLA2342 after upgrading to 6.3

Scott Long scottl at samsco.org
Thu May 15 23:58:26 UTC 2008


Graham Allan wrote:
> Graham Allan wrote:
>> On Mon, May 12, 2008 at 12:14:04PM -0500, Graham Allan wrote:
>>> It has been pointed out to me that this kind of weird interaction isn't
>>> exactly unknown in the SAN world, and setting up zoning on the switch
>>> would probably make it go away. So I will also try that (it's probably
>>> a giveaway of a SAN novice that I hadn't already done so - it certainly
>>> does sound like it would help). But if the hang does point to a problem
>>> in the driver, I'm also happy to keep trying different things in the
>>> hope of revealing where the problem actually lies.
>>
>> Replying to my own message here.
>>
>> The good news for me is that setting up zoning in the switch does fix
>> (or at least hide) the problem on this server for me.
>>
>> The bad news is, I believe I'm seeing a similar kind of behaviour on a
>> completely different 6.3 setup. Haven't had time to fully characterise
>> it yet, but in short... Dell 1950 with QLA2342, connected directly to
>> an EMC CX300 array. Very often (let's say, unpredictably, 50% of the time) it
>> hangs during boot at exactly the same point as the first system, right
>> around the time it would be probing for drives.
> 
> So I guess one thing I could do is build a kernel with debugging support 
> (and possibly the "deadlock recipe" from the freebsd handbook), and 
> force it to the debugger when it hangs. Then I could at least get some 
> tracebacks and other information - though as it never actually panics 
> I'm not sure how useful the information will be - I guess it's likely 
> stuck in a loop somehow. It should give some clue.
> 
> Does that sound like a reasonable idea? Does the kernel version matter 
> (e.g. standard 6.3 vs. RELENG_6)? Is this list the most appropriate place 
> for me to talk about the issue?
> 
> (I also think I should double-check 6.2 again, as its release notes 
> indicate it was where isp was synced from CURRENT - I'd think it should 
> have the same issue).
> 
> Thanks for everyone's interest,
> 
> Graham

Well, is it actually deadlocking, or just holding up the boot while it 
tries to individually probe many thousands of target and LUN IDs?  I'd
bet it's the latter.  Compiling in the debugger is the correct first 
step.  You can then compile in CAMDEBUG, CAM_DEBUG_LUN=-1, and 
CAM_DEBUG_FLAGS=CAM_DEBUG_INFO.

Scott
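For reference, the two steps suggested in the reply might look roughly like this as additions to a copy of the GENERIC kernel configuration. This is a sketch, not a tested recipe: KDB and DDB are the standard FreeBSD in-kernel debugger options, the three CAM lines are the ones named in the reply, and the CAM_DEBUG_BUS and CAM_DEBUG_TARGET lines are an assumption added here to widen debugging to every bus and target as well.

```
# In-kernel debugger, so a stalled boot can be interrupted for a backtrace
options 	KDB
options 	DDB

# CAM debugging output, with the values suggested in the reply
# (-1 acts as a wildcard, i.e. match any LUN)
options 	CAMDEBUG
options 	CAM_DEBUG_LUN=-1
options 	CAM_DEBUG_FLAGS=CAM_DEBUG_INFO

# Assumption (not in the reply): also match any bus and any target
options 	CAM_DEBUG_BUS=-1
options 	CAM_DEBUG_TARGET=-1
```

If the console keeps printing CAM probe messages while the boot appears stuck, that would suggest the slow scan described in the reply rather than a deadlock. If it goes silent, breaking into DDB (for example with Ctrl+Alt+Esc on a syscons console) and running `trace` should show where the kernel is waiting.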


More information about the freebsd-scsi mailing list