svn commit: r203889 - in stable/8/sys: cam cam/ata cam/scsi dev/ahci dev/asr dev/ata dev/ciss dev/hptiop dev/hptrr dev/mly dev/mpt dev/ppbus dev/siis dev/trm dev/twa dev/usb/storage

Lawrence Stewart lstewart at freebsd.org
Sun Feb 21 23:39:14 UTC 2010


On 02/20/10 04:16, Alexander Motin wrote:
> Lawrence Stewart wrote:
>> A couple of times it has gotten even more upset reporting things like this:
>>
>> mpt0: mpt_cam_event: 0x16
>> mpt0: mpt_cam_event: 0x16
>> mpt0: request 0xffffff80002f1400:54058 timed out for ccb 0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
>> mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
>> mpt0: request 0xffffff80002fd100:54059 timed out for ccb 0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
>> mpt0: request 0xffffff80002efcf0:54060 timed out for ccb 0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
>> mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
>> mpt0: mpt_cam_event: 0x0
>> mpt0: mpt_cam_event: 0x0
>> mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
>> mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
>> mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
>> mpt0: mpt_cam_event: 0x16
>> mpt0: mpt_cam_event: 0x12
>> mpt0: mpt_cam_event: 0x12
>> mpt0: mpt_cam_event: 0x16
>> mpt0: Volume(0:2): Volume Status Changed
>> mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800 (req->ccb 0)
>>
>> No ill effects are observed after such an episode, and the array remains
>> in its normal, healthy state. The only observable problem is the stall of
>> all disk IO while these events occur.
>
> I have no idea how the mpt driver works, nor do I have hardware to play
> with, but a quick look shows that event 0x12 is MPI_EVENT_SAS_PHY_LINK_STATUS
> and 0x16 is MPI_EVENT_SAS_DISCOVERY. Neither is handled by the mpt driver,
> so both are just logged. I would say something is going on at the physical
> level of your SAN. The timeouts could also be the result of physical issues.

Ok, I'll try and figure out what's possibly going on.

>
>> As best I can tell, the hardware is OK: both disks report as fine, show
>> no SMART errors and are only 2 months old, so I wanted to rule out
>> software issues. On upgrading to recent 8-STABLE, I got a page fault
>> kernel panic on boot in the mpt driver's mpt_raid0 kproc. After some trial
>> and error, I found that r203888 is the most recent revision that boots
>> fine, whilst r203889 exhibits the page fault. I should also note that
>> r203888 still shows the "mpt0: mpt_cam_event: 0x16" messages and the
>> associated disk IO stalls.
>>
>> I compiled DDB into my r203889 kernel. Unfortunately my ILO emulates a
>> USB keyboard, so I can't do anything in DDB, which is a huge pain, but
>> here's the info I did get (hand transcribed):
>>
>> Fatal trap 12: page fault while in kernel mode
>> current process: mpt_raid0
>> Stopped at xpt_rescan+0x1d:     movq   0x10(%rsi),%rdx
>>
>> 1. Any thoughts on how to resolve the regression in the mpt driver with
>> the r203889 commit?
>
> Any thoughts where to find a good telepath? :)
>
> For a start, show at least the verbose boot messages up to the crash.
> The full panic message could also be useful: it may show the address of
> the faulting instruction, which can be resolved to a source line with the
> addr2line tool. If you could find a good old PS/2 keyboard, a backtrace
> would be interesting to see.

Two issues:
- The server is in colocated rack space and not easy to get to
- I'm not even sure that this server has PS/2 ports on it

Perhaps this commit should be backed out of 8-STABLE until we get a 
chance to diagnose a bit more?

Cheers,
Lawrence

