svn commit: r203889 - in stable/8/sys: cam cam/ata cam/scsi dev/ahci dev/asr dev/ata dev/ciss dev/hptiop dev/hptrr dev/mly dev/mpt dev/ppbus dev/siis dev/trm dev/twa dev/usb/storage
Lawrence Stewart
lstewart at freebsd.org
Thu Feb 18 14:27:41 UTC 2010
Hi Alexander and all,
On 02/15/10 06:38, Alexander Motin wrote:
> Author: mav
> Date: Sun Feb 14 19:38:27 2010
> New Revision: 203889
> URL: http://svn.freebsd.org/changeset/base/203889
>
> Log:
> MFC r203108:
> Large set of CAM improvements:
[snip]
I've been having issues with the mpt-driven LSI SAS adapter in my
SunFire X4100 server running FreeBSD 8-STABLE r202132. Under certain
disk workloads, like running an svn update of the src tree or a kernel
compile, the disk subsystem becomes extremely unresponsive, in a
stall-like state, and /var/log/messages reports a number of these:
mpt0: mpt_cam_event: 0x16
It eventually recovers after a minute or two, even though the svn
operation or build is still running. The stalled/good cycle may then
repeat a few times, sometimes with minutes between events.
A couple of times it has gotten even more upset, reporting things like this:
mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80002f1400:54058 timed out for ccb 0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
mpt0: request 0xffffff80002fd100:54059 timed out for ccb 0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
mpt0: request 0xffffff80002efcf0:54060 timed out for ccb 0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
mpt0: mpt_cam_event: 0x0
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x16
mpt0: Volume(0:2): Volume Status Changed
mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800 (req->ccb 0)
No ill effects are observed after such an episode, and the array remains
in a healthy, normal state. The only observable problem is the stall of
all disk IO while these events occur.
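To put numbers on how often each event code shows up, something like the following could be run over the log. This is only a rough sketch: the function name tally_mpt_events is made up for illustration, and the regular expression assumes the messages look exactly like the ones quoted above, with or without a syslog prefix in front.

```python
import re
from collections import Counter

# Matches lines such as "mpt0: mpt_cam_event: 0x16"; any syslog prefix
# (timestamp, hostname, "kernel:") in front of it is ignored.
EVENT_RE = re.compile(r"(mpt\d+): mpt_cam_event: (0x[0-9a-fA-F]+)")

def tally_mpt_events(lines):
    """Count mpt_cam_event codes per controller from an iterable of log lines.

    Hypothetical helper, not part of any FreeBSD tool.
    """
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            # Key is (controller unit, event code), e.g. ("mpt0", "0x16").
            counts[(m.group(1), m.group(2).lower())] += 1
    return counts
```

Passing it an open file object for /var/log/messages (the path is an assumption about the local syslog setup) yields a Counter keyed by controller and event code, which makes it easier to compare a stalled period against a quiet one.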
The disk configuration is 2 x 320GB WD3200BEKT 7200RPM SATA HDDs in
RAID1. The hardware reports itself as:
mpt0: <LSILogic SAS/SATA Adapter> port 0xa800-0xa8ff mem 0xfc4fc000-0xfc4fffff,0xfc4e0000-0xfc4effff irq 28 at device 3.0 on pci2
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.13.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 1 Active Volume (2 Max)
mpt0: 2 Hidden Drive Members (10 Max)
mpt0 at pci0:2:3:0: class=0x010000 card=0x30601000 chip=0x00501000 rev=0x02 hdr=0x00
vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
device = 'SAS 3000 series, 4-port with 1064 -StorPort'
class = mass storage
subclass = SCSI
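For reference, the chip= and card= values in the pciconf output pack two 16-bit IDs into one 32-bit word: the device ID in the upper half and the vendor ID in the lower half (for card=, the subsystem device and subsystem vendor). A small sketch of the split follows; the helper name split_pci_id is made up for illustration.

```python
def split_pci_id(value):
    """Split a 32-bit pciconf 'chip=' or 'card=' value into (device, vendor).

    Hypothetical helper; `value` is a hex string such as "0x00501000".
    """
    v = int(value, 16)
    device = (v >> 16) & 0xFFFF  # upper 16 bits: device (or subdevice) ID
    vendor = v & 0xFFFF          # lower 16 bits: vendor (or subvendor) ID
    return device, vendor

device, vendor = split_pci_id("0x00501000")
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
# prints "vendor=0x1000 device=0x0050"
```

Here chip=0x00501000 splits into vendor 0x1000 and device 0x0050, which is what the vendor and device strings above are looked up from.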
As best I can tell, the hardware is OK: both disks report as fine
without SMART errors and are only 2 months old, so I wanted to rule out
software issues. On upgrading to recent 8-STABLE, I got a page fault
kernel panic on boot in the mpt driver's mpt_raid0 kproc. After some trial
and error, r203888 is the most recent revision that boots fine, whilst
r203889 exhibits the page fault. I should also note that r203888 still
sees the "mpt0: mpt_cam_event: 0x16" messages and associated disk IO stalls.
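The trial and error above amounts to a search for the first revision that fails to boot. I did it by hand, but the idea can be sketched as a binary search. Everything here is illustrative: first_bad_revision and boots_ok are hypothetical names, the upper bound 204000 is an arbitrary placeholder rather than a revision I tested, and in practice the predicate means building and booting a kernel at that revision. It also assumes the breakage persists in every revision after the one that introduced it.

```python
def first_bad_revision(good, bad, is_good):
    """Binary search for the first revision at which is_good() returns False.

    Assumes `good` passes, `bad` fails, and every revision after the first
    failing one also fails.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # mid still boots, so the breakage is later
        else:
            bad = mid    # mid panics, so the breakage is at mid or earlier
    return bad

# Stand-in predicate that encodes the result reported above (r203888 boots,
# r203889 panics); a real check would be a manual build-and-boot test.
def boots_ok(rev):
    return rev <= 203888

# 202132 is the revision the machine was running; 204000 is a placeholder.
print(first_bad_revision(202132, 204000, boots_ok))
# prints 203889
```

Each step halves the range, so even a span of a couple of thousand revisions needs only around a dozen build-and-boot cycles.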
I compiled DDB into my r203889 kernel. Unfortunately, my ILO emulates a
USB keyboard, so I can't do anything in DDB, which is a huge pain, but
here's the info I did get (hand-transcribed):
Fatal trap 12: page fault while in kernel mode
current process: mpt_raid0
Stopped at xpt_rescan+0x1d: movq 0x10(%rsi),%rdx
So there are two separate issues here:
1. Any thoughts on how to resolve the regression in the mpt driver with
the r203889 commit?
2. Any thoughts on the behaviour I'm seeing with the mpt_cam_event
messages? Is it possible it's just a driver issue? Is the hardware
likely bad? I'm really hoping they'll go away once the driver issue is
resolved as the freezes are fairly unacceptable on a production machine
and the hardware appears to pass all checks I've done so far.
Cheers,
Lawrence