Strange CAM errors
Willem Jan Withagen
wjw at digiware.nl
Mon Dec 17 21:45:07 UTC 2012
On 17-12-2012 20:16, Jim Harris wrote:
>
>
> On Mon, Dec 17, 2012 at 9:26 AM, Willem Jan Withagen <wjw at digiware.nl
> <mailto:wjw at digiware.nl>> wrote:
>
> On 2012-12-17 15:38, Steven Hartland wrote:
> > Check the smart results of each disk in the array you may have a
> failing
> > disk.
> > ----- Original Message ----- From: "Willem Jan Withagen"
> <wjw at digiware.nl <mailto:wjw at digiware.nl>>
> > To: "FreeBSD Stable Users" <freebsd-stable at freebsd.org
> <mailto:freebsd-stable at freebsd.org>>
> > Sent: Monday, December 17, 2012 10:58 AM
> > Subject: Strange CAM errors
> >
> >
> >> Hi,
> >>
> >> I have not noticed this before, but my system rebooted this
> morning and
> >> in the following security report I found a lot of messages in the
> >> dmesg-part like:
> >>
> >> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
> >> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
> >> +(probe0:arcmsr0:0:16:1): Retrying command
> >> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
> >> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
> >> +(probe0:arcmsr0:0:16:1): Retrying command
> >>
> >> And it seems that bus 16 is:
> >> +pass6 at arcmsr0 bus 0 scbus0 target 16 lun 0
> >> +pass6: <Areca RAID controller R001> Fixed Processor SCSI-0 device
> >>
> >> The system has been running
> >> FreeBSD zfs.digiware.nl <http://zfs.digiware.nl> 9.1-PRERELEASE
> FreeBSD 9.1-PRERELEASE #3: Wed
> >> Nov 14 13:25:55 CET 2012
> >> root at zfs.digiware.nl:/usr/obj/usr/srcs/src9/src/sys/ZFS amd64
> >> for already a while.
> >>
> >> Anybody suggestions as to why I have these messages?
> >>
> >> They are during the boot sequence, so no smartd talking to the
> disks at
> >> that moment.
> >>
> >> --WjW
> >>
> >> ps: dmesg, config, etc.... at:
>
> >> http://www.tegenbosch28.nl/FreeBSD/Systems/ZFS
> >> ps2: upgrading to the most recent 9.1
>
> 'mmm,
>
> Smartd seems to think otherwise...
>
> 'camcontrol rescan all' actually delivers the same pack of errors.
>
> --WjW
>
>
> The timeouts are occurring on inquiry commands to non-zero LUNs.
> arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for
> inquiry commands to this device and LUN > 0. CAM_DEV_NOT_THERE is
> preferred to remove these types of warnings, and similar patches have
> gone into other SCSI drivers recently.
>
> Can you try this patch?
>
> Index: sys/dev/arcmsr/arcmsr.c
> ===================================================================
> --- sys/dev/arcmsr/arcmsr.c (revision 244190)
> +++ sys/dev/arcmsr/arcmsr.c (working copy)
> @@ -2439,7 +2439,7 @@
> char *buffer=pccb->csio.data_ptr;
>
> if (pccb->ccb_h.target_lun) {
> - pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> + pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
> xpt_done(pccb);
> return;
> }
>
Hi Jim,
The noise has gone down by a factor of 5. Now I get:
(probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
(probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
(probe6:arcmsr0:0:16:1): Error 5, Unretryable error
(probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0
That status message is defined in sys/cam/cam.c as CAM_UA_TERMIO, but that
error is not set anywhere in the arcmsr code.
So I clearly do not yet know enough to help with this.
--WjW
The same messages appear for all of the ports on the adapter.
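
One possible explanation, sketched below, is that the status is ORed into
the field rather than assigned. This assumes the status field already holds
CAM_REQ_CMP when the LUN check runs, which nothing quoted in this thread
confirms. The numeric values are assumed to follow the ordering of the
cam_status enum in sys/cam/cam.h, and or_status() and status_name() are
hypothetical helpers written only for this illustration.

```c
/* Assumed to mirror a few entries of cam_status in sys/cam/cam.h. */
enum cam_status_sketch {
    CAM_REQ_INPROG    = 0x00,
    CAM_REQ_CMP       = 0x01,
    CAM_DEV_NOT_THERE = 0x08,
    CAM_UA_TERMIO     = 0x09,
    CAM_SEL_TIMEOUT   = 0x0a,
    CAM_CMD_TIMEOUT   = 0x0b
};

/* Hypothetical helper: the value left behind by "status |= flag". */
static unsigned int or_status(unsigned int status, unsigned int flag)
{
    return status | flag;
}

/* Hypothetical helper: symbolic name of a status value, for printing. */
static const char *status_name(unsigned int status)
{
    switch (status) {
    case CAM_REQ_INPROG:    return "CAM_REQ_INPROG";
    case CAM_REQ_CMP:       return "CAM_REQ_CMP";
    case CAM_DEV_NOT_THERE: return "CAM_DEV_NOT_THERE";
    case CAM_UA_TERMIO:     return "CAM_UA_TERMIO";
    case CAM_SEL_TIMEOUT:   return "CAM_SEL_TIMEOUT";
    case CAM_CMD_TIMEOUT:   return "CAM_CMD_TIMEOUT";
    default:                return "other";
    }
}
```

Under that assumption, or_status(CAM_REQ_CMP, CAM_SEL_TIMEOUT) is 0x0b
(CAM_CMD_TIMEOUT), which would match the "Command timeout" seen before the
patch, and or_status(CAM_REQ_CMP, CAM_DEV_NOT_THERE) is 0x09
(CAM_UA_TERMIO), which would match the message seen after it. If that is
what happens, assigning the status instead of ORing it in might be worth
trying.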