CAM status: CCB request completed with an error

Sat Apr 5 07:40:27 UTC 2014

In freebsd-questions Digest, Vol 513, Issue 6, Message: 13
On Thu, 03 Apr 2014 19:52:34 -0700 "Ronald F. Guilmette" <rfg at tristatelogic.com> wrote:
 > In message <20140403231108.F92606 at sola.nimnet.asn.au>, 
 > Ian Smith <smithi at nimnet.asn.au> wrote:
 > 
 > >In freebsd-questions Digest, Vol 513, Issue 5, Message: 29
 > >On Thu, 3 Apr 2014 03:40:09 -0400 kpneal at pobox.com wrote:
 > > > On Wed, Apr 02, 2014 at 03:48:21AM -0700, Ronald F. Guilmette wrote:
 > >| Apr  2 03:40:31 segfault kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 1e 6f ff 0 0 1 0
 > >| Apr  2 03:40:31 segfault kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
 > >| Apr  2 03:40:31 segfault kernel: (da0:umass-sim0:0:0:0): Retrying command
 > >| Apr  2 03:40:37 segfault kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 1e 6f ff 0 0 1 0
 > >| Apr  2 03:40:37 segfault kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
 > >| Apr  2 03:40:37 segfault kernel: (da0:umass-sim0:0:0:0): Retrying command
 > >
 > >repeated every few seconds clearly enough indicate disk READ(10) errors, 
 > 
 > Yes.  That part, at least, is crystal clear.
 > 
 > >all apparently at the same place,
 > 
 > Come again please??  How did you reach THAT conclusion?

OK, you're correct, I didn't look closely enough.  The ones you 
originally quoted were at _almost_ the same place, and they came in 
groups of five error messages, followed in each case by eg:

Apr  2 03:40:53 segfault kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted

Feeling a bit more inclined than usual to do someone else's homework :) 
I followed the refs given to Seagate's "SCSI Commands Reference Manual" 
http://www.seagate.com/staticfiles/support/disc/manuals/Interface%20manuals/100293068c.pdf 
and from the READ(10) page, determined that the errors occurred trying 
to read 1 block (presumably 512 bytes) from block 0x1e6fff (from above, 
big-endian) from disk da0.  Later ones you quoted were instead for 
blocks 0x1e6ffe, ..e1 and ..f5, ie for LBAs from 1994741 to 1994751.

Which are up around 973.99 MiB, rather suggesting a 1GB memstick, no?
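
For anyone who wants to check the arithmetic, here is a minimal Python 
sketch of the decoding.  It assumes the usual 10-byte READ(10) layout 
(opcode 0x28, LBA in bytes 2-5, transfer length in bytes 7-8, both 
big-endian) and 512-byte blocks.  The function name decode_read10 is 
just one I made up for illustration.

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes, as in the log."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2-5: logical block address
    nblocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length
    return lba, nblocks

lba, nblocks = decode_read10("28 0 0 1e 6f ff 0 0 1 0")
offset_mib = lba * 512 / 2**20   # assumption: 512-byte blocks
print(lba, nblocks, round(offset_mib, 2))
```

With 512-byte blocks, LBA 1994751 sits one block short of 974 MiB.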

 > You are obviously seeing something within these syslog messages that your
 > average garden variety rube (like me) is not seeing... or likely to see.

I didn't know what the numbers meant till I looked them up, as one does.

 > >and should most likely be considered 
 > >serious enough to warrant knowing about .. maybe doing something about?
 > 
 > For the record, here are a few relevant facts:
 > 
 > 1)  As I later determined, the messages in question are all relating to
 >     either or both of two (2) USB memory sticks that I have installed...
 >     essentially on a permanent basis... into the server in question.

'da0:umass-sim0:0:0:0' seemed a fair clue ..

 > 2)  There is not, was not, and has not been _any_ hardware problem of any
 >     kind with _either_ of the two USB sticks in question.  Upon shutdown
 >     and reboot, both are working just fine, with no errors whatsoever.
 >     (These things have no moving parts.  With what I am sure are extra-
 >     ordinarily rare exceptions, these things don't just simply "go bad",
 >     notwithstanding rare anecdotes to the contrary.)

Perhaps not, but quite a few "come bad", and require specific 'quirks' 
to get around (often timing) issues with particular brands.  There are 
also very (perhaps insanely) detailed debugging knobs for USB that you 
could get advice on using, if you find the manuals too heavy going.

Working fine on shutdown and reboot does not necessarily indicate any 
disk is trouble-free.  Nor does fsck, if the metadata is ok.  Only a 
full surface read scan will verify a disk - and then only for reading!

 > 3)  I have, over time, experienced multiple serious problems with, um,
 >     various, shall we say, "non-features" of the FreeBSD USB driver(s).
 >     The endless cascade of syslog messages I reported all appear to have
 >     been caused by yet another one of these, albeit a new one... *not*
 >     one of the incredibly troublesome ones that I was already and previously
 >     familiar with.

I'm sorry your hardware has a problem with USB.  With the exception of 
not being able to get a $30 yun-cha Cardbus USB 2.0 adaptor to work on 
my old Thinkpad T23s, and before a bug was fixed sometime before 9.0 in 
the UHCI driver that caused a 60-second stall on resuming said T23s, 
I've had no other issues myself, either with sticks or external HDs.

Can you successfully 'dd if=/dev/da0 of=/dev/null' each of those sticks 
without such errors showing up at around the 973MB area?  (assuming 512 
byte sectors .. with 4k sectors those LBAs could indicate an 8GB stick)
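
If you'd rather have the failing offset handed to you than watch the 
console, something like the following Python sketch does the same 
sequential read and reports where the first read error occurs.  It's 
only a rough equivalent of the dd above, not a tested tool; read_scan 
and the 64 KiB chunk size are my own choices, and the demonstration 
runs against a scratch file rather than a real device.

```python
import os
import tempfile

def read_scan(path, chunk_size=64 * 1024):
    """Read path from start to end.

    Returns (bytes_read_ok, error): error is None after a clean pass,
    otherwise the OSError raised by the first failing read.
    """
    offset = 0
    # buffering=0 so each read() is a single unbuffered read of chunk_size bytes
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                data = dev.read(chunk_size)
            except OSError as err:
                return offset, err
            if not data:              # end of device or file
                return offset, None
            offset += len(data)

# Demonstration on a scratch file; for a real scan pass "/dev/da0" (as root).
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\0" * 200_000)
good_bytes, error = read_scan(scratch.name)
os.unlink(scratch.name)
print(good_bytes, error)
```

On a failing stick, dividing the returned offset by the sector size 
gives a block number to compare against the LBAs in the syslog entries.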

 > A rhetorical question:
 > 
 > Am I really the only person for whom the FreeBSD USB driver(s) seem to
 > keel over and die the instant one has the unmitigated audacity to even
 > look at them sideways?

Perhaps not.  If I were having any such issues, I'd be chasing them down 
on the appropriate lists, providing sufficient information to debug it.

 > I do get the impression that probably >95% of all FreeBSD installs,
 > everywhere, are on systems contained in Big Racks within Big Data Centers
 > where none of said systems ever come within a country mile of any USB
 > sticks... or even within a country mile of any USB mass storage of any
 > kind... and that thus, it is only the occasional fool like me... who
 > tries (with dubious sanity) to use FreeBSD as my desktop OS... who ever
 > sees just how un-evolved the USB drivers are, you know, for anything
 > other than (relatively less taxing) mice, keyboards, and printers.

I tend to doubt your estimate but have no more data to base a guesstimate 
upon than you do.  Certainly plenty of people are happily using FreeBSD 
on the desktop, and freebsd-mobile@ at least has quite a few denizens.

 > To say that the FreeBSD USB driver(s) appear to lack the ability to fail
 > gracefully would, I think, and given the evidence I posted, appear to be
 > an understatement.  To say that they also _perceive_ hardware failure
 > when there is in fact none present, is, for me at least, a truth beyond
 > question.

Fail gracefully?  Have you tried to find out what process is trying to 
read these sectors unsuccessfully, then trying to read nearby sectors, 
also unsuccessfully?  Clearly something is trying, and failing to do so, 
and perhaps whatever that is will be logging its failures somewhere?  
You may be able to trace them by the times they begin .. daily periodic, 
or some other cron job perhaps?
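
One way to get at those start times is to collapse the error lines into 
bursts and look only at when each burst begins.  A rough Python sketch, 
assuming the standard 15-character syslog timestamp and an English 
locale for the month names; burst_starts, the 10-minute gap and the 
year are my own assumptions, and the third sample line is invented 
purely to show a second burst.

```python
from datetime import datetime, timedelta

def burst_starts(lines, gap=timedelta(minutes=10), year=2014):
    """Return the timestamp of the first CAM error in each burst of errors."""
    starts, last = [], None
    for line in lines:
        if "CAM status" not in line:
            continue
        # syslog stamps carry no year, so one is supplied (assumption)
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if last is None or stamp - last > gap:
            starts.append(stamp)      # quiet for longer than gap: new burst
        last = stamp
    return starts

sample = [
    "Apr  2 03:40:31 segfault kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error",
    "Apr  2 03:40:37 segfault kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error",
    # invented line, for illustration only
    "Apr  2 04:15:02 segfault kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error",
]
starts = burst_starts(sample)
for start in starts:
    print(start)
```

Burst start times that line up with entries in /etc/crontab or the 
periodic schedule would point at the job doing the reading.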

These are low-level errors, that drivers return to higher-level tasks.  
Find out what the higher-level process is and you might be able to solve 
this mystery.  Blaming a whole generic subsystem isn't likely to achieve 
anything towards accomplishing that.

OS version?  dmesg entries for the disk in question?  Stuff like that.

Cheers, Ian

PS I take (C. daily) digests; cc'ing me may get a faster response.