Re: xhci USB transaction error and subsequent recovery mechanism on Freebsd stable/12

From: Chris <bsd-lists_at_bsdforge.com>
Date: Tue, 12 Apr 2022 21:26:36 UTC
On 2022-04-12 08:10, mahesh mv wrote:
> Hi all,
>
> We need your help with an urgent issue we are observing on FreeBSD
> stable/12. The DATA0/DATA1 toggles are out of sync with respect to the
> endpoint (EP), and the system experiences READ(10) errors. The READ(10)
> error recovers within a couple of retries most of the time, but in a few
> cases we have observed that the read retries get exhausted and the system
> moves to an unusable state (continuous g_vfs_done() errors). We are using
> Junos, but the xhci driver etc. are all pristine stable/12 drivers with no
> Juniper-specific changes. This issue was never observed with Linux kernel
> 5.4.2 on the same HW.
>
> Errors seen on console:
>
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 28 cf 28 00 00 40 00
> 
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> 
> (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
> 
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 28 cf 28 00 00 40 00
> 
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> 
> (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
> 
> FreeBSD/arm (Amnesiac) (ttyu0)
> 
> login:
> 
> I can share the USB traces taken at the USB device if required.
> Thanks,
> Mahesh
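Background on the DATA0/DATA1 mechanism: in USB 2.0 bulk transfers each data packet carries a DATA0 or DATA1 PID, and the sender and receiver each keep a toggle bit per endpoint. The receiver accepts a packet only when its PID matches the toggle it expects. On a mismatch it still ACKs but discards the data, assuming it is a retransmission whose ACK was lost. The toy model below is a sketch of that rule only. The function `run_bulk_transfer` and its `lost_acks` fault-injection parameter are hypothetical, not xhci(4) code. It shows that the rule suppresses a duplicate when an ACK is lost, but silently drops one packet when the two sides start out of sync.

```python
def run_bulk_transfer(payloads, sender_toggle=0, receiver_toggle=0, lost_acks=()):
    """Hypothetical toy model of USB 2.0 bulk data-toggle synchronization.

    payloads:   packets the sender wants to deliver, in order.
    *_toggle:   initial toggle bit on each side (0 = DATA0, 1 = DATA1).
    lost_acks:  transmission attempts (0-based) whose ACK never reaches the
                sender, forcing a retransmission (fault injection).
    Returns the payloads the receiver actually kept.
    """
    kept = []
    index = 0    # next payload to send
    attempt = 0  # counts every transmission, including retries
    while index < len(payloads):
        if sender_toggle == receiver_toggle:
            # PID matches the expected toggle: accept the data and flip.
            kept.append(payloads[index])
            receiver_toggle ^= 1
        # On a mismatch the receiver ACKs but discards the packet,
        # treating it as a retransmission of data it already has.
        if attempt in lost_acks:
            # Sender never sees the ACK: resend the same packet, same PID.
            attempt += 1
            continue
        sender_toggle ^= 1  # ACK received: sender flips its toggle
        index += 1
        attempt += 1
    return kept


# Toggles in sync: everything arrives.
print(run_bulk_transfer(["a", "b", "c"]))                  # ['a', 'b', 'c']
# First ACK lost: the retransmitted "a" is discarded as a duplicate.
print(run_bulk_transfer(["a", "b", "c"], lost_acks=(0,)))  # ['a', 'b', 'c']
# Sides start out of sync: "a" is ACKed but silently dropped.
print(run_bulk_transfer(["a", "b", "c"], sender_toggle=1))  # ['b', 'c']
```

In the out-of-sync case the next packet happens to resynchronize both sides. The earlier packet is gone without any error on the wire, so the higher layer only sees a short or failed transfer.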
I just replaced a drive 2 days ago that exhibited the same behavior. I
haven't checked the replaced drive for the cause yet, but here is what I
chose to do:
Get a new (known dependable) drive, add it to the system, and dump the data
on the failing disk to the new drive. At least you'll have a safe copy of it.
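On FreeBSD the usual tools for this copy are dd(1) with conv=noerror,sync or recoverdisk(1), both of which keep going past unreadable sectors. The sketch below only illustrates what such a copy does: read fixed-size blocks, and when a block fails, record its offset and write zeros so that everything after it stays at the same offset on the new disk. The helper name `copy_skipping_bad_blocks`, the 64 KiB block size, and the error cap are assumptions for illustration. When reading raw disk devices, the block size must be a multiple of the sector size.

```python
import os


def copy_skipping_bad_blocks(src_path, dst_path, block_size=64 * 1024,
                             max_consecutive_errors=16):
    """Hypothetical helper: copy src_path to dst_path block by block,
    in the spirit of `dd conv=noerror,sync`.

    Unreadable blocks are written as zeros so destination offsets stay
    aligned with the source. Returns the byte offsets of those blocks.
    """
    bad_offsets = []
    consecutive = 0
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        offset = 0
        while True:
            try:
                data = os.pread(src, block_size, offset)
                consecutive = 0
            except OSError:
                consecutive += 1
                if consecutive > max_consecutive_errors:
                    raise  # e.g. the device detached; stop rather than spin
                bad_offsets.append(offset)
                data = bytes(block_size)  # zero-fill the unreadable block
            if not data:
                break  # end of the source
            os.pwrite(dst, data, offset)
            offset += len(data)
    finally:
        os.close(src)
        os.close(dst)
    return bad_offsets
```

As a usage example with hypothetical device names, copy_skipping_bad_blocks("/dev/da0", "/dev/da1") would clone a failing da0 onto a new da1. Run it as root, with nothing on either disk mounted, and double-check which device is which before writing to it.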
You didn't say how the drive(s) are formatted/laid out. Are you using
UFS/GPT or ZFS?
How you proceed after making a safe copy will depend on how you manage your
disks.
UFS/GPT: simply remove the failing disk, and change the entry in fstab(5)
to point to the new disk.
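Each non-comment line in fstab(5) is a set of whitespace-separated fields: device, mount point, file system type, options, dump frequency, and fsck pass number. Pointing an entry at the new disk therefore means changing only the first field. Editing /etc/fstab by hand is the normal way to do this. The sketch below just shows what the change amounts to. The function `repoint_fstab` and the device names /dev/da0p2 (old) and /dev/da1p2 (new) are hypothetical.

```python
def repoint_fstab(fstab_text, old_device, new_device):
    """Hypothetical helper: return fstab_text with every entry whose device
    field equals old_device rewritten to new_device.

    Comments, blank lines, and all other entries are kept verbatim.
    """
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == old_device:
            fields[0] = new_device
            line = "\t".join(fields) + "\n"
        out.append(line)
    return "".join(out)


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = (
    "# Device\tMountpoint\tFStype\tOptions\tDump\tPass#\n"
    "/dev/da0p2\t/\tufs\trw\t1\t1\n"
    "/dev/da0p3\tnone\tswap\tsw\t0\t0\n"
)
print(repoint_fstab(SAMPLE_FSTAB, "/dev/da0p2", "/dev/da1p2"))
```

A real disk swap would usually repoint every partition of the old disk, including swap.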
ZFS: it should be enough to simply replace the failing disk with one at
least the size of the failing one and resilver.
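The ZFS route boils down to `zpool replace pool old_device new_device`, which starts the resilver, and then `zpool status pool` to watch its progress. ZFS refuses a replacement device smaller than the one it replaces. The sketch below therefore checks sizes first and only builds the command lines without running them. The function name, the pool name "tank", and the device sizes are hypothetical. On FreeBSD, diskinfo(8) can report each disk's media size in bytes.

```python
def zfs_replace_commands(pool, old_device, new_device, old_size, new_size):
    """Hypothetical helper: build (but do not run) the commands for
    replacing a pool member. Sizes are in bytes.
    """
    if new_size < old_size:
        # ZFS rejects a replacement smaller than the device it replaces.
        raise ValueError(
            f"{new_device} ({new_size} bytes) is smaller than "
            f"{old_device} ({old_size} bytes)"
        )
    return [
        ["zpool", "replace", pool, old_device, new_device],  # starts resilver
        ["zpool", "status", pool],  # shows resilver progress
    ]


print(zfs_replace_commands("tank", "da0", "da1", 32 * 10**9, 64 * 10**9))
# [['zpool', 'replace', 'tank', 'da0', 'da1'], ['zpool', 'status', 'tank']]
```

Each command list could then be passed to subprocess.run(cmd, check=True) as root. Keeping construction separate from execution makes it easy to review the exact commands before anything touches the pool.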

HTH

--Chris