umass panic (after detaching/attaching card-reader 3 times)

Matthew Dillon dillon at apollo.backplane.com
Wed Mar 17 01:31:20 PST 2004


    I just fixed a bug in DragonFly's UMASS/CAM interface.  DragonFly is
    basically using the 4.x CAM code and the 5.x USB code.  The latest
    CD ISO image does not yet reflect the change or I'd just say try burning
    it and booting and see if you can get the system to screw up (I'll
    generate a new ISO image tomorrow), but perhaps what I found can
    serve as a hint to people working on FreeBSD.

    In any case, the bug had to do with the way UMASS detaches the CAM SIM.
    What happens is that umass.c/USB_DETACH calls umass_cam_detach_sim(sc)
    and then frees the softc.

    The problem is that CAM may still have a bus scan timeout in progress
    and a bus scan in the device queue (the device queue is destroyed by
    umass_cam_detach_sim), and I believe it is also possible for UMASS to
    have an operation in progress (initiated by CAM) which is racing
    the detach operation.

    When UMASS calls umass_cam_detach_sim() the CAM SIM gets ripped out 
    from under the CAM bus structure but the queued timeout still needs
    to indirect through the SIM so when the timeout happens, BOOM.  The
    bug can also lead to lockups during boot... CAM installs an interrupt
    completion item that the boot code waits for which scans all the CAM
    busses, but if UMASS detaches the sim with ops still queued the bus
    scan never completes and the system boot basically locks up forever
    waiting for it to complete.  I was also able to easily lock up the
    USB chipsets hard while diagnosing these bugs, to the point where I had
    to physically unplug the machine to get it to work again.  I believe 
    this was due to UMASS not properly aborting the pipes (leading to
    a violation of pipe command serialization when talking to the USB
    hardware).

    The fixes I made to DragonFly were to ref-count the CAM SIM so it would
    not be ripped out from under the CAM bus structure, to include CAM's
    pending timeout in the ref-count of the CAM device structure so IT
    wouldn't get ripped out if an active timeout exists, to abort all UMASS
    pipes prior to detaching the sim, and to augment the CAM XPT code's
    AC_LOST_DEVICE path to: (1) clear out any pending timeouts and 
    (2) flush all CAM software interrupts to make sure the async events 
    have actually completed.
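
    As a rough userland illustration of the ref-counting idea (this is
    not the actual DragonFly patch; struct sim, struct scan_timeout, and
    every function name below are made up for the example), a pending
    timeout can hold its own reference on the SIM so that the driver's
    detach cannot free the memory out from under it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CAM SIM; NOT the real struct cam_sim. */
struct sim {
    int refcount;   /* number of holders (driver, pending timeouts, ...) */
    int detached;   /* set once the owning driver has gone away */
};

/* Counts SIMs actually freed, so the example can be checked. */
static int sims_freed = 0;

static struct sim *sim_alloc(void)
{
    struct sim *s = calloc(1, sizeof(*s));
    if (s == NULL)
        abort();
    s->refcount = 1;            /* reference owned by the driver */
    return s;
}

static void sim_hold(struct sim *s)
{
    s->refcount++;
}

static void sim_release(struct sim *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {   /* last holder frees the structure */
        free(s);
        sims_freed++;
    }
}

/* Hypothetical queued bus-scan timeout that indirects through the SIM. */
struct scan_timeout {
    struct sim *sim;
    int armed;
};

static void timeout_arm(struct scan_timeout *t, struct sim *s)
{
    sim_hold(s);                /* the pending timeout counts as a holder */
    t->sim = s;
    t->armed = 1;
}

/* Returns 1 if the scan ran, 0 if skipped (SIM detached), -1 if not armed. */
static int timeout_fire(struct scan_timeout *t)
{
    int ran;

    if (!t->armed)
        return -1;
    t->armed = 0;
    ran = !t->sim->detached;    /* safe: our reference keeps the SIM alive */
    sim_release(t->sim);
    t->sim = NULL;
    return ran;
}

/* Detach-side cleanup: drop a timeout that has not fired yet. */
static void timeout_cancel(struct scan_timeout *t)
{
    if (t->armed) {
        t->armed = 0;
        sim_release(t->sim);
        t->sim = NULL;
    }
}

/* Driver detach: mark the SIM dead and drop only the driver's reference. */
static void driver_detach(struct sim *s)
{
    s->detached = 1;
    sim_release(s);
}
```

    With this arrangement, driver_detach() on a SIM that still has an
    armed timeout only marks it detached; the memory is freed when the
    timeout either fires (and sees the detached flag) or is cancelled.
    A real kernel version would also need locking or spl protection
    around the counter, which this sketch leaves out.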

    I don't know how much of this applies to FreeBSD-5, since FreeBSD-5
    seems to have rewritten a large chunk of CAM, but it does look like
    some of it might apply.  Probably all of these issues apply to FreeBSD-4.

    The patches are in the DragonFly CVS repository (www.dragonflybsd.org),
    in the following directories (in DFly): /usr/src/sys/bus/cam,
    /usr/src/sys/bus/usb, and /usr/src/sys/dev/usbmisc, if I remember
    correctly.  Perhaps Julian, who has been working on the USB code in
    4.x, can take it from there.  Information is the best I can offer.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>

:Jan Pechanec (jp at devnull.cz) wrote: 
:>
:>On Sat, 6 Mar 2004, Holger Kipp wrote:
:>
:>>I experience a very repeatable but unwanted behaviour with umass/usb:
:>>
:>>System hangs/panics after detaching and attaching 8-in-1 Card Reader
:>>several times. Card Reader is attached to Cypress Semiconductor Slim
:>>Hub (ie not directly), but using the built-in hub gives the same
:>>results.
:
:>we have similar experience with some of our new boxes based on
:>Via chipset. It seems to me that the attached device is innocent (same
:>errors as yours - uhub port errors, umass detached, umass BBB reset
:>failed etc.) and that the problem is somewhere on Via's side. We are
:>still analysing the problem - but did you get any further since then?
:
:Unfortunately not - I was still 'waiting' for someone who deals with
:usb/umass to shed some light on the issue or asking the right questions.
:
:(What is worse is that I don't have the time to look into this right now.)
:
:The interesting thing is that umass detach seems to happen during
:the BBB-whatever-cycle such that the system seems to end up using invalid
:null pointers. This imho should never happen.
:
:>Mar 6 20:43:41 katrin /kernel: umass0: BBB bulk-in clear stall failed, STALLED
:>Mar 6 20:43:41 katrin /kernel: umass0: at uhub2 port 4 (addr 3) disconnected
:>Mar 6 20:43:41 katrin /kernel: umass0: detached
:>Mar 6 20:43:41 katrin /kernel: (null): BBB bulk-out clear stall failed, CANCELLED
:>Mar 6 20:43:41 katrin /kernel: umass-sim:0:0:0:func_code 0x0901: Invalid target (target needed)
:>Mar 6 20:43:41 katrin last message repeated 2 times
:>Mar 6 20:43:41 katrin /kernel: panic: (null): Unknown state 0
:
:Unfortunately I don't have enough resources to test this with CURRENT. I am also
:waiting for MFC of umass which might fix a few things.
:
:Regards,
:Holger Kipp


More information about the freebsd-stable mailing list