RE: xhci data toggle out of sync

From: Mahesh Vardhamanaiah <maheshmv_at_juniper.net>
Date: Tue, 19 Apr 2022 15:31:59 UTC

Hi HPS,

Sorry, my knowledge of USB is very limited. Is there any other way to reset the data toggles if the state transition is not allowed?

Thanks,
Mahesh


-----Original Message-----
From: owner-freebsd-usb@freebsd.org <owner-freebsd-usb@freebsd.org> On Behalf Of Hans Petter Selasky
Sent: Tuesday, April 19, 2022 7:03 PM
To: Mahesh Vardhamanaiah <maheshmv@juniper.net>; Kamal Prasad <krprasad@juniper.net>; freebsd-usb@freebsd.org
Cc: Steve Kiernan <stevek@juniper.net>; Justin Hibbits <jhibbits@juniper.net>; Kumara N Babu <bkumara@juniper.net>; Kristof Provost <kp@FreeBSD.org>; Bjoern A. Zeeb <bz@FreeBSD.org>
Subject: Re: xhci data toggle out of sync

Hi Mahesh,

The function xhci_cmd_reset_ep() is supposed to set the TX or RX data toggle back to zero for the endpoint context given by epno.

Maybe you could investigate why that function is not working with your XHCI hardware? I feel you are quite competent with USB.

Maybe the Linux USB stack doesn't issue the clear endpoint halt unless the endpoint really was stuck, so the same problem may actually exist there too: if you clear-stall on a running endpoint context, the same issue will happen?!

The XHCI specification is here:
https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf

I looked a bit at drivers/usb/host/xhci*.c, and it looks very much to me like they don't support out-of-band clear endpoint halt!

>                 switch (GET_EP_CTX_STATE(ep_ctx)) {
>                 case EP_STATE_HALTED:
>                         xhci_dbg(xhci, "Stop ep completion raced with stall, reset ep\n");
>                         if (ep->ep_state & EP_HAS_STREAMS) {
>                                 reset_type = EP_SOFT_RESET;
>                         } else {
>                                 reset_type = EP_HARD_RESET;
>                                 td = find_halted_td(ep);
>                                 if (td)
>                                         td->status = -EPROTO;
>                         }
>                         /* reset ep, reset handler cleans up cancelled tds */
>                         err = xhci_handle_halted_endpoint(xhci, ep, 0, td,
>                                                           reset_type);
>                         if (err)
>                                 break;
>                         xhci_stop_watchdog_timer_in_irq(xhci, ep);
>                         return;
>                 case EP_STATE_RUNNING:
>                         /* Race, HW handled stop ep cmd before ep was running */
>                         xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");
>
>                         command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
>                         if (!command)
>                                 xhci_stop_watchdog_timer_in_irq(xhci, ep);
>
>                         mod_timer(&ep->stop_cmd_timer,
>                                   jiffies + XHCI_STOP_EP_CMD_TIMEOUT * HZ);
>                         xhci_queue_stop_endpoint(xhci, command, slot_id, ep_index, 0);
>                         xhci_ring_cmd_db(xhci);
>
>                         return;
>                 default:
>                         break;
>                 }


If I'm not mistaken, the hardware design was forced to follow "Figure 4-5: Endpoint State Diagram" in the PDF file mentioned above, which basically means the Reset Endpoint command cannot be executed from the RUNNING state, as we need it to!?
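
To make that constraint concrete, here is a minimal sketch in C of how I read the state diagram. It is a toy model, not driver code: struct ep_model and model_reset_ep() are hypothetical names made up for illustration, and the model assumes the reset is issued without preserving transfer state, so the toggle is cleared on success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Endpoint states, numbered as in the xHCI endpoint context. */
enum ep_state {
	EP_DISABLED = 0,
	EP_RUNNING = 1,
	EP_HALTED = 2,
	EP_STOPPED = 3,
	EP_ERROR = 4
};

/* Hypothetical model of one endpoint; not a real driver structure. */
struct ep_model {
	enum ep_state state;
	unsigned toggle;	/* data toggle / sequence number */
};

/*
 * Toy model of the Reset Endpoint command.
 * Assumption: the command is only accepted in the HALTED state and
 * moves the endpoint to STOPPED with the toggle cleared. In any other
 * state it is rejected (modelled here as a "false" return) and the
 * endpoint is left untouched.
 */
static bool
model_reset_ep(struct ep_model *ep)
{
	assert(ep != NULL);

	if (ep->state != EP_HALTED)
		return (false);	/* e.g. RUNNING: toggle is NOT reset */

	ep->state = EP_STOPPED;
	ep->toggle = 0;
	return (true);
}
```

Calling model_reset_ep() on an endpoint in EP_RUNNING returns false and leaves the toggle as it was, which is the situation we seem to be hitting after an out-of-band clear-stall.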

Sounds like someone at usb.org/Intel should get involved!

What do you think? And how should we solve this smoothly?

--HPS