Re: RES: TP-LINK USB no carrier after speed test

From: Hans Petter Selasky <hps_at_selasky.org>
Date: Wed, 28 Sep 2022 09:42:58 UTC
On 9/28/22 11:07, Ivan Quitschal wrote:
> 
> 
> On Tue, 27 Sep 2022, Hans Petter Selasky wrote:
> 
>> On 9/27/22 15:22, Hans Petter Selasky wrote:
>>> On 9/27/22 14:17, Ivan Quitschal wrote:
>>>>
>>>>
>>>> On Tue, 27 Sep 2022, Hans Petter Selasky wrote:
>>>>
>>>>> On 9/27/22 02:24, Alexander Motin wrote:
>>>>>> On 26.09.2022 17:29, Hans Petter Selasky wrote:
>>>>>>> I've got a supposedly "broken" if_ure dongle from Alexander, but 
>>>>>>> I'm unable to reproduce the if_ure hang on two different pieces 
>>>>>>> of XHCI hardware, Intel based and AMD based, which I've got.
>>>>>>>
>>>>>>> This leads me to believe there is a bug in the XHCI driver or 
>>>>>>> hardware on your system.
>>>>>>>
>>>>>>> Can you share the pciconfig -lv output for your XHCI controllers?
>>>>>>
>>>>>> I have two laptops of different generations reproducing this 
>>>>>> problem, but both are having Thunderbolt on the USB-C ports:
>>>>>>
>>>>>> This is one (7th Gen Core i7):
>>>>>>
>>>>>> xhci1@pci0:56:0:0:      class=0x0c0330 rev=0x02 hdr=0x00 
>>>>>> vendor=0x8086 device=0x15d4 subvendor=0x2222 subdevice=0x1111
>>>>>>      vendor     = 'Intel Corporation'
>>>>>>      device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) 
>>>>>> [Alpine Ridge 4C 2016]'
>>>>>>      class      = serial bus
>>>>>>      subclass   = USB
>>>>>>      bar   [10] = type Memory, range 32, base 0xc3f00000, size 
>>>>>> 65536, enabled
>>>>>>      cap 01[80] = powerspec 3  supports D0 D1 D2 D3  current D0
>>>>>>      cap 05[88] = MSI supports 8 messages, 64 bit enabled with 1 
>>>>>> message
>>>>>>      cap 10[c0] = PCI-Express 2 endpoint max data 128(128) RO NS
>>>>>>                   max read 512
>>>>>>                   link x4(x4) speed 2.5(2.5) ASPM disabled(L0s/L1) 
>>>>>> ClockPM disabled
>>>>>>      ecap 0003[100] = Serial 1 20ff910876f10c00
>>>>>>      ecap 0001[200] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>      ecap 0002[300] = VC 1 max VC0
>>>>>>      ecap 0004[400] = Power Budgeting 1
>>>>>>      ecap 000b[500] = Vendor [1] ID 1234 Rev 1 Length 216
>>>>>>      ecap 0018[600] = LTR 1
>>>>>>      ecap 0019[700] = PCIe Sec 1 lane errors 0
>>>>>>
>>>>>> This is another (11th Gen Core i7);
>>>>>>
>>>>>> xhci0@pci0:0:13:0:      class=0x0c0330 rev=0x01 hdr=0x00 
>>>>>> vendor=0x8086 device=0x9a13 subvendor=0x1028 subdevice=0x0991
>>>>>>      vendor     = 'Intel Corporation'
>>>>>>      device     = 'Tiger Lake-LP Thunderbolt 4 USB Controller'
>>>>>>      class      = serial bus
>>>>>>      subclass   = USB
>>>>>>      bar   [10] = type Memory, range 64, base 0x60552c0000, size 
>>>>>> 65536, enabled
>>>>>>      cap 01[70] = powerspec 2  supports D0 D3  current D0
>>>>>>      cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 
>>>>>> message
>>>>>>      cap 09[90] = vendor (length 20) Intel cap 15 version 0
>>>>>>      cap 09[b0] = vendor (length 0) Intel cap 0 version 1
>>>>>>
>>>>>> Does the system you also has Thunderbolt chip, or you use native 
>>>>>> Intel chipet's XHCI?
>>>>>>
>>>>>>> Also, when running the stress test and you see the traffic stops, 
>>>>>>> what happens if you run this command as root on the ugen which 
>>>>>>> the if_ure belongs to:
>>>>>>>
>>>>>>> usbconfig -d ugenX.Y dump_string 0
>>>>>>>
>>>>>>> Does the traffic resume?
>>>>>>
>>>>>> Nope. Out of 4 times when traffic stopped 2 times it reported 
>>>>>> <read error> and 2 times it completed successfully, but it neither 
>>>>>> case it recovered traffic.  Only reset recovered it.
>>>>>>
>>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> Could you run "usbdump -d X.Y" at the same time to capture all the 
>>>>> errors?
>>>>>
>>>>> Looking especially for USB_ERR_TIMEOUT .
>>>>>
>>>>> I have this:
>>>>>
>>>>> xhci0@pci0:3:0:3:    class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 
>>>>> device=0x15e0 subvendor=0x1849 subdevice=0xffff
>>>>>    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
>>>>>    device     = 'Raven USB 3.1'
>>>>>    class      = serial bus
>>>>>    subclass   = USB
>>>>>
>>>>> xhci0@pci0:0:20:0:    class=0x0c0330 rev=0x21 hdr=0x00 
>>>>> vendor=0x8086 device=0x9d2f subvendor=0x8086 subdevice=0x9d2f
>>>>>    vendor     = 'Intel Corporation'
>>>>>    device     = 'Sunrise Point-LP USB 3.0 xHCI Controller'
>>>>>    class      = serial bus
>>>>>    subclass   = USB
>>>>>
>>>>> --HPS
>>>>>
>>>>
>>>>
>>>> hi Hans
>>>>
>>>> i think i got some good logs for you
>>>>
>>>> before the problem i ran this:
>>>>
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>
>>>> # usbconfig -d ugen0.10 >> before
>>>> # usbconfig -d ugen0.10 dump_all_desc >> before
>>>> # usbconfig -d ugen0.10 dump_stats >> before_status
>>>>
>>>> the after the problem happened i ran
>>>>
>>>> # usbconfig -d ugen0.10 >> after
>>>> # usbconfig -d ugen0.10 dump_all_desc >> after
>>>> # usbconfig -d ugen0.10 dump_stats >> after_status
>>>>
>>>>
>>>> just by looking i already see some problems comparing both
>>>>
>>>> for example
>>>>
>>>> before the problem we have:
>>>>
>>>> ----------------------
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>
>>>>    bLength = 0x0012
>>>>    bDescriptorType = 0x0001
>>>>    bcdUSB = 0x0300
>>>>    bDeviceClass = 0x0000  <Probed by interface class>
>>>>    bDeviceSubClass = 0x0000
>>>>    bDeviceProtocol = 0x0000
>>>>    bMaxPacketSize0 = 0x0009
>>>>    idVendor = 0x2357
>>>>    idProduct = 0x0601
>>>>    bcdDevice = 0x3000
>>>> ****
>>>>    iManufacturer = 0x0001  <TP-Link>
>>>>    iProduct = 0x0002  <USB 10/100/1000 LAN>
>>>>    iSerialNumber = 0x0006  <000001>
>>>>    bNumConfigurations = 0x0002
>>>>
>>>> ------------------------
>>>>
>>>> after the problem
>>>>
>>>> --------------------------
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>
>>>>    bLength = 0x0012
>>>>    bDescriptorType = 0x0001
>>>>    bcdUSB = 0x0300
>>>>    bDeviceClass = 0x0000  <Probed by interface class>
>>>>    bDeviceSubClass = 0x0000
>>>>    bDeviceProtocol = 0x0000
>>>>    bMaxPacketSize0 = 0x0009
>>>>    idVendor = 0x2357
>>>>    idProduct = 0x0601
>>>>    bcdDevice = 0x3000
>>>> ****
>>>> iManufacturer = 0x0001  <retrieving string failed>
>>>>    iProduct = 0x0002  <retrieving string failed>
>>>>    iSerialNumber = 0x0006  <retrieving string failed>
>>>>    bNumConfigurations = 0x0002
>>>>
>>>>   Configuration index 0 --------------------------
>>>>
>>>>
>>>> the log in ttyv0 was this:
>>>>
>>>> ure0: at uhub0, port 14, addr 9 (disconnected)
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: receive_packet failed 
>>>> on ue0: Device not configured
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: ioctl(SIOCGIFFLAGS) on 
>>>> ue0: Operation not permitted
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: Interface ue0 no longer 
>>>> appears valid.
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: No live interfaces to 
>>>> poll on - exiting.
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: connection closed
>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
>>>> rgephy0: detached
>>>> miibus0: detached
>>>> ure0: detached
>>>>
>>>>
>>>> difference between before_status and after_status
>>>>
>>>> before_status:
>>>>
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>
>>>> {
>>>>      UE_CONTROL_OK       : 2389
>>>>      UE_ISOCHRONOUS_OK   : 0
>>>>      UE_BULK_OK          : 803
>>>>      UE_INTERRUPT_OK     : 0
>>>>      UE_CONTROL_FAIL     : 0
>>>>      UE_ISOCHRONOUS_FAIL : 0
>>>>      UE_BULK_FAIL        : 0
>>>>      UE_INTERRUPT_FAIL   : 0
>>>> }
>>>>
>>>>
>>>> after_status:
>>>>
>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>
>>>> {
>>>>      UE_CONTROL_OK       : 4275
>>>>      UE_ISOCHRONOUS_OK   : 0
>>>>      UE_BULK_OK          : 1126702
>>>>      UE_INTERRUPT_OK     : 0
>>>>      UE_CONTROL_FAIL     : 326
>>>>      UE_ISOCHRONOUS_FAIL : 0
>>>>      UE_BULK_FAIL        : 42
>>>>      UE_INTERRUPT_FAIL   : 0
>>>> }
>>>>
>>>>
>>>>
>>>> i hope that helps
>>>>
>>>>
>>>> all log files are attached
>>>>
>>>> thanks
>>>>
>>>> --tzk
>>>
>>>
>>> One more test request:
>>>
>>> Make sure you have a kernel with "options USB_DEBUG".
>>>
>>> Then after "iperf" says 0bits/s, then you run:
>>>
>>> sysctl hw.usb.xhci.debug=29
>>>
>>> Then run those usbconfig commands.
>>>
>>> Then then:
>>>
>>> sysctl hw.usb.xhci.debug=0
>>>
>>> And collect the logs from /var/log/messages .
>>>
>>> Like said earlier, I wonder if the fault is in the XHCI controller 
>>> itself or something that has to do with thunderbolt, because on my 
>>> machine there is no indication of a if_ure device failure. The only 
>>> way to know for sure is to buy a USB 3.0 wire analyzer, like the 
>>> beagle one, but I'm unsure if they support USB-C .
>>>
>>> Thank you!
>>>
>>> --HPS
>>>
>>
>> FYI: There is some experimental thunderbolt support at:
>>
>> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhselasky%2Fusb4&amp;data=05%7C01%7C%7C14c86eee9f5d492c41d508daa0b49bdb%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637998994857157968%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=%2FOnIO3esoAmi1FSPkHRYpHCHkcN6U2rO9WhaimdaVbk%3D&amp;reserved=0
>>
>> But I'm not sure if it supports the hardware you've got.
>>
>> --HPS
>>
> 
> 
> Hi Hans
> 
> i got two log versions for you, one with the constant set to 2048 (the 
> working version) , and the other with no patches whatsoever (the bad 
> version)
> 
> since the entire logs reached more than 150M of size, i had to cut to 
> the last 1000 lines , hope toats enough
> 
> pleaes find attached the two files
> 
> the xhci_NOT_working i stoped recording right after the problem ocurred
> 
> please let me know if you need something else
> 
> thank you
> 
> --tzk

I need the full log.

The XHCI driver is very verbose you see.

Maybe you can do some filtering, like figuring out all the status codes 
you see:

status=1
status=13

and so on.

--HPS