Re: RES: TP-LINK USB no carrier after speed test

From: Ivan Quitschal <tezeka_at_hotmail.com>
Date: Wed, 28 Sep 2022 11:54:12 UTC

On Wed, 28 Sep 2022, Hans Petter Selasky wrote:

> On 9/28/22 11:07, Ivan Quitschal wrote:
>> 
>> 
>> On Tue, 27 Sep 2022, Hans Petter Selasky wrote:
>> 
>>> On 9/27/22 15:22, Hans Petter Selasky wrote:
>>>> On 9/27/22 14:17, Ivan Quitschal wrote:
>>>>> 
>>>>> 
>>>>> On Tue, 27 Sep 2022, Hans Petter Selasky wrote:
>>>>> 
>>>>>> On 9/27/22 02:24, Alexander Motin wrote:
>>>>>>> On 26.09.2022 17:29, Hans Petter Selasky wrote:
>>>>>>>> I've got a supposedly "broken" if_ure dongle from Alexander, but I'm 
>>>>>>>> unable to reproduce the if_ure hang on two different pieces of XHCI 
>>>>>>>> hardware, Intel based and AMD based, which I've got.
>>>>>>>> 
>>>>>>>> This leads me to believe there is a bug in the XHCI driver or 
>>>>>>>> hardware on your system.
>>>>>>>> 
>>>>>>>> Can you share the pciconfig -lv output for your XHCI controllers?
>>>>>>> 
>>>>>>> I have two laptops of different generations reproducing this problem, 
>>>>>>> but both are having Thunderbolt on the USB-C ports:
>>>>>>> 
>>>>>>> This is one (7th Gen Core i7):
>>>>>>> 
>>>>>>> xhci1@pci0:56:0:0:      class=0x0c0330 rev=0x02 hdr=0x00 vendor=0x8086 
>>>>>>> device=0x15d4 subvendor=0x2222 subdevice=0x1111
>>>>>>>      vendor     = 'Intel Corporation'
>>>>>>>      device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) 
>>>>>>> [Alpine Ridge 4C 2016]'
>>>>>>>      class      = serial bus
>>>>>>>      subclass   = USB
>>>>>>>      bar   [10] = type Memory, range 32, base 0xc3f00000, size 65536, 
>>>>>>> enabled
>>>>>>>      cap 01[80] = powerspec 3  supports D0 D1 D2 D3  current D0
>>>>>>>      cap 05[88] = MSI supports 8 messages, 64 bit enabled with 1 
>>>>>>> message
>>>>>>>      cap 10[c0] = PCI-Express 2 endpoint max data 128(128) RO NS
>>>>>>>                   max read 512
>>>>>>>                   link x4(x4) speed 2.5(2.5) ASPM disabled(L0s/L1) 
>>>>>>> ClockPM disabled
>>>>>>>      ecap 0003[100] = Serial 1 20ff910876f10c00
>>>>>>>      ecap 0001[200] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>>      ecap 0002[300] = VC 1 max VC0
>>>>>>>      ecap 0004[400] = Power Budgeting 1
>>>>>>>      ecap 000b[500] = Vendor [1] ID 1234 Rev 1 Length 216
>>>>>>>      ecap 0018[600] = LTR 1
>>>>>>>      ecap 0019[700] = PCIe Sec 1 lane errors 0
>>>>>>> 
>>>>>>> This is another (11th Gen Core i7);
>>>>>>> 
>>>>>>> xhci0@pci0:0:13:0:      class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 
>>>>>>> device=0x9a13 subvendor=0x1028 subdevice=0x0991
>>>>>>>      vendor     = 'Intel Corporation'
>>>>>>>      device     = 'Tiger Lake-LP Thunderbolt 4 USB Controller'
>>>>>>>      class      = serial bus
>>>>>>>      subclass   = USB
>>>>>>>      bar   [10] = type Memory, range 64, base 0x60552c0000, size 
>>>>>>> 65536, enabled
>>>>>>>      cap 01[70] = powerspec 2  supports D0 D3  current D0
>>>>>>>      cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 
>>>>>>> message
>>>>>>>      cap 09[90] = vendor (length 20) Intel cap 15 version 0
>>>>>>>      cap 09[b0] = vendor (length 0) Intel cap 0 version 1
>>>>>>> 
>>>>>>> Does the system you also has Thunderbolt chip, or you use native Intel 
>>>>>>> chipet's XHCI?
>>>>>>> 
>>>>>>>> Also, when running the stress test and you see the traffic stops, 
>>>>>>>> what happens if you run this command as root on the ugen which the 
>>>>>>>> if_ure belongs to:
>>>>>>>> 
>>>>>>>> usbconfig -d ugenX.Y dump_string 0
>>>>>>>> 
>>>>>>>> Does the traffic resume?
>>>>>>> 
>>>>>>> Nope. Out of 4 times when traffic stopped 2 times it reported <read 
>>>>>>> error> and 2 times it completed successfully, but it neither case it 
>>>>>>> recovered traffic.  Only reset recovered it.
>>>>>>> 
>>>>>> 
>>>>>> Hi Alexander,
>>>>>> 
>>>>>> Could you run "usbdump -d X.Y" at the same time to capture all the 
>>>>>> errors?
>>>>>> 
>>>>>> Looking especially for USB_ERR_TIMEOUT .
>>>>>> 
>>>>>> I have this:
>>>>>> 
>>>>>> xhci0@pci0:3:0:3:    class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 
>>>>>> device=0x15e0 subvendor=0x1849 subdevice=0xffff
>>>>>>    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
>>>>>>    device     = 'Raven USB 3.1'
>>>>>>    class      = serial bus
>>>>>>    subclass   = USB
>>>>>> 
>>>>>> xhci0@pci0:0:20:0:    class=0x0c0330 rev=0x21 hdr=0x00 vendor=0x8086 
>>>>>> device=0x9d2f subvendor=0x8086 subdevice=0x9d2f
>>>>>>    vendor     = 'Intel Corporation'
>>>>>>    device     = 'Sunrise Point-LP USB 3.0 xHCI Controller'
>>>>>>    class      = serial bus
>>>>>>    subclass   = USB
>>>>>> 
>>>>>> --HPS
>>>>>> 
>>>>> 
>>>>> 
>>>>> hi Hans
>>>>> 
>>>>> i think i got some good logs for you
>>>>> 
>>>>> before the problem i ran this:
>>>>> 
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> 
>>>>> # usbconfig -d ugen0.10 >> before
>>>>> # usbconfig -d ugen0.10 dump_all_desc >> before
>>>>> # usbconfig -d ugen0.10 dump_stats >> before_status
>>>>> 
>>>>> the after the problem happened i ran
>>>>> 
>>>>> # usbconfig -d ugen0.10 >> after
>>>>> # usbconfig -d ugen0.10 dump_all_desc >> after
>>>>> # usbconfig -d ugen0.10 dump_stats >> after_status
>>>>> 
>>>>> 
>>>>> just by looking i already see some problems comparing both
>>>>> 
>>>>> for example
>>>>> 
>>>>> before the problem we have:
>>>>> 
>>>>> ----------------------
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> 
>>>>>    bLength = 0x0012
>>>>>    bDescriptorType = 0x0001
>>>>>    bcdUSB = 0x0300
>>>>>    bDeviceClass = 0x0000  <Probed by interface class>
>>>>>    bDeviceSubClass = 0x0000
>>>>>    bDeviceProtocol = 0x0000
>>>>>    bMaxPacketSize0 = 0x0009
>>>>>    idVendor = 0x2357
>>>>>    idProduct = 0x0601
>>>>>    bcdDevice = 0x3000
>>>>> ****
>>>>>    iManufacturer = 0x0001  <TP-Link>
>>>>>    iProduct = 0x0002  <USB 10/100/1000 LAN>
>>>>>    iSerialNumber = 0x0006  <000001>
>>>>>    bNumConfigurations = 0x0002
>>>>> 
>>>>> ------------------------
>>>>> 
>>>>> after the problem
>>>>> 
>>>>> --------------------------
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> 
>>>>>    bLength = 0x0012
>>>>>    bDescriptorType = 0x0001
>>>>>    bcdUSB = 0x0300
>>>>>    bDeviceClass = 0x0000  <Probed by interface class>
>>>>>    bDeviceSubClass = 0x0000
>>>>>    bDeviceProtocol = 0x0000
>>>>>    bMaxPacketSize0 = 0x0009
>>>>>    idVendor = 0x2357
>>>>>    idProduct = 0x0601
>>>>>    bcdDevice = 0x3000
>>>>> ****
>>>>> iManufacturer = 0x0001  <retrieving string failed>
>>>>>    iProduct = 0x0002  <retrieving string failed>
>>>>>    iSerialNumber = 0x0006  <retrieving string failed>
>>>>>    bNumConfigurations = 0x0002
>>>>> 
>>>>>   Configuration index 0 --------------------------
>>>>> 
>>>>> 
>>>>> the log in ttyv0 was this:
>>>>> 
>>>>> ure0: at uhub0, port 14, addr 9 (disconnected)
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: receive_packet failed on 
>>>>> ue0: Device not configured
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: ioctl(SIOCGIFFLAGS) on ue0: 
>>>>> Operation not permitted
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: Interface ue0 no longer 
>>>>> appears valid.
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: No live interfaces to poll 
>>>>> on - exiting.
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: connection closed
>>>>> Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
>>>>> rgephy0: detached
>>>>> miibus0: detached
>>>>> ure0: detached
>>>>> 
>>>>> 
>>>>> difference between before_status and after_status
>>>>> 
>>>>> before_status:
>>>>> 
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> 
>>>>> {
>>>>>      UE_CONTROL_OK       : 2389
>>>>>      UE_ISOCHRONOUS_OK   : 0
>>>>>      UE_BULK_OK          : 803
>>>>>      UE_INTERRUPT_OK     : 0
>>>>>      UE_CONTROL_FAIL     : 0
>>>>>      UE_ISOCHRONOUS_FAIL : 0
>>>>>      UE_BULK_FAIL        : 0
>>>>>      UE_INTERRUPT_FAIL   : 0
>>>>> }
>>>>> 
>>>>> 
>>>>> after_status:
>>>>> 
>>>>> ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST 
>>>>> spd=SUPER (5.0Gbps) pwr=ON (72mA)
>>>>> 
>>>>> {
>>>>>      UE_CONTROL_OK       : 4275
>>>>>      UE_ISOCHRONOUS_OK   : 0
>>>>>      UE_BULK_OK          : 1126702
>>>>>      UE_INTERRUPT_OK     : 0
>>>>>      UE_CONTROL_FAIL     : 326
>>>>>      UE_ISOCHRONOUS_FAIL : 0
>>>>>      UE_BULK_FAIL        : 42
>>>>>      UE_INTERRUPT_FAIL   : 0
>>>>> }
>>>>> 
>>>>> 
>>>>> 
>>>>> i hope that helps
>>>>> 
>>>>> 
>>>>> all log files are attached
>>>>> 
>>>>> thanks
>>>>> 
>>>>> --tzk
>>>> 
>>>> 
>>>> One more test request:
>>>> 
>>>> Make sure you have a kernel with "options USB_DEBUG".
>>>> 
>>>> Then after "iperf" says 0bits/s, then you run:
>>>> 
>>>> sysctl hw.usb.xhci.debug=29
>>>> 
>>>> Then run those usbconfig commands.
>>>> 
>>>> Then then:
>>>> 
>>>> sysctl hw.usb.xhci.debug=0
>>>> 
>>>> And collect the logs from /var/log/messages .
>>>> 
>>>> Like said earlier, I wonder if the fault is in the XHCI controller itself 
>>>> or something that has to do with thunderbolt, because on my machine there 
>>>> is no indication of a if_ure device failure. The only way to know for 
>>>> sure is to buy a USB 3.0 wire analyzer, like the beagle one, but I'm 
>>>> unsure if they support USB-C .
>>>> 
>>>> Thank you!
>>>> 
>>>> --HPS
>>>> 
>>> 
>>> FYI: There is some experimental thunderbolt support at:
>>> 
>>> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhselasky%2Fusb4&amp;data=05%7C01%7C%7Cc2f534f631fd47afec9908daa135d60b%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637999549868812246%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=uL5DwbPcldediZmBiufQXnkF7%2F2WQTizqVLAYBHZjqA%3D&amp;reserved=0
>>> 
>>> But I'm not sure if it supports the hardware you've got.
>>> 
>>> --HPS
>>> 
>> 
>> 
>> Hi Hans
>> 
>> i got two log versions for you, one with the constant set to 2048 (the 
>> working version) , and the other with no patches whatsoever (the bad 
>> version)
>> 
>> since the entire logs reached more than 150M of size, i had to cut to the 
>> last 1000 lines , hope toats enough
>> 
>> pleaes find attached the two files
>> 
>> the xhci_NOT_working i stoped recording right after the problem ocurred
>> 
>> please let me know if you need something else
>> 
>> thank you
>> 
>> --tzk
>
> I need the full log.
>
> The XHCI driver is very verbose you see.
>
> Maybe you can do some filtering, like figuring out all the status codes you 
> see:
>
> status=1
> status=13
>
> and so on.
>
> --HPS
>


hi Hans

i will try to do some logs filtered like you said , and will also try to 
compress as much as i want the full ones. ( i may be able to transform 150M into 
no more than 10M, its all plain text)

i will send you here and the list as soon as i do this .. dont know if ill be 
able to do this today, but tomorrow you'll get the full logs even if i have to 
open the ftp for you myself :- )


thanks

--tzk