Re: 60+% ping packet loss on Pi3 under -current and stable-13

From: Chris <bsd-lists_at_bsdforge.com>
Date: Mon, 02 May 2022 15:25:39 UTC
On 2022-05-01 12:58, Mark Millard wrote:
> On 2022-May-1, at 12:15, Mark Millard <marklmi@yahoo.com> wrote:
> 
> 
>> On 2022-May-1, at 11:12, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> On Sat, Apr 30, 2022 at 06:39:57PM -0700, Bakul Shah wrote:
>>>> On Apr 29, 2022, at 7:12 PM, bob prohaska <fbsd@www.zefox.net> wrote:
>>>>> 
>>>>> Since about December of 2021 I've been noticing problems with
>>>>> wired network connectivity on a pair of Raspberry Pi 3 machines.
>>>>> One runs stable-13.1, the other runs -current; both are up to
>>>>> date as of a few days ago.
>>>>> 
>>>>> Essentially both machines fail to respond to inbound network
>>>>> connections via ssh or ping after reboot. If I get on the
>>>>> serial console and start an outbound ping to anywhere, both
>>>>> machines respond to incoming pings with about a 65% packet
>>>>> loss.
>>> 
>>>> Suggest running tcpdump on the rpi3 to see what is going on
>>>> when connected to the public vs private net.
>>>> 
>>> 
>>> Public net first, since that's where the machine is now. Gateway.zefox.net
>>> is the name of my router's public interface, dcn.org belongs to my ISP, and
>>> fusionbroadband is their service provider.
>>> 
>>> While on the -current Pi3 serial console (with no outbound ping running)
>>> and with no inbound traffic from my hosts, I see after a couple of minutes:
>>> 
>>> root@www:/mnt # tcpdump
>>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>> listening on ue0, link-type EN10MB (Ethernet), capture size 262144 bytes
>>> 10:39:40.887853 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:39:40.887929 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:39:40.893220 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.org, length 28
>>> 10:39:40.915469 ARP, Reply 50-1-20-1.dsl.static.fusionbroadband.com is-at 00:1b:90:d2:4a:c4 (oui Unknown), length 50
>>> 10:39:40.915529 IP www.zefox.org.50714 > spoke.dcn.davis.ca.us.domain: 51409+ PTR? 28.20.1.50.in-addr.arpa. (41)
>>> 10:39:40.943602 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.50714: 51409 1/3/6 PTR www.zefox.org. (265)
>>> 10:39:40.945416 IP www.zefox.org.15986 > spoke.dcn.davis.ca.us.domain: 44966+ PTR? 31.20.1.50.in-addr.arpa. (41)
>>> 10:39:40.973487 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.15986: 44966 1/3/6 PTR gateway.zefox.net. (266)
>>> 10:39:40.975037 IP www.zefox.org.57611 > spoke.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
>>> 10:39:46.288219 IP www.zefox.org.49710 > wheel.dcn.davis.ca.us.domain: 31749+ PTR? 1.20.1.50.in-addr.arpa. (40)
>>> 10:39:46.316239 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.49710: 31749 1/3/6 PTR 50-1-20-1.dsl.static.fusionbroadband.com. (291)
>>> 10:39:46.318267 IP www.zefox.org.17061 > spoke.dcn.davis.ca.us.domain: 37579+ PTR? 2.253.150.168.in-addr.arpa. (44)
>>> 10:39:46.346851 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.17061: 37579* 1/2/2 PTR spoke.dcn.davis.ca.us. (145)
>>> 10:39:46.348674 IP www.zefox.org.40440 > spoke.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
>>> 10:39:51.420705 IP www.zefox.org.64019 > wheel.dcn.davis.ca.us.domain: 20572+ PTR? 1.253.150.168.in-addr.arpa. (44)
>>> 10:39:51.448850 IP wheel.dcn.davis.ca.us.domain > www.zefox.org.64019: 20572* 1/2/2 PTR wheel.dcn.davis.ca.us. (145)
>>> 10:40:40.147603 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell ns1.zefox.net, length 46
>>> 10:40:40.148844 IP www.zefox.org.46127 > spoke.dcn.davis.ca.us.domain: 12186+ PTR? 29.20.1.50.in-addr.arpa. (41)
>>> 10:40:40.176486 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.46127: 12186 1/3/6 PTR ns1.zefox.net. (262)
>>> 10:40:57.688225 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:40:57.688305 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:42:14.488727 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:42:14.488804 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 10:42:43.761226 ARP, Request who-has 50-1-20-1.dsl.static.fusionbroadband.com tell www.zefox.com, length 46
>>> 10:42:43.762522 IP www.zefox.org.56181 > spoke.dcn.davis.ca.us.domain: 28779+ PTR? 26.20.1.50.in-addr.arpa. (41)
>>> 10:42:43.790361 IP spoke.dcn.davis.ca.us.domain > www.zefox.org.56181: 28779 1/3/6 PTR www.zefox.com. (265)
>>> 10:43:31.289103 ARP, Request who-has www.zefox.org tell gateway.zefox.net, length 46
>>> 10:43:31.289181 ARP, Reply www.zefox.org is-at b8:27:eb:71:46:4e (oui Unknown), length 28
>>> 
>>> If I now start an inbound ping from one of my hosts, it gets no reply and
>>> tcpdump reports no additional traffic. With an outbound ping running,
>>> there's at least a sparse reply.
>>> 
>>> ^C
>>> 28 packets captured
>>> 28 packets received by filter
>>> 0 packets dropped by kernel
>>> root@www:/mnt #
>>> 
>>> The "oui Unknown" looks like some sort of failure...
>>> Can you ping www.zefox.org? I have no outside vantage point.
>>> There is still no outbound ping running, and I would expect
>>> you'll get no reply or only a very sparse one.
>>> 
>>> 
>>> Thus far only the two Pi3s suffer from connectivity problems; Pi2s and a
>>> Pi4 have no difficulty on the same address block. Is there a switch for
>>> tcpdump that will limit records to relevant traffic? Otherwise it's a flood.
>>> 
>>> These results were obtained after the machine stood idle overnight and
>>> are rather different (in ways I don't understand) from its behavior
>>> immediately after reboot; I'll have to repeat this as I learn more.
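
On limiting tcpdump to relevant traffic: tcpdump takes a pcap-filter(7)
expression after its options, and -n turns off name resolution. The
"PTR?" queries in the capture above appear to be tcpdump's own reverse
lookups of the addresses it was printing, so -n should remove those as
well. A minimal sketch, assuming ue0 is still the Pi3's interface and
50.1.20.28 its address (both taken from output in this thread); the
helper name pi3_filter is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print a pcap-filter(7) expression that keeps only
# ICMP (ping) and ARP packets to or from one address.
# A bare "host" qualifier matches IPv4/IPv6 and ARP packets alike.
pi3_filter() {
    printf '(icmp or arp) and host %s\n' "$1"
}

# Usage, as root on the Pi3 (interface and address assumed from this thread):
#   tcpdump -n -i ue0 "$(pi3_filter 50.1.20.28)"
# -n: print numeric addresses, so tcpdump issues no DNS PTR lookups itself.
```
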
>> 
>> I wonder if there is a notable difference between
>> monitoring traffic from 2 places:
>> 
>> A) from the machine seeing the problem
>> vs.
>> B) from a machine not having problems but
>>   connected where all the traffic would be
>>   on the wire it is connected to.
>> 
>> It may be that monitoring from both and
>> comparing/contrasting the reported traffic
>> from the two provides additional evidence.
>> 
>> There may be modes of monitoring that are
>> relevant for this. But I'm not familiar
>> with any detail here.
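
One way to compare the two vantage points, as a sketch: capture the same
ping run with "tcpdump -n -l icmp > file" both on the Pi3 and on a second
machine that can see the same wire (that usually needs a hub or a
mirrored switch port, which is an assumption about the setup), then list
the echo requests that appear in one capture but not the other. The
function names are hypothetical, and the sed pattern assumes tcpdump's
usual "ICMP echo request, id N, seq M," wording:

```sh
#!/bin/sh
# Hypothetical helpers for comparing two text captures made with
#   tcpdump -n -l icmp > capture.txt
# Print "id,seq" for every ICMP echo request found in a capture file.
echo_requests() {
    sed -n 's/.*ICMP echo request, id \([0-9]*\), seq \([0-9]*\),.*/\1,\2/p' "$1" |
        sort -u
}

# Print the echo requests present in capture A but missing from capture B,
# e.g. pings seen on the wire that never showed up on the Pi3.
missing_in_b() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    echo_requests "$1" > "$a"
    echo_requests "$2" > "$b"
    comm -23 "$a" "$b"    # lines only in A (both inputs are sorted)
    rm -f "$a" "$b"
}
```
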
>> 
>> 
>> For reference:
>> 
>> # ping www.zefox.org
>> PING www.zefox.org (50.1.20.28): 56 data bytes
>> ^C
>> --- www.zefox.org ping statistics ---
>> 32 packets transmitted, 0 packets received, 100.0% packet loss
>> 
>> I found the command traceroute and it reports:
>> 
>> # traceroute www.zefox.org
>> traceroute to www.zefox.org (50.1.20.28), 64 hops max, 40 byte packets
>> 1  192.168.1.1 (192.168.1.1)  0.697 ms  0.486 ms  1.277 ms
>> 2  172.30.26.66 (172.30.26.66)  30.019 ms
>>    172.30.26.67 (172.30.26.67)  41.720 ms
>>    172.30.26.66 (172.30.26.66)  28.645 ms
>> 3  68.85.243.125 (68.85.243.125)  8.967 ms
>>    68.85.243.77 (68.85.243.77)  11.462 ms
>>    68.85.243.125 (68.85.243.125)  10.254 ms
>> 4  24.124.129.106 (24.124.129.106)  7.510 ms
>>    96.216.60.165 (96.216.60.165)  10.176 ms
>>    24.124.129.106 (24.124.129.106)  8.945 ms
>> 5  68.85.243.197 (68.85.243.197)  10.837 ms
>>    96.216.60.165 (96.216.60.165)  10.252 ms
>>    68.85.243.197 (68.85.243.197)  16.036 ms
>> 6  68.85.243.197 (68.85.243.197)  14.660 ms
>>    be-36211-cs01.seattle.wa.ibone.comcast.net (68.86.93.49)  14.629 ms
>>    68.85.243.197 (68.85.243.197)  8.849 ms
>> 7  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.607 ms
>>    be-36221-cs02.seattle.wa.ibone.comcast.net (68.86.93.53)  14.122 ms
>>    be-2212-pe12.seattle.wa.ibone.comcast.net (96.110.34.134)  13.877 ms
>> 8  be-2412-pe12.seattle.wa.ibone.comcast.net (96.110.34.142)  14.133 ms *  13.663 ms
>> 9  be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  30.176 ms *
>>    be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  29.002 ms
>> 10  be3717.ccr22.sfo01.atlas.cogentco.com (154.54.86.209)  28.477 ms
>>    be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  27.203 ms
>>    be2075.ccr21.sfo01.atlas.cogentco.com (154.54.0.233)  28.515 ms
>> 11  38.104.141.82 (38.104.141.82)  29.820 ms
>>    be2430.ccr31.sjc04.atlas.cogentco.com (154.54.88.186)  28.605 ms
>>    38.104.141.82 (38.104.141.82)  33.735 ms
>> 12  38.104.141.82 (38.104.141.82)  27.160 ms
>>    0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  32.336 ms
>>    38.104.141.82 (38.104.141.82)  31.867 ms
>> 13  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.761 ms
>>    0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  29.864 ms
>>    0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.711 ms
>> 14  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  30.373 ms
>>    gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  35.567 ms
>>    0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  31.146 ms
>> 15  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.513 ms
>>    gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  31.203 ms
>>    gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  31.354 ms
>> 16  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  30.125 ms *  31.996 ms
>> 17  * * *
>> 18  * * *
>> 19  * * *
>> 20  * * *
>> 21  * * *
>> 22  * * *
>> 23  * * *
>> 24  * * *
>> 25  * * *
>> 26  * * *
>> 27  * * *
>> 28  * * *
>> 29  * * *
>> 30  * * *
>> ^C
>> 
>> (There did not seem to be much point in having it continue.)
> 
> I found and built a port called net/mtr-nox11
> ("My traceroute") and tried it, letting it just
> run. The initial try eventually got a connection
> but reported 99.2% packet loss at the time I
> captured the output below:
> 
>                                      My traceroute  [v0.95]
> amd64_ZFS (192.168.1.120) -> www.zefox.org (50.1.20.28)          2022-05-01T12:40:22-0700
> Keys:  Help   Display mode   Restart statistics   Order of fields   quit
>                                                  Packets               Pings
>  Host                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
>  1. 192.168.1.1                                 0.0%   135    0.4   0.8   0.1   3.1   0.4
>  2. 172.30.26.66                                0.0%   134   28.2  26.1   9.3 132.7  18.1
>  3. 68.85.243.77                                0.0%   134    8.6   9.0   7.5  11.2   0.8
>  4. 24.124.129.106                              0.0%   134   10.2   9.1   7.6  13.4   0.9
>  5. 96.216.60.165                               0.0%   134    9.0   9.1   7.8  14.3   0.9
>  6. 68.85.243.197                               0.0%   134   14.4  13.6   9.2  44.3   5.4
>  7. be-36241-cs04.seattle.wa.ibone.comcast.net  0.0%   134   16.8  14.9  13.0  22.6   1.1
>  8. be-2412-pe12.seattle.wa.ibone.comcast.net   0.0%   134   13.5  15.0  12.8  46.4   3.2
>  9. (waiting for reply)
> 10. be2075.ccr21.sfo01.atlas.cogentco.com       0.0%   134   29.3  29.0  26.7  54.1   2.9
> 11. be2379.ccr31.sjc04.atlas.cogentco.com       0.0%   134   28.0  28.7  27.1  40.3   1.3
> 12. 38.104.141.82                               0.0%   134   28.0  33.8  26.6 114.8  16.5
> 13. 0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net      0.0%   134   30.9  31.0  29.0  33.7   0.8
> 14. 0.xe-0-0-0.cr1.scrmca13.sonic.net           0.0%   134   31.1  32.3  29.3  93.2   6.7
> 15. gig1-1-1.gw.wscrca11.sonic.net              0.0%   134   31.3  34.9  29.5 330.4  26.5
> 16. gig1-1-1.gw.davsca11.sonic.net              0.0%   134   32.8  32.1  29.9  44.1   1.7
> 17. (waiting for reply)
> 18. (waiting for reply)
> 19. www.zefox.org                              99.2%   134   74.9  74.9  74.9  74.9   0.0
> 
> I stopped and restarted it, and so far there is no
> connection, even after waiting longer than the first
> time: Snt is now over 600. Rows 18 and 19 have not
> shown up; the last row is 17.
> 
> . . . (some more time goes by) . . .
> 
> I have now stopped it, avoiding the extra load on the
> machines and network.
> 
> It looks like there is some problem getting past
> gig1-1-1.gw.davsca11.sonic.net.
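
Screen captures of mtr's interactive display wrap badly in mail. mtr
also has a report mode (-r, with -c for the number of cycles, -w for
wide host names and -n to skip DNS) that prints plain text when it
finishes; check those flags against the installed version's manual
page. A sketch that pulls out only the hops reporting loss, assuming the
usual report rows of the form "  1.|-- host  Loss%  Snt ..."; the helper
name lossy_hops is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: read an mtr report (mtr -r -w -n -c 100 HOST) on
# stdin and print only the hops whose Loss% column is above zero.
# Assumes rows shaped like "  3.|-- 68.85.243.77  0.0%  134  ...".
lossy_hops() {
    awk '/\|--/ {
        loss = $3
        sub(/%$/, "", loss)          # "99.2%" -> "99.2"
        if (loss + 0 > 0) print $1, $2, $3
    }'
}

# Usage:
#   mtr -r -w -n -c 100 www.zefox.org | lossy_hops
```
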
Apologies in advance if I'm just making noise, but here's what I see
on a 10Gb network when attempting the same traceroute(8):

# traceroute www.zefox.org
traceroute to www.zefox.org (50.1.20.28), 64 hops max, 40 byte packets
  1  static-24-113-41-1.wavecable.com (24.113.41.1)  19.918 ms  16.258 ms  13.852 ms
  2  174.127.183.72 (174.127.183.72)  18.036 ms  19.647 ms  18.428 ms
  3  be4.cr2-sea-b.bb.as11404.net (174.127.137.16)  16.318 ms  19.963 ms  22.306 ms
  4  be1.cr2-sea-a.bb.as11404.net (174.127.149.136)  19.391 ms  14.457 ms  15.808 ms
  5  sea-b2-link.ip.twelve99.net (62.115.49.138)  19.613 ms  22.770 ms  20.330 ms
  6  sjo-b23-link.ip.twelve99.net (62.115.118.169)  39.478 ms  32.428 ms  34.416 ms
  7  palo-b24-link.ip.twelve99.net (62.115.115.216)  70.207 ms  41.846 ms  37.838 ms
  8  sonicnet-ic350733-palo-b24.ip.twelve99-cust.net (62.115.181.227)  44.718 ms  33.959 ms  42.723 ms
  9  0.xe-0-3-0.scrm-gw1.scrmca01.sonic.net (135.180.179.146)  41.699 ms  42.660 ms  114.578 ms
10  0.xe-0-0-0.cr1.scrmca13.sonic.net (135.180.179.166)  47.851 ms  51.590 ms  41.286 ms
11  gig1-1-1.gw.wscrca11.sonic.net (50.1.36.106)  51.199 ms  39.567 ms  40.553 ms
12  gig1-1-1.gw.davsca11.sonic.net (50.1.36.110)  45.005 ms  44.096 ms  41.183 ms
13  * * *
14  * www.zefox.org (50.1.20.28)  62.422 ms *

A look at sonic.net's site shows they advertise better privacy than
their competition. Are they using any privacy measures that might affect
your ability to ping(8) or traceroute(8) over TCP, UDP, or ICMP? Or is it
just that gig1-1-1.gw.davsca11.sonic.net's BGP is out of date (stale)?
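
To check whether the result depends on the probe protocol, FreeBSD's
traceroute(8) can send probes other than its default UDP with -P proto
(and ICMP echo with -I); confirm which protocols your release supports
on its man page. A sketch that runs one trace per protocol and reports
the last hop that answered; last_responding_hop is a hypothetical
helper, not part of traceroute:

```sh
#!/bin/sh
# Hypothetical helper: read traceroute(8) output on stdin and print the
# number of the last hop that answered at least one probe.
# Only lines that start with a hop number are examined; continuation
# lines (extra paths for the same hop) and the banner line are skipped.
last_responding_hop() {
    awk '$1 ~ /^[0-9]+$/ {
        for (i = 2; i <= NF; i++)
            if ($i != "*") { last = $1; break }
    }
    END { print last }'
}

# Usage (flags per FreeBSD traceroute(8); -P selects the probe protocol):
#   for p in udp icmp tcp; do
#       printf '%s: ' "$p"
#       traceroute -n -q 1 -w 2 -m 20 -P "$p" www.zefox.org 2>/dev/null |
#           last_responding_hop
#   done
```

Against the two traces in this thread, it reports hop 14 for mine and
hop 16 for Mark's.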

HTH

--Chris
> 
> ===
> Mark Millard
> marklmi at yahoo.com