Re: a new one - if "re0" stopped responding (fwd)

From: doug <doug_at_safeport.com>
Date: Tue, 25 Oct 2022 17:19:52 UTC
TMI - the rest at the bottom

On Sat, 22 Oct 2022, spellberg_robert wrote:

> 22_oct_22_sat 20.55.utc
> howdy , folks ---
> 
> regarding the problem , which i describe here_in .
>  i am convinced that it results from some_thing , which i did ;
>  but , what_ever the cause ,
>  it was done w/ neither knowledge nor intent , before_hand .
> i wonder if the cause is some_thing non_intuitive .
> 
> 
> 
> while preparing my 22_oct_20_thu response to david , on "-questions" ,
>  regarding my isp's dhcp addresses
>  [ there is much info in that post , which i will not repeat , here ] ,
>  i booted [ on 22_oct_19_wed ] natalie_11.3 , 192.168.100.201 ,
>  for the purpose of seeking some information .
> 
> usually , i ssh from catherine_11.4 ; but , ssh "timed out" .
> i tried ping ; there was no response .
> i tried this , both ,
>  as "natalie" , by name , and
>  as "192.168.100.201" , by address .
> 
> well , to get the job done , i got what i needed from the console ,
>  with the intent of researching this behavior , later .
> 
> yesterday , 22_oct_21_fri , "later" arrived .
> i tried a number of things ; here are some observations .
> 
> move the rj45 plug
>  from the re0 socket [ addr .201 ]
>  to   the em0 socket [ addr .202 ] .
> ping and ssh work correctly .
> move the rj45 back to re0 ; no response and "timed out" .
> 
> next , while doing this several times ,
>  after each move , i invoked ifconfig .
> both re0 and em0 report "UP" and "RUNNING" and
>  their mac_ and inet_addresses are correct .
> after each move , "status" and "media" exchange places ,
>  between
>    "no carrier" and "Ethernet autoselect (none)"
>  and
>    "active"     and "Ethernet autoselect (1000baseT <full-duplex>)" .
> the boot messages look ok .
> the rear_panel steady/flashing leds behave as expected .
> i even tried moving the rj45 , for natalie's cable ,
>  to a different port on the network_switch .
> 
> for a_while , i ruminated .
> 
> then , i concluded that "this must be a hardware problem" .
> 
> [ i digress briefly .
>  there is a widely_known "myth" [ is it , really ? ] that
>    electronics_hardware contains a "crisis_detector" circuit .
>  it identifies when the user is in
>    a higher_than_usual state_of_anxiety .
>  then , it implements some "worst_place" , "worst_time" failure .
>  for em0 , which is a plug_in card [ i have several spares ] ,
>    this is a 5_minute swap .
>  because my difficulty is with re0 and my isp ,
>    then , necessarily , the problem is with re0 .
>  re0 is built_in , on the mobo .
>  i have spare parts , for every_thing ;
>    but , swapping moboes is not a 5_minute job .
> ]
> 
> so , earlier today , 22_oct_22_sat , i assembled
>  a new mobo , a new cpu/fan and 2 new dram sticks .
> the case , psu and the em0_nic are the same ;
>  so is the only hd [ w/ 11.3 amd64 on it ] .
> 
> 
> 
> HORRORS !!!
> 
> re0 does not respond on this mobo , either .
> ifconfig and all of the other tests give the same results .
> 
> i have spent more_than_a_few minutes , today ,
>  being slack_jawed , while staring at screens .
> 
> every_thing worked fine , on 22_sep_25_sun ,
>  when i shut_down natalie_11.3 ;
>  the non_response began immediately , with the boot of 22_oct_19_wed
>  [ i had to "catch_up" on other things , in the mean_time ] .
> 
> 
> 
> i am convinced that i changed some_thing , in_advertently ;
>  how_ever , i have no idea what it could be .
> again , i wonder if this could be some_thing , which is non_intuitive .
> 
> at_least , because i intend to build 3 boxen ,
>  assembling today's mobo was not a waste of time ,
>  as the original mobo is still good , probably ;
>  i was just early .
> 
> 
> 
> update 20.15.utc
> 
> i just had one_more idea of some_thing to try .
> 
> shut_down , swap_in the natalie/natasha_12.3 hd and boot .
> 
> EUREKA !!!
> 
> the re0 if --works-- , on natalie , using address .201 ;
>  i pinged and i sshed .
> 
> --now-- , i --know-- that this is --not-- a hardware problem .
> 
> there_fore , the severity level of this problem has been down_graded
>  from "crisis"
>  [ because i can not solve the dhcp_problem w/o 2 ifs ]
>  to "very_important"
>  [ because , if i did what_ever i did once , in_advertently ,
>      then , easily , i could do it again , in_advertently
>  ] .
> 
> if i can learn the cause of this behavior ,
>  then , if it happens again , i will have some idea of what to do .
> so , posting this question remains legitimate .
> 
I assume this is a system you have physical access to, but I got lost in the
ssh and all the things you tried. Anyway, my 2 cents:

Try pciconf -lv. I get:

em0@pci0:0:25:0:  class=0x020000 card=0x05a41028 chip=0x153a8086 rev=0x04 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Ethernet Connection I217-LM'
     class      = network
     subclass   = ethernet

Next, look at /var/run/dmesg.boot. I get:

arp: 192.168.2.1 moved from 00:90:8f:c4:f5:42 to 48:5d:36:88:49:e2 on em0
em0: <Intel(R) I217-LM LPT> port 0xf080-0xf09f mem 0xf7c00000-0xf7c1ffff,0xf7c3d000-0xf7c3dfff irq 20 at device 25.0 on pci0
em0: EEPROM V0.13-4
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using an MSI interrupt
em0: Ethernet address: f8:b1:56:b7:be:e7
em0: netmap queues/slots: TX 1/1024, RX 1/1024
ahciem0: <AHCI enclosure management bridge> on ahci0
ses0 at ahciem0 bus 0 scbus2 target 0 lun 0
em0: link state changed to UP

from the last boot

The above reminds me: check the arp table. Clearing it and trying again
might help.
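
A minimal sketch of that arp check, assuming it is run on the machine doing
the pinging (catherine, in the quoted post) and that the cache is listed in
the usual FreeBSD "arp -an" layout. The helper name arp_mac_for is made up
for illustration; arp -an and arp -d -a are the stock commands. The point is
to compare the MAC cached for .201 with the "ether" line that ifconfig shows
for re0 on natalie.

```shell
# arp_mac_for IP
# Reads "arp -an" style lines on stdin, e.g.
#   ? (192.168.2.1) at 48:5d:36:88:49:e2 on em0 expires in 1200 seconds [ethernet]
# and prints the 4th field (the MAC, or "(incomplete)") for the given IP.
arp_mac_for() {
    awk -v ip="($1)" '$2 == ip { print $4 }'
}

# Typical use (not executed here; clearing the cache needs root):
#   arp -an | arp_mac_for 192.168.100.201   # which MAC is cached for .201?
#   arp -d -a                               # delete all cached entries
#   ping -c 3 192.168.100.201               # force a fresh lookup
#   arp -an | arp_mac_for 192.168.100.201   # compare with re0's ether line
```

If the helper prints nothing or "(incomplete)" after the ping, the ARP
request is not being answered at all, which points away from a stale entry.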

If you are using DHCP ..
   killall dhclient
   dhclient em0 (for me)

I assume you did this (ifconfig) and looked at inet and status:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
         ether f8:b1:56:b7:be:e7
         inet 192.168.2.102 netmask 0xffffff00 broadcast 192.168.2.255
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

Other issues I have had on my LAN: a switch hangup (power cycle all of
them) and a bad cable (twice since 1997). My Lenovo Ideapad has a really
poorly designed ethernet connector; it takes much care to plug in the cable.

Also try to connect again while checking the routing table and interface
state with netstat, route and ifconfig.
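
As a sketch of one thing netstat and route can show: which interface the
kernel actually uses for the LAN subnet. That .201 (re0) and .202 (em0) sit
in the same 192.168.100.0/24 network is an assumption here, since the quoted
post gives only the addresses, not the netmasks. The helper name route_netif
is hypothetical; it just picks the Netif column out of "netstat -rn" output.

```shell
# route_netif DEST
# Reads "netstat -rn" output on stdin, locates the Netif column from the
# header line, and prints the interface for the route whose Destination
# column equals DEST exactly (e.g. 192.168.100.0/24).
route_netif() {
    awk -v dst="$1" '
        $1 == "Destination" {
            for (i = 1; i <= NF; i++) if ($i == "Netif") col = i
            next
        }
        col && $1 == dst { print $col }
    '
}

# Typical use on natalie (not executed here):
#   netstat -rn -f inet | route_netif 192.168.100.0/24
#   route -n get 192.168.100.1      # see the "interface:" line
```

If this names em0 while the cable is in re0, replies to the LAN would be
sent out the unplugged interface, which would look exactly like "no
response" from the other end. That is only one possibility to rule out.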

I hope this was not too far off the mark, g'luck