From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 258850] lagg failover crashes and burns out with em and ath
Date: Sat, 02 Oct 2021 00:10:26 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258850

            Bug ID: 258850
           Summary: lagg failover crashes and burns out with em and ath
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: john.westbrook@gmail.com

I am having significant problems on FreeBSD 13.0 using lagg-failover with
em0 and wlan0/ath0 on both my ThinkPad X220 and X230. Both laptops are
running Coreboot, with a Dell 7WCGT Bigfoot Killer Wireless (AR5BHB112;
AR9380 chipset). Both em0 and wlan0/ath0 work fine when not used with lagg.

This problem has some similarities to bug #226549 but can't be recovered in
the same way.

The basic symptom is that the lagg0 interface often vanishes when both
laggport interfaces are inactive/unassociated--for example, (1) when not
connected to wired ethernet and the WiFi interface loses its association
with the WiFi access point, or (2) when unplugging from the wired network.
This also often happens at boot, when the lagg0 interface comes up but WiFi
hasn't established an association with the WiFi access point. Looking in
dmesg after boot doesn't shed much light:

lagg0: link state changed to DOWN
lagg0: link state changed to UP
lagg0: link state changed to DOWN

However, the problem isn't limited to WiFi. The problem also occurs when
failing over from wired. Once em0 goes down (i.e. cable unplugged, or
ifconfig down), it can't be brought back up, even separate from lagg0:

# ifconfig em0
em0: flags=8c22 metric 0 mtu 1500
	options=800000<>
	ether XX:XX:XX:XX:XX:XX
	media: Ethernet autoselect (1000baseT )
	status: active
	nd6 options=29
# ifconfig em0 up
# ifconfig em0
em0: flags=8c22 metric 0 mtu 1500
	options=800000<>
	ether XX:XX:XX:XX:XX:XX
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29
# ifconfig em0
em0: flags=8c22 metric 0 mtu 1500
	options=800000<>
	ether XX:XX:XX:XX:XX:XX
	media: Ethernet autoselect (1000baseT )
	status: active
	nd6 options=29

Here's my lagg configuration--almost identical to the man page:

wlans_ath0="wlan0"
ifconfig_wlan0="WPA"
ifconfig_em0="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto failover laggport em0 laggport wlan0 DHCP"

except that I'm setting the MAC address via a hint in /boot/loader.conf:

hint.ath.0.macaddr="XX:XX:XX:XX:XX:XX"

I used the hint based on past threads discussing problems associated with
setting the MAC address on Atheros devices. However, it doesn't seem to make
a difference with the problem if I instead override the MAC address on em0
with the MAC address from the Atheros card. Also, the problem with lagg0
happens both when using DHCP and when configured to use a static IP address.

When not connected to wired ethernet, and when the WiFi interface
stabilizes/associates, reconfiguring lagg0 from the command line is flaky.
Sometimes it works, sometimes not.
Sometimes ifconfig shows lagg0 along with a device-not-configured error,
followed by lagg0 vanishing:

# ifconfig wlan0 down
# ifconfig
em0: flags=8c23 metric 0 mtu 1500
	options=481249b
	ether XX:XX:XX:XX:XX:XX
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29
lo0: flags=8049 metric 0 mtu 16384
	options=680003
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21
wlan0: flags=8802 metric 0 mtu 1500
	ether XX:XX:XX:XX:XX:XX
	groups: wlan
	ssid "" channel 1 (2412 MHz 11g ht/20)
	regdomain 106 indoor ecm authmode WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 2:128-bit txpower 20 bmiss 7 scanvalid 60
	protmode CTS ampdulimit 64k ampdudensity 8 shortgi -uapsd wme burst
	roaming MANUAL
	parent interface: ath0
	media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
	status: no carrier
	nd6 options=29
pflog0: flags=141 metric 0 mtu 33160
	groups: pflog
lagg0: flags=8802
	ether XX:XX:XX:XX:XX:XX
ifconfig: SIOCGIFGROUP: Device not configured
# ifconfig lagg0 create
# ifconfig lagg0 up laggproto failover laggport wlan0 laggport em0
# ifconfig
em0: flags=8c22 metric 0 mtu 1500
	options=481249b
	ether XX:XX:XX:XX:XX:XX
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29
lo0: flags=8049 metric 0 mtu 16384
	options=680003
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21
wlan0: flags=8802 metric 0 mtu 1500
	ether XX:XX:XX:XX:XX:XX
	groups: wlan
	ssid "" channel 1 (2412 MHz 11g ht/20)
	regdomain 106 indoor ecm authmode WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 2:128-bit txpower 20 bmiss 7 scanvalid 60
	protmode CTS ampdulimit 64k ampdudensity 8 shortgi -uapsd wme burst
	roaming MANUAL
	parent interface: ath0
	media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
	status: no carrier
	nd6 options=29
pflog0: flags=141 metric 0 mtu 33160
	groups: pflog
lagg0: flags=8802
	ether XX:XX:XX:XX:XX:XX
ifconfig: SIOCGIFGROUP: Device not configured

Repeating the same operations sometimes yields success. I wrote a script
that helps make sense of the sequence in /var/log/messages:

#!/bin/sh
# Recreate and reconfigure lagg0 if it is missing, logging every step
# to syslog under this script's name.

tag=$(basename "$0")

# Nothing to do if lagg0 is still present.
logger -t "$tag" "Checking lagg0 ..."
if ifconfig lagg0; then
    logger -t "$tag" "lagg0 exists."
    exit 0
fi

logger -t "$tag" "Creating lagg0 ..."
if ifconfig lagg0 create; then
    logger -t "$tag" "lagg0 create success."
else
    logger -t "$tag" "lagg0 create failed."
    exit 1
fi

# Reuse the rc.conf settings, replacing DHCP with a plain "up".
logger -t "$tag" "Configuring lagg0 ..."
params=$(sysrc -n ifconfig_lagg0 | sed 's/DHCP/up/')
# $params is left unquoted on purpose so that it splits into
# separate ifconfig arguments.
if ifconfig lagg0 $params; then
    logger -t "$tag" "lagg0 config success."
else
    logger -t "$tag" "lagg0 config failed: $params"
    exit 2
fi

# Check immediately, then again after 10 and a further 20 seconds,
# to see whether lagg0 vanishes after configuration.
logger -t "$tag" "Postcheck(0) lagg0 ..."
if ifconfig lagg0; then
    logger -t "$tag" "lagg0 postcheck success."
else
    logger -t "$tag" "lagg0 postcheck failed."
    exit 3
fi

sleep 10
logger -t "$tag" "Postcheck(1) lagg0 ..."
if ifconfig lagg0; then
    logger -t "$tag" "lagg0 postcheck success."
else
    logger -t "$tag" "lagg0 postcheck failed."
    exit 4
fi

sleep 20
logger -t "$tag" "Postcheck(2) lagg0 ..."
if ifconfig lagg0; then
    logger -t "$tag" "lagg0 postcheck success."
else
    logger -t "$tag" "lagg0 postcheck failed."
    exit 5
fi

Here's an example of when the script succeeds:

Oct 1 10:27:08 x220a fix-lagg0[6783]: Checking lagg0 ...
Oct 1 10:27:08 x220a fix-lagg0[6788]: Creating lagg0 ...
Oct 1 10:27:08 x220a fix-lagg0[6793]: lagg0 create success.
Oct 1 10:27:08 x220a fix-lagg0[6797]: Configuring lagg0 ...
Oct 1 10:27:09 x220a wpa_supplicant[347]: wlan0: CTRL-EVENT-DISCONNECTED bssid=AA:AA:AA:AA:AA:AA reason=3 locally_generated=1
Oct 1 10:27:09 x220a kernel: lagg0: link state changed to DOWN
Oct 1 10:27:09 x220a kernel: wlan0: link state changed to DOWN
Oct 1 10:27:10 x220a fix-lagg0[6822]: lagg0 config success.
Oct 1 10:27:10 x220a fix-lagg0[6826]: Postcheck(0) lagg0 ...
Oct 1 10:27:10 x220a fix-lagg0[6831]: lagg0 postcheck success.
Oct 1 10:27:16 x220a wpa_supplicant[347]: wlan0: Trying to associate with AA:AA:AA:AA:AA:AA (SSID='FiOS-YLLQU-5G' freq=5765 MHz)
Oct 1 10:27:16 x220a kernel: ath0: ath_edma_recv_tasklet: sc_inreset_cnt > 0; skipping
Oct 1 10:27:16 x220a wpa_supplicant[347]: Failed to add supported operating classes IE
Oct 1 10:27:16 x220a wpa_supplicant[347]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
Oct 1 10:27:16 x220a wpa_supplicant[347]: wlan0: Associated with AA:AA:AA:AA:AA:AA
Oct 1 10:27:16 x220a kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> ASSOC transition lost
Oct 1 10:27:16 x220a kernel: wlan0: ieee80211_new_state_locked: pending ASSOC -> RUN transition lost
Oct 1 10:27:16 x220a kernel: wlan0: link state changed to UP
Oct 1 10:27:16 x220a kernel: lagg0: link state changed to UP
Oct 1 10:27:16 x220a wpa_supplicant[347]: wlan0: WPA: Key negotiation completed with AA:AA:AA:AA:AA:AA [PTK=CCMP GTK=CCMP]
Oct 1 10:27:16 x220a wpa_supplicant[347]: wlan0: CTRL-EVENT-CONNECTED - Connection to AA:AA:AA:AA:AA:AA completed [id=0 id_str=]
Oct 1 10:27:20 x220a fix-lagg0[6852]: Postcheck(1) lagg0 ...
Oct 1 10:27:20 x220a fix-lagg0[6857]: lagg0 postcheck success.
Oct 1 10:27:50 x220a fix-lagg0[6878]: Postcheck(2) lagg0 ...
Oct 1 10:27:50 x220a fix-lagg0[6883]: lagg0 postcheck success.
Oct 1 10:27:51 x220a dhclient[6935]: New IP Address (lagg0): 192.168.1.86
Oct 1 10:27:52 x220a dhclient[6939]: New Subnet Mask (lagg0): 255.255.255.0
Oct 1 10:27:52 x220a dhclient[6943]: New Broadcast Address (lagg0): 192.168.1.255
Oct 1 10:27:52 x220a dhclient[6947]: New Routers (lagg0): 192.168.1.1

Notice that adding wlan0 as a laggport brings wlan0 down and triggers a
reassociation.
Destroying lagg0 also takes down wlan0 and triggers a reassociation:

Oct 1 10:32:30 x220a wpa_supplicant[347]: wlan0: CTRL-EVENT-DISCONNECTED bssid=AA:AA:AA:AA:AA:AA reason=3 locally_generated=1
Oct 1 10:32:33 x220a kernel: wlan0: link state changed to DOWN
Oct 1 10:32:33 x220a kernel: lagg0: link state changed to DOWN
Oct 1 10:32:33 x220a dhclient[6925]: Interface lagg0 is down, dhclient exiting
Oct 1 10:32:33 x220a dhclient[6925]: connection closed
Oct 1 10:32:33 x220a dhclient[6925]: exiting.
Oct 1 10:32:33 x220a root[7331]: /etc/rc.d/netif: WARNING: lagg0 does not exist. Skipped.
Oct 1 10:32:40 x220a wpa_supplicant[347]: wlan0: Trying to associate with AA:AA:AA:AA:AA:AA (SSID='FiOS-YLLQU-5G' freq=5765 MHz)
Oct 1 10:32:40 x220a wpa_supplicant[347]: Failed to add supported operating classes IE
Oct 1 10:32:40 x220a wpa_supplicant[347]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
Oct 1 10:32:50 x220a wpa_supplicant[347]: wlan0: Authentication with AA:AA:AA:AA:AA:AA timed out.
Oct 1 10:32:50 x220a wpa_supplicant[347]: wlan0: CTRL-EVENT-DISCONNECTED bssid=AA:AA:AA:AA:AA:AA reason=3 locally_generated=1
Oct 1 10:32:57 x220a wpa_supplicant[347]: wlan0: Trying to associate with AA:AA:AA:AA:AA:AA (SSID='FiOS-YLLQU-5G' freq=5765 MHz)
Oct 1 10:32:57 x220a wpa_supplicant[347]: Failed to add supported operating classes IE
Oct 1 10:32:57 x220a wpa_supplicant[347]: wlan0: Associated with AA:AA:AA:AA:AA:AA
Oct 1 10:32:57 x220a kernel: wlan0: link state changed to UP
Oct 1 10:32:57 x220a wpa_supplicant[347]: wlan0: WPA: Key negotiation completed with AA:AA:AA:AA:AA:AA [PTK=CCMP GTK=CCMP]
Oct 1 10:32:57 x220a wpa_supplicant[347]: wlan0: CTRL-EVENT-CONNECTED - Connection to AA:AA:AA:AA:AA:AA completed [id=0 id_str=]

The transcripts above are from my X220, but I've had the same symptoms on my
X230.
Given that the problem happens on two machines and impacts both laggport
interfaces (em0 and WiFi), it seems like a lagg-related issue.
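
As an aid for reading the flags values in the ifconfig transcripts above,
here is a minimal sketch that reports whether a hex flags word has the
administrative UP bit set. It assumes the standard IFF_UP value of 0x1 from
net/if.h; the helper name flags_up is hypothetical and not part of the
original report.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): given the hex
# flags word printed by ifconfig, e.g. the "8c22" in "flags=8c22",
# print "up" if the IFF_UP bit is set and "down" otherwise.
# Assumption: IFF_UP is 0x1, as defined in net/if.h.
flags_up() {
    # $1 is the hex value without a leading "0x".
    if [ $(( 0x$1 & 0x1 )) -ne 0 ]; then
        echo "up"
    else
        echo "down"
    fi
}

# Values taken from the em0 transcripts in this report.
flags_up 8c22
flags_up 8c23
```

Read this way, the em0 output that still shows flags=8c22 after
"ifconfig em0 up" has the UP bit clear, while the flags=8c23 output has it
set. This only interprets the printed flags word; it does not explain why
the bit fails to change.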