[Bug 213410] [carp] service netif restart causes hang only when carp is enabled
bugzilla-noreply at freebsd.org
Wed Oct 12 08:16:24 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410
Bug ID: 213410
Summary: [carp] service netif restart causes hang only when
carp is enabled
Product: Base System
Version: 11.0-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: bin
Assignee: freebsd-bugs at FreeBSD.org
Reporter: dch at skunkwerks.at
Created attachment 175654
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=175654&action=edit
dmesg
# steps
FreeBSD 11.0-RELEASE-p1 amd64
- dmesg attached
- ifconfig (IPs masked)
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 78:45:c4:fa:d2:12
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 78:45:c4:fa:d2:12
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 78:45:c4:fa:d2:12
inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
inet6 3000:3050:3000:4::83 prefixlen 64
inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
carp: BACKUP vhid 1 advbase 1 advskew 100
carp: BACKUP vhid 3 advbase 1 advskew 0
carp: BACKUP vhid 2 advbase 1 advskew 100
carp: BACKUP vhid 4 advbase 1 advskew 0
groups: lagg
laggproto lacp lagghash l2,l3,l4
laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
Run `service netif restart`.
This was first done over a net/mosh connection with tmux running inside it,
and then repeated with direct console access (KVM remote management tool).
## actual results
the system hangs, 100% reproducible.
- no keyboard entry
- no ability to Alt-F3 to switch tabs
- no ping over network
- a hard reboot is required to regain control
- final message in log appears to be
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
### console
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
### /var/log/messages
Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2 at lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4 at lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1 at lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3 at lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1 at lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3 at lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2 at lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4 at lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.
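To make the tail of the log easier to read, here is a minimal Python sketch that extracts the last carp state per vhid and the running demotion counter. The sample lines are copied from the excerpt above with wrapped lines rejoined. The two regular expressions and the `summarize` helper are illustrative assumptions, not part of any FreeBSD tool.

```python
import re

# Sample lines copied from the /var/log/messages excerpt above,
# with the wrapped continuation lines rejoined.
LOG = """\
Oct 12 08:01:22 bridget kernel: carp: 1 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4 at lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
"""

# Hypothetical patterns written against the message shapes seen in this log only.
TRANSITION = re.compile(r"carp: (\d+) at (\S+): (\w+) -> (\w+) \((.+)\)")
DEMOTION = re.compile(r"carp: demoted by (-?\d+) to (-?\d+)")


def summarize(log_text):
    """Return ({vhid: last state seen}, last reported demotion counter)."""
    final_state = {}
    demotion = 0
    for line in log_text.splitlines():
        match = TRANSITION.search(line)
        if match:
            vhid, _ifname, _old, new, _reason = match.groups()
            final_state[int(vhid)] = new
            continue
        match = DEMOTION.search(line)
        if match:
            # The second number is the counter value after the change.
            demotion = int(match.group(2))
    return final_state, demotion


states, demotion = summarize(LOG)
print(states)
print(demotion)
```

On these lines, all four vhids end in INIT and the demotion counter ends at 960, which matches the state of the log just before the hang.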
# expected results
after a short period of downtime, the network is re-established.
# notes
If the carp configuration is disabled and the system is rebooted, `service netif restart` works as expected.
# config
```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 \
inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 \
inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"
# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```
```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 \
inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"
# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```
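The two rc.conf files mirror each other's `advskew` values. As a rough illustration of what that is meant to achieve, the Python sketch below computes the advertisement interval per node and vhid, using the documented rule that `advskew` is given in 1/256 of a second and added to `advbase`. It assumes that the node advertising most frequently is the preferred master. The node names and the `preferred_master` helper are hypothetical, and the sketch ignores demotion and preemption details.

```python
# advskew per vhid, copied from the two rc.conf files above.
ADVSKEW = {
    "one": {1: 0, 2: 0, 3: 100, 4: 100},
    "two": {1: 100, 2: 100, 3: 0, 4: 0},
}
ADVBASE = 1  # seconds; matches "advbase 1" in the ifconfig output


def advert_interval(advbase, advskew):
    """Seconds between advertisements: advbase plus advskew/256."""
    return advbase + advskew / 256


def preferred_master(vhid):
    """Node with the shortest advertisement interval for this vhid (assumption)."""
    return min(ADVSKEW, key=lambda node: advert_interval(ADVBASE, ADVSKEW[node][vhid]))


for vhid in (1, 2, 3, 4):
    print(vhid, preferred_master(vhid))
```

Under these assumptions node one is preferred for vhids 1 and 2 and node two for vhids 3 and 4, so each node normally carries one IPv4 and one IPv6 virtual address.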
```
# /boot/loader.conf
# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"
# hardware
coretemp_load="YES"
# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1
# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1
# debug super powers
dtraceall_load="YES"
# runtime
# maxfiles
kern.maxfiles="25000"
# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2
# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```
```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```