List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
From: "Alexander V. Chernikov"
To: Andriy Gapon
Cc: freebsd-net
Subject: Re: network crash in nhop_free
Date: Sun, 1 Aug 2021 14:36:29 +0100

> On 10 Jul 2021, at 10:07, Andriy Gapon wrote:
>
> On 09/07/2021 00:02, Alexander V. Chernikov wrote:
>> Hi Andriy,
>> Could you by any chance provide a bit more info on the system networking configuration and the steps leading to panic?
>> No chance for a coredump?
>> destroy_nhgrp() suggests that there was a multipath route (default?) that was deleted.
>> nhops are created with UMA_ALIGN_PTR, so I suspect there is a garbage inside nhgrp pointer..
>
> I've just reproduced the problem and got a crash dump.
> The new panic is a little bit different, but I think that it confirms your analysis.
> Also, you are right about the multipath route, although its creation was not intentional.

Should be fixed by https://cgit.freebsd.org/src/commit/?id=054948bd81bb9e4e32449cf351b62e501b8831ff .

>
> The test setup is a host with an ethernet interface and a 3g modem (for ppp).
> The default default route is via the ethernet.
>
> Destination        Gateway            Flags     Netif Expire
> default            192.168.0.1        UGS        dwc0
> 8.8.8.8            192.168.0.1        UGHS       dwc0
> 127.0.0.1          link#2             UH          lo0
> 192.168.0.0/24     link#1             U          dwc0
> 192.168.0.137      link#1             UHS         lo0
>
> 192.168.0.0/24 is the LAN.
> The static route to 8.8.8.8 is for internet accessibility checking.
>
> Interesting bits of my ppp configuration:
> ----- ppp.linkup -----
> 3g:
>  add! default HISADDR
> ----------------------
>
> When I bring up the ppp link I get two default routes -- which is not what I expected even when using 'add!':
> Destination        Gateway            Flags     Netif Expire
> default            192.168.0.1        UGS        dwc0
> default            10.1.1.1           UGS        tun0
> 8.8.8.8            192.168.0.1        UGHS       dwc0
> 10.1.1.1           link#4             UHS        tun0
> 10.133.147.118     link#4             UHS         lo0
> 127.0.0.1          link#2             UH          lo0
> 192.168.0.0/24     link#1             U          dwc0
> 192.168.0.137      link#1             UHS         lo0
>
> The procedure to re-create the problem is to bring up and down the ppp link twice. That is, up -> down -> up -> down -> crash.
>
> Now, about the new crash.
> The panic message is:
> panic: refcount 0xffffa00027813318 wraparound
>
> The stack trace is approximately the same:
> panic() at panic+0x44
> _refcount_update_saturated() at _refcount_update_saturated+0x14
> nhop_free() at nhop_free+0x118
> destroy_nhgrp() at destroy_nhgrp+0x38
> epoch_call_task() at epoch_call_task+0x158
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
>
> From kgdb it seems like a refcount underflow (decrement from zero).
> (kgdb) p *nhg_priv
> $1 = {nhg_idx = 0, nhg_nh_count = 2 '\002', nhg_spare = "\000\000", nhg_refcount = 0, nhg_linked = 1, nh_control = 0x0, nhg_priv_next = 0x0, nhg = 0xffffa00032049e80, nhg_epoch_ctx = {data = {
>       0xffff0000005a0edc , 0xffffa0000eecb148}}, nhg_nh_weights = 0xffffa00032049ed0}
> (kgdb) p nhg_priv->nhg_nh_weights[0]
> $2 = {nh = 0xffffa00027813200, weight = 0}
> (kgdb) p nhg_priv->nhg_nh_weights[1]
> $3 = {nh = 0xffffa00027813800, weight = 1}
> (kgdb) p *nhg_priv->nhg_nh_weights[0].nh
> $4 = {nh_flags = 128, nh_mtu = 1500, {gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 16843018}, sin_zero = "\000\000\000\000\000\000\000"}, gw6_sa = {sin6_len = 16 '\020',
>       sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {__u6_addr8 = '\000' , __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
>     gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"}, gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0, sdl_type = 10 '\n',
>       sdl_nlen = 1 '\001', sdl_alen = 1 '\001', sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"}, gw_buf = "\020\002\000\000\n\001\001\001", '\000' }, nh_ifp = 0xffffa00027843800,
>   nh_ifa = 0xffffa0000eec4900, nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38cd8, nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0, nh_prepend = '\000' , nh_priv = 0xffffa00027813300}
> (kgdb) p *nhg_priv->nhg_nh_weights[1].nh
> $5 = {nh_flags = 640, nh_mtu = 1500, {gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 16843018}, sin_zero = "\000\000\000\000\000\000\000"}, gw6_sa = {sin6_len = 16 '\020',
>       sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {__u6_addr8 = '\000' , __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
>     gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"}, gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0, sdl_type = 10 '\n',
>       sdl_nlen = 1 '\001', sdl_alen = 1 '\001', sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"}, gw_buf = "\020\002\000\000\n\001\001\001", '\000' }, nh_ifp = 0xffffa00027843800,
>   nh_ifa = 0xffffa0000eec4900, nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38430, nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0, nh_prepend = '\000' , nh_priv = 0xffffa00027813900}
>
> (kgdb) p *nhg_priv->nhg_nh_weights[0].nh->nh_priv
> $7 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 526336, nh_idx = 0, cb_func = 0x0, nh_refcnt = 4294967295, nh_linked = 1, nh = 0xffffa00027813200, nh_control = 0xffffa00000ddf900,
>   nh_next = 0xffffa00027813900, nh_vnet = 0xffffa0000084c580, nh_epoch_ctx = {data = {0xffff0000005a2f90 , 0x0}}}
> (kgdb) p *nhg_priv->nhg_nh_weights[1].nh->nh_priv
> $8 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 2050, nh_idx = 11, cb_func = 0x0, nh_refcnt = 4, nh_linked = 2, nh = 0xffffa00027813800, nh_control = 0xffffa00000ddf900, nh_next = 0xffffa00027813500,
>   nh_vnet = 0xffffa0000084c580, nh_epoch_ctx = {data = {0x0, 0x0}}}
>
> nh_refcnt = 4294967295 (0xffffffff) in nhg_priv->nhg_nh_weights[0].nh->nh_priv.
>
>>> On 22 Jun 2021, at 11:31, Andriy Gapon wrote:
>>>
>>>
>>> It seems that the panic message was
>>> panic: Misaligned access from kernel space!
>>>
>>> On 22/06/2021 12:54, Andriy Gapon wrote:
>>>> Not sure if I'll be able to get more out of this arm64 machine.
>>>> I was playing with ppp and switching routes between LAN and ppp when the crash happened.
>>>> The system is 2-3 weeks old 14.0-CURRENT as of c8250c5ada85fec.
>>>> Tracing pid 0 tid 100014 td 0xffffa00000c00000
>>>> db_trace_self() at db_trace_self
>>>> db_stack_trace() at db_stack_trace+0x11c
>>>> db_command() at db_command+0x244
>>>> db_command_loop() at db_command_loop+0x54
>>>> db_trap() at db_trap+0xf8
>>>> kdb_trap() at kdb_trap+0x1c4
>>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>>> --- exception, esr 0xf2000000
>>>> kdb_enter() at kdb_enter+0x44
>>>> vpanic() at vpanic+0x1c4
>>>> panic() at panic+0x44
>>>> align_abort() at align_abort+0xb8
>>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>>> --- exception, esr 0x96000021
>>>> nhop_free() at nhop_free+0x100
>>>> destroy_nhgrp() at destroy_nhgrp+0x38
>>>> epoch_call_task() at epoch_call_task+0x158
>>>> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
>>>> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
>>>> fork_exit() at fork_exit+0x74
>>>> fork_trampoline() at fork_trampoline+0x14
>>>
>>>
>>> --
>>> Andriy Gapon
>>>
>
>
> --
> Andriy Gapon