From: Andriy Gapon
To: "Alexander V. Chernikov"
Cc: freebsd-net
Subject: Re: network crash in nhop_free
Date: Sat, 10 Jul 2021 12:07:30 +0300
Message-ID: <70d1091d-07ec-1c76-29bc-1f2e2264b55a@FreeBSD.org>
In-Reply-To: <95F4F779-91A0-482B-B26B-6C95A60FC281@ipfw.ru>
References: <2fbc5205-3fcc-d233-dae1-cf6ddc8d691d@FreeBSD.org> <95F4F779-91A0-482B-B26B-6C95A60FC281@ipfw.ru>
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On 09/07/2021 00:02, Alexander V. Chernikov wrote:
> Hi Andriy,
>
> Could you by any chance provide a bit more info on the system networking configuration and the steps leading to panic?
> No chance for a coredump?
>
> destroy_nhgrp() suggests that there was a multipath route (default?) that was deleted.
> nhops are created with UMA_ALIGN_PTR, so I suspect there is a garbage inside nhgrp pointer..

I've just reproduced the problem and got a crash dump.
The new panic is a little bit different, but I think that it confirms your analysis.
Also, you are right about the multipath route, although its creation was not intentional.

The test setup is a host with an ethernet interface and a 3g modem (for ppp).
The default route is via the ethernet.

Destination        Gateway            Flags     Netif Expire
default            192.168.0.1        UGS        dwc0
8.8.8.8            192.168.0.1        UGHS       dwc0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     link#1             U          dwc0
192.168.0.137      link#1             UHS         lo0

192.168.0.0/24 is the LAN.
The static route to 8.8.8.8 is for internet accessibility checking.

Interesting bits of my ppp configuration:
----- ppp.linkup -----
3g:
  add! default HISADDR
----------------------

When I bring up the ppp link I get two default routes -- which is not what I expected even when using 'add!':

Destination        Gateway            Flags     Netif Expire
default            192.168.0.1        UGS        dwc0
default            10.1.1.1           UGS        tun0
8.8.8.8            192.168.0.1        UGHS       dwc0
10.1.1.1           link#4             UHS        tun0
10.133.147.118     link#4             UHS         lo0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     link#1             U          dwc0
192.168.0.137      link#1             UHS         lo0

The procedure to re-create the problem is to bring the ppp link up and down twice.
That is, up -> down -> up -> down -> crash.

Now, about the new crash.
The panic message is:
panic: refcount 0xffffa00027813318 wraparound

The stack trace is approximately the same:
panic() at panic+0x44
_refcount_update_saturated() at _refcount_update_saturated+0x14
nhop_free() at nhop_free+0x118
destroy_nhgrp() at destroy_nhgrp+0x38
epoch_call_task() at epoch_call_task+0x158
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14

From kgdb it seems like a refcount underflow (decrement from zero).

(kgdb) p *nhg_priv
$1 = {nhg_idx = 0, nhg_nh_count = 2 '\002', nhg_spare = "\000\000",
  nhg_refcount = 0, nhg_linked = 1, nh_control = 0x0, nhg_priv_next = 0x0,
  nhg = 0xffffa00032049e80, nhg_epoch_ctx = {data = {
      0xffff0000005a0edc , 0xffffa0000eecb148}},
  nhg_nh_weights = 0xffffa00032049ed0}

(kgdb) p nhg_priv->nhg_nh_weights[0]
$2 = {nh = 0xffffa00027813200, weight = 0}

(kgdb) p nhg_priv->nhg_nh_weights[1]
$3 = {nh = 0xffffa00027813800, weight = 1}

(kgdb) p *nhg_priv->nhg_nh_weights[0].nh
$4 = {nh_flags = 128, nh_mtu = 1500, {
    gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0,
      sin_addr = {s_addr = 16843018},
      sin_zero = "\000\000\000\000\000\000\000"},
    gw6_sa = {sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0,
      sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {
          __u6_addr8 = '\000' ,
          __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002',
      sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"},
    gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0,
      sdl_type = 10 '\n', sdl_nlen = 1 '\001', sdl_alen = 1 '\001',
      sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"},
    gw_buf = "\020\002\000\000\n\001\001\001", '\000' },
  nh_ifp = 0xffffa00027843800, nh_ifa = 0xffffa0000eec4900,
  nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38cd8,
  nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0,
  nh_prepend = '\000' , nh_priv = 0xffffa00027813300}

(kgdb) p *nhg_priv->nhg_nh_weights[1].nh
$5 = {nh_flags = 640, nh_mtu = 1500, {
    gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0,
      sin_addr = {s_addr = 16843018},
      sin_zero = "\000\000\000\000\000\000\000"},
    gw6_sa = {sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0,
      sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {
          __u6_addr8 = '\000' ,
          __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002',
      sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"},
    gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0,
      sdl_type = 10 '\n', sdl_nlen = 1 '\001', sdl_alen = 1 '\001',
      sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"},
    gw_buf = "\020\002\000\000\n\001\001\001", '\000' },
  nh_ifp = 0xffffa00027843800, nh_ifa = 0xffffa0000eec4900,
  nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38430,
  nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0,
  nh_prepend = '\000' , nh_priv = 0xffffa00027813900}

(kgdb) p *nhg_priv->nhg_nh_weights[0].nh->nh_priv
$7 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 526336,
  nh_idx = 0, cb_func = 0x0, nh_refcnt = 4294967295, nh_linked = 1,
  nh = 0xffffa00027813200, nh_control = 0xffffa00000ddf900,
  nh_next = 0xffffa00027813900, nh_vnet = 0xffffa0000084c580,
  nh_epoch_ctx = {data = {0xffff0000005a2f90 , 0x0}}}

(kgdb) p *nhg_priv->nhg_nh_weights[1].nh->nh_priv
$8 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 2050,
  nh_idx = 11, cb_func = 0x0, nh_refcnt = 4, nh_linked = 2,
  nh = 0xffffa00027813800, nh_control = 0xffffa00000ddf900,
  nh_next = 0xffffa00027813500, nh_vnet = 0xffffa0000084c580,
  nh_epoch_ctx = {data = {0x0, 0x0}}}

nh_refcnt = 4294967295 (0xffffffff) in nhg_priv->nhg_nh_weights[0].nh->nh_priv.

>> On 22 Jun 2021, at 11:31, Andriy Gapon wrote:
>>
>>
>> It seems that the panic message was
>> panic: Misaligned access from kernel space!
>>
>> On 22/06/2021 12:54, Andriy Gapon wrote:
>>> Not sure if I'll be able to get more out of this arm64 machine.
>>> I was playing with ppp and switching routes between LAN and ppp when the crash happened.
>>> The system is 2-3 weeks old 14.0-CURRENT as of c8250c5ada85fec.
>>> Tracing pid 0 tid 100014 td 0xffffa00000c00000
>>> db_trace_self() at db_trace_self
>>> db_stack_trace() at db_stack_trace+0x11c
>>> db_command() at db_command+0x244
>>> db_command_loop() at db_command_loop+0x54
>>> db_trap() at db_trap+0xf8
>>> kdb_trap() at kdb_trap+0x1c4
>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>> --- exception, esr 0xf2000000
>>> kdb_enter() at kdb_enter+0x44
>>> vpanic() at vpanic+0x1c4
>>> panic() at panic+0x44
>>> align_abort() at align_abort+0xb8
>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>> --- exception, esr 0x96000021
>>> nhop_free() at nhop_free+0x100
>>> destroy_nhgrp() at destroy_nhgrp+0x38
>>> epoch_call_task() at epoch_call_task+0x158
>>> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
>>> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
>>> fork_exit() at fork_exit+0x74
>>> fork_trampoline() at fork_trampoline+0x14
>>
>>
>> --
>> Andriy Gapon
>>
>

--
Andriy Gapon