Re: network crash in nhop_free

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Sat, 10 Jul 2021 09:07:30 UTC
On 09/07/2021 00:02, Alexander V. Chernikov wrote:
> Hi Andriy,
> 
> Could you by any chance provide a bit more info on the system networking configuration and the steps leading to panic?
> No chance for a coredump?
> 
> destroy_nhgrp() suggests that there was a multipath route (default?) that was deleted.
> nhops are created with UMA_ALIGN_PTR, so I suspect there is a garbage inside nhgrp pointer..

I've just reproduced the problem and got a crash dump.
The new panic is a little bit different, but I think that it confirms your analysis.
Also, you are right about the multipath route, although its creation was not 
intentional.

The test setup is a host with an Ethernet interface and a 3G modem (for ppp).
The default route is via the Ethernet interface.

Destination        Gateway            Flags     Netif Expire
default            192.168.0.1        UGS        dwc0
8.8.8.8            192.168.0.1        UGHS       dwc0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     link#1             U          dwc0
192.168.0.137      link#1             UHS         lo0

192.168.0.0/24 is the LAN.
The static route to 8.8.8.8 is for internet accessibility checking.

Interesting bits of my ppp configuration:
----- ppp.linkup -----
3g:
  add! default HISADDR
----------------------

When I bring up the ppp link I get two default routes -- which is not what I 
expected even when using 'add!':
Destination        Gateway            Flags     Netif Expire
default            192.168.0.1        UGS        dwc0
default            10.1.1.1           UGS        tun0
8.8.8.8            192.168.0.1        UGHS       dwc0
10.1.1.1           link#4             UHS        tun0
10.133.147.118     link#4             UHS         lo0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     link#1             U          dwc0
192.168.0.137      link#1             UHS         lo0

The procedure to re-create the problem is to bring the ppp link up and down 
twice.  That is, up -> down -> up -> down -> crash.

Now, about the new crash.
The panic message is:
panic: refcount 0xffffa00027813318 wraparound

The stack trace is approximately the same:
panic() at panic+0x44
_refcount_update_saturated() at _refcount_update_saturated+0x14
nhop_free() at nhop_free+0x118
destroy_nhgrp() at destroy_nhgrp+0x38
epoch_call_task() at epoch_call_task+0x158
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14

In kgdb it looks like a refcount underflow (a decrement from zero).
(kgdb) p *nhg_priv
$1 = {nhg_idx = 0, nhg_nh_count = 2 '\002', nhg_spare = "\000\000",
  nhg_refcount = 0, nhg_linked = 1, nh_control = 0x0, nhg_priv_next = 0x0,
  nhg = 0xffffa00032049e80,
  nhg_epoch_ctx = {data = {0xffff0000005a0edc <destroy_nhgrp_epoch>,
      0xffffa0000eecb148}},
  nhg_nh_weights = 0xffffa00032049ed0}
(kgdb) p nhg_priv->nhg_nh_weights[0]
$2 = {nh = 0xffffa00027813200, weight = 0}
(kgdb) p nhg_priv->nhg_nh_weights[1]
$3 = {nh = 0xffffa00027813800, weight = 1}
(kgdb) p *nhg_priv->nhg_nh_weights[0].nh
$4 = {nh_flags = 128, nh_mtu = 1500, {
    gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0,
      sin_addr = {s_addr = 16843018},
      sin_zero = "\000\000\000\000\000\000\000"},
    gw6_sa = {sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0,
      sin6_flowinfo = 16843018,
      sin6_addr = {__u6_addr = {__u6_addr8 = '\000' <repeats 15 times>,
          __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002',
      sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"},
    gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0,
      sdl_type = 10 '\n', sdl_nlen = 1 '\001', sdl_alen = 1 '\001',
      sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"},
    gw_buf = "\020\002\000\000\n\001\001\001", '\000' <repeats 19 times>},
  nh_ifp = 0xffffa00027843800, nh_ifa = 0xffffa0000eec4900,
  nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38cd8,
  nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0,
  nh_prepend = '\000' <repeats 47 times>, nh_priv = 0xffffa00027813300}
(kgdb) p *nhg_priv->nhg_nh_weights[1].nh
$5 = {nh_flags = 640, nh_mtu = 1500, {
    gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0,
      sin_addr = {s_addr = 16843018},
      sin_zero = "\000\000\000\000\000\000\000"},
    gw6_sa = {sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0,
      sin6_flowinfo = 16843018,
      sin6_addr = {__u6_addr = {__u6_addr8 = '\000' <repeats 15 times>,
          __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002',
      sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"},
    gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0,
      sdl_type = 10 '\n', sdl_nlen = 1 '\001', sdl_alen = 1 '\001',
      sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"},
    gw_buf = "\020\002\000\000\n\001\001\001", '\000' <repeats 19 times>},
  nh_ifp = 0xffffa00027843800, nh_ifa = 0xffffa0000eec4900,
  nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38430,
  nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0,
  nh_prepend = '\000' <repeats 47 times>, nh_priv = 0xffffa00027813900}
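
The s_addr value printed in both nexthop dumps is easier to read in dotted form. A minimal sketch, assuming kgdb printed the 32-bit field as loaded by this little-endian arm64 CPU, so that the lowest byte is the first octet; the helper name s_addr_le_to_dotted is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an s_addr value, as printed by the debugger on a little-endian
 * host, in dotted-quad form.  The lowest byte of the integer is the first
 * byte in memory and therefore the first octet of the address.
 * Illustrative helper only, not part of any kernel or libc API.
 */
static void
s_addr_le_to_dotted(uint32_t v, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u",
	    (unsigned)(v & 0xff),
	    (unsigned)((v >> 8) & 0xff),
	    (unsigned)((v >> 16) & 0xff),
	    (unsigned)((v >> 24) & 0xff));
}
```

Calling s_addr_le_to_dotted(16843018, buf, sizeof(buf)) gives "10.1.1.1", which agrees with the sa_data bytes "\n\001\001\001" in the same dumps and with the tun0 gateway in the routing table.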

(kgdb) p *nhg_priv->nhg_nh_weights[0].nh->nh_priv
$7 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 526336,
  nh_idx = 0, cb_func = 0x0, nh_refcnt = 4294967295, nh_linked = 1,
  nh = 0xffffa00027813200, nh_control = 0xffffa00000ddf900,
  nh_next = 0xffffa00027813900, nh_vnet = 0xffffa0000084c580,
  nh_epoch_ctx = {data = {0xffff0000005a2f90 <destroy_nhop_epoch>, 0x0}}}
(kgdb) p *nhg_priv->nhg_nh_weights[1].nh->nh_priv
$8 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 2050,
  nh_idx = 11, cb_func = 0x0, nh_refcnt = 4, nh_linked = 2,
  nh = 0xffffa00027813800, nh_control = 0xffffa00000ddf900,
  nh_next = 0xffffa00027813500, nh_vnet = 0xffffa0000084c580,
  nh_epoch_ctx = {data = {0x0, 0x0}}}
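
The address in the panic message can be matched against the dumps above. A small sketch of the arithmetic; struct nhop_priv_model is a hypothetical mirror of the leading fields in the order kgdb printed them, with the field widths assumed, and it is not the kernel's definition:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the first fields of the private nexthop data,
 * in the order shown by kgdb.  Field widths are assumptions.
 */
struct nhop_priv_model {
	uint8_t		nh_family;
	uint8_t		spare;
	uint16_t	nh_type;
	uint32_t	rt_flags;
	uint32_t	nh_idx;
	void		*cb_func;
	unsigned int	nh_refcnt;
	unsigned int	nh_linked;
};

/* Distance in bytes from the start of a structure to an address inside it. */
static uint64_t
addr_delta(uint64_t field_addr, uint64_t struct_addr)
{
	return (field_addr - struct_addr);
}
```

The difference between 0xffffa00027813318 (from the panic message) and 0xffffa00027813300 (nh_priv of nhg_nh_weights[0].nh) is 0x18. With the assumed layout on an LP64 target, offsetof(struct nhop_priv_model, nh_refcnt) is also 0x18, so the panic appears to concern the refcount of the first nexthop in the group.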

nh_refcnt = 4294967295 (0xffffffff) in nhg_priv->nhg_nh_weights[0].nh->nh_priv.
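
That value is what one release too many would leave behind in an unsigned 32-bit counter. A simplified user-space model, assuming a plain non-atomic counter; model_release is a hypothetical helper and not the kernel's refcount(9) code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Drop one reference from an unsigned 32-bit counter.
 * Returns true if the counter was already zero, that is, if this release
 * is an underflow.  Unsigned arithmetic wraps modulo 2^32, so the stored
 * value becomes 0xffffffff in that case.
 */
static bool
model_release(uint32_t *count)
{
	uint32_t old = *count;

	*count = old - 1;
	return (old == 0);
}
```

Starting from a count of 1, the first model_release() leaves 0 and reports no underflow; the second one reports an underflow and leaves 4294967295, the same number seen in nh_refcnt.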

>> On 22 Jun 2021, at 11:31, Andriy Gapon <avg@FreeBSD.org> wrote:
>>
>>
>> It seems that the panic message was
>> panic: Misaligned access from kernel space!
>>
>> On 22/06/2021 12:54, Andriy Gapon wrote:
>>> Not sure if I'll be able to get more out of this arm64 machine.
>>> I was playing with ppp and switching routes between LAN and ppp when the crash happened.
>>> The system is 2-3 weeks old 14.0-CURRENT as of c8250c5ada85fec.
>>> Tracing pid 0 tid 100014 td 0xffffa00000c00000
>>> db_trace_self() at db_trace_self
>>> db_stack_trace() at db_stack_trace+0x11c
>>> db_command() at db_command+0x244
>>> db_command_loop() at db_command_loop+0x54
>>> db_trap() at db_trap+0xf8
>>> kdb_trap() at kdb_trap+0x1c4
>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>> --- exception, esr 0xf2000000
>>> kdb_enter() at kdb_enter+0x44
>>> vpanic() at vpanic+0x1c4
>>> panic() at panic+0x44
>>> align_abort() at align_abort+0xb8
>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>> --- exception, esr 0x96000021
>>> nhop_free() at nhop_free+0x100
>>> destroy_nhgrp() at destroy_nhgrp+0x38
>>> epoch_call_task() at epoch_call_task+0x158
>>> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
>>> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
>>> fork_exit() at fork_exit+0x74
>>> fork_trampoline() at fork_trampoline+0x14
>>
>>
>> -- 
>> Andriy Gapon
>>
> 


-- 
Andriy Gapon