panic on invalid ifp pointer in iflib drivers

Wed Oct 16 22:07:19 UTC 2019

On Wed, 2019-10-16 at 21:16 +0000, Keller, Jacob E wrote:
> Hi,
>  
> I’m investigating an issue on the iflib ixl driver in 11.3-RELEASE as
> well as 12-RELEASE. We found a panic in that occurs if SCTP/IPv6
> traffic is being transmitted while the device is detached:
>  

I've just been told it has reproduced this on the latest 12-stable as
well.

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xfffffe0000411e38
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80c84700
> stack pointer           = 0x28:0xfffffe2f4351b600
> frame pointer           = 0x28:0xfffffe2f4351b650
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe2f4351b2c0
> vpanic() at vpanic+0x17e/frame 0xfffffe2f4351b320
> panic() at panic+0x43/frame 0xfffffe2f4351b380
> trap_fatal() at trap_fatal+0x369/frame 0xfffffe2f4351b3d0
> trap_pfault() at trap_pfault+0x62/frame 0xfffffe2f4351b420
> trap() at trap+0x2b3/frame 0xfffffe2f4351b530
> calltrap() at calltrap+0x8/frame 0xfffffe2f4351b530
> --- trap 0xc, rip = 0xffffffff80c84700, rsp = 0xfffffe2f4351b600, rbp
> = 0xfffffe2f4351b650 ---
> in6_selecthlim() at in6_selecthlim+0x20/frame 0xfffffe2f4351b650
> sctp_lowlevel_chunk_output() at
> sctp_lowlevel_chunk_output+0xeb2/frame 0xfffffe2f4351b790
> sctp_chunk_output() at sctp_chunk_output+0x68c/frame
> 0xfffffe2f4351c110
> sctp_timeout_handler() at sctp_timeout_handler+0x2d8/frame
> 0xfffffe2f4351c180
> softclock_call_cc() at softclock_call_cc+0x15b/frame
> 0xfffffe2f4351c230
> softclock() at softclock+0x7c/frame 0xfffffe2f4351c260
> intr_event_execute_handlers() at
> intr_event_execute_handlers+0x9a/frame 0xfffffe2f4351c2a0
> ithread_loop() at ithread_loop+0xb7/frame 0xfffffe2f4351c2f0
> fork_exit() at fork_exit+0x84/frame 0xfffffe2f4351c330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2f4351c330
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>  
>  
> From what I’ve gathered so far, it appears that the issue is a use-
> after-free where the SCTP stack gets an ifp pointer that’s no longer
> valid. We’ve reproduced this issue on multiple iflib-based drivers,
> including ixl and the recently published ice driver code (available
> on phabricator).
> 
> Additionally, we cannot reproduce it on legacy-stack drivers for ixl,
> or a mellanox 100G board we have. This leads me to believe that it’s
> an issue in iflib rather than in the specific device drivers.
>  
> I am not sure exactly what’s going wrong here... anyone have
> suggestions? I thought it might be an issue of when ether_ifdetach is
> called. That function is supposed to clear all of the pre-existing
> routes from the route entry list. I’m thinking maybe somehow a route
> gets added after ether_ifdetach is called.
>  
> In the iflib_device_deregister function, ether_ifdetach is called
> just after iflib_stop, (which would call a device’s if_stop routine),
> and then the task queues are shutdown, a driver’s ifdi_detach handler
> is called, and the ifp is free’d at the end. In the ixl legacy
> driver, ether_ifdetach is called prior to the stop routine. However,
> in the mlx5 driver, it’s called after a call to close_locked()...
>  
> So I’m really not sure exactly what could cause a stale ifp pointer
> to get into the route entry list.
>  
> Thanks,
> Jake