panic on invalid ifp pointer in iflib drivers

Keller, Jacob E jacob.e.keller at intel.com
Thu Oct 17 19:31:02 UTC 2019


> -----Original Message-----
> From: John Baldwin <jhb at FreeBSD.org>
> Sent: Thursday, October 17, 2019 9:31 AM
> To: Keller, Jacob E <jacob.e.keller at intel.com>; freebsd-net at freebsd.org
> Cc: shurd at llnw.com; Joyner, Eric <eric.joyner at intel.com>
> Subject: Re: panic on invalid ifp pointer in iflib drivers
> 
> On 10/16/19 2:16 PM, Keller, Jacob E wrote:
> > Hi,
> >
> > I'm investigating an issue on the iflib ixl driver in 11.3-RELEASE as well as 12-
> RELEASE. We found a panic in that occurs if SCTP/IPv6 traffic is being transmitted
> while the device is detached:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0xfffffe0000411e38
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80c84700
> > stack pointer           = 0x28:0xfffffe2f4351b600
> > frame pointer           = 0x28:0xfffffe2f4351b650
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 12 (swi4: clock (0))
> > trap number             = 12
> > panic: page fault
> > cpuid = 0
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe2f4351b2c0
> > vpanic() at vpanic+0x17e/frame 0xfffffe2f4351b320
> > panic() at panic+0x43/frame 0xfffffe2f4351b380
> > trap_fatal() at trap_fatal+0x369/frame 0xfffffe2f4351b3d0
> > trap_pfault() at trap_pfault+0x62/frame 0xfffffe2f4351b420
> > trap() at trap+0x2b3/frame 0xfffffe2f4351b530
> > calltrap() at calltrap+0x8/frame 0xfffffe2f4351b530
> > --- trap 0xc, rip = 0xffffffff80c84700, rsp = 0xfffffe2f4351b600, rbp =
> 0xfffffe2f4351b650 ---
> > in6_selecthlim() at in6_selecthlim+0x20/frame 0xfffffe2f4351b650
> > sctp_lowlevel_chunk_output() at sctp_lowlevel_chunk_output+0xeb2/frame
> 0xfffffe2f4351b790
> > sctp_chunk_output() at sctp_chunk_output+0x68c/frame 0xfffffe2f4351c110
> > sctp_timeout_handler() at sctp_timeout_handler+0x2d8/frame
> 0xfffffe2f4351c180
> > softclock_call_cc() at softclock_call_cc+0x15b/frame 0xfffffe2f4351c230
> > softclock() at softclock+0x7c/frame 0xfffffe2f4351c260
> > intr_event_execute_handlers() at intr_event_execute_handlers+0x9a/frame
> 0xfffffe2f4351c2a0
> > ithread_loop() at ithread_loop+0xb7/frame 0xfffffe2f4351c2f0
> > fork_exit() at fork_exit+0x84/frame 0xfffffe2f4351c330
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2f4351c330
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> >
> > From what I've gathered so far, it appears that the issue is a use-after-free
> where the SCTP stack gets an ifp pointer that's no longer valid. We've reproduced
> this issue on multiple iflib-based drivers, including ixl and the recently published
> ice driver code (available on phabricator).
> >
> > Additionally, we cannot reproduce it on legacy-stack drivers for ixl, or a
> mellanox 100G board we have. This leads me to believe that it's an issue in iflib
> rather than in the specific device drivers.
> >
> > I am not sure exactly what's going wrong here... anyone have suggestions? I
> thought it might be an issue of when ether_ifdetach is called. That function is
> supposed to clear all of the pre-existing routes from the route entry list. I'm
> thinking maybe somehow a route gets added after ether_ifdetach is called.
> >
> > In the iflib_device_deregister function, ether_ifdetach is called just after
> iflib_stop, (which would call a device's if_stop routine), and then the task queues
> are shutdown, a driver's ifdi_detach handler is called, and the ifp is free'd at the
> end. In the ixl legacy driver, ether_ifdetach is called prior to the stop routine.
> However, in the mlx5 driver, it's called after a call to close_locked()...
> >
> > So I'm really not sure exactly what could cause a stale ifp pointer to get into the
> route entry list.
> 
> Nominally, ifnet drivers should call ether_ifdetach first to remove public
> references to the ifnet and only call their stop routine after that has returned.
> This ensures any open if_ioctl invocations have completed, etc. before the
> stop routine is invoked.  Otherwise you are open to a race where the inteface
> can be upped via an ioctl after you have stopped the hardware.
> 

So we should (a) move ether_ifdetach before the stop, and...

> Any other references to the ifnet via eventhandlers, etc. should also be
> deregistered before calling the stop routine.
> 

(b) make sure all of the event handlers are moved before the stop too.

> After the hardware is stopped, interrupt handlers should be torn down and
> callouts
> and tasks drained to ensure there are no other references to the ifp outside of
> the thread running detach.
> 
> After that you can release device resources, destroy mutexes, free the ifp, etc.
> Note that drivers have to be prepared for ether_ifdetach to invoke if_ioctl (e.g.
> when detaching bpf), but of the drivers I've looked at this has generally been a
> non-issue.
> 
> It sounds like iflib should be doing the detach before calling iflib_stop.
> 

Right. I think also the VLAN event handlers need to be moved too.

Let me give that a try out to see if it fixes things.

Thanks,
Jake

> --
> John Baldwin


More information about the freebsd-net mailing list