Re: iflib_timer() vs ixl_admin_timer() race

From: Kevin Bowling <kevin.bowling_at_kev009.com>
Date: Thu, 13 Jan 2022 00:47:44 UTC
On Wed, Jan 12, 2022 at 10:18 AM Alexander Motin <mav@freebsd.org> wrote:

> Thanks, Krzystof,
>
> Grepping now for iflib_admin_intr_deferred() through the sources I see
> the same issue in other Intel NIC drivers, plus bnxt(4).  So the main
> controversy I see is this: either admin intr should not stop and restart
> the callouts (and then question is why it does that now), or if it
> should be so heavy-weight, it should not be called so regularly (and
> then question why so many drivers do it).
>
> In few drivers I've found it even more interesting: iflib_timer() calls
> IFDI_TIMER(), which calls ixgbe_if_timer(), which calls
> iflib_admin_intr_deferred(), which in its turn restart the
> iflib_timer().  Ouroboros. ;)
>

From memory, I believe that call may be related to NICs without dedicated
admin intrs.  Please keep them in mind when you clean this up.


> On 12.01.2022 11:46, Galazka, Krzysztof wrote:
> > Hi Alexander,
> >
> > Thank you for pointing this out. ixl_admin_timer is used by a callout so
> > I don’t think we can acquire any locks there. IIRC it was added to let
> > tools for NVM update interact with FW while interface is down, so maybe
> > stopping it when interface goes UP would be an option. Let me think this
> > through.
> >
> > Thanks,
> >
> > Krzysiek
> >
> > *From:* Eric Joyner <erj@freebsd.org>
> > *Sent:* Wednesday, January 12, 2022 8:22 AM
> > *To:* Alexander Motin <mav@freebsd.org>
> > *Cc:* Joyner, Eric <eric.joyner@intel.com>; Galazka, Krzysztof
> > <krzysztof.galazka@intel.com>
> > *Subject:* Re: iflib_timer() vs ixl_admin_timer() race
> >
> > Well, I think the situation is more-or-less as you say -- it's just that
> > the intent of ixl_admin_timer() is supposed to have the adapter's admin
> > queue processed constantly, regardless of interrupt status or down/up
> > status. I think as a shorthand we just called _task_fn_admin because it
> > handles a bunch of other things as well as getting down to calling
> > ixl_if_update_admin_status() which does the admin queue processing. But
> > as you found, I don't think it was originally written to be called
> > periodically on a regular basis like iflib_timer(), so the callouts are
> > interacting badly.
> >
> > My first thought would be to replace the call to
> > iflib_admin_intr_deferred() in ixl_admin_timer() with
> > ixl_if_update_admin_status() while taking the CTX_LOCK(). I'm not sure
> > everything else in _task_fn_admin() needs to run periodically like that
> > in the driver. That would avoid the callouts getting stopped every 500
> > ticks.
> >
> > I CC'd my coworker Krzysztof who currently owns the driver; he might
> > have better thoughts on this.
> >
> > - Eric
> >
> > On Tue, Jan 11, 2022 at 10:46 AM Alexander Motin <mav@freebsd.org
> > <mailto:mav@freebsd.org>> wrote:
> >
> >     Yes.  I've reverted my patch for now to not break anything, but all
> >     this
> >     situation does not look right for me too, so I'd appreciate your
> look.
> >
> >     On 11.01.2022 12:21, Eric Joyner wrote:
> >      > Hi,
> >      >
> >      > I came back from vacation yesterday, but I'll look into this for
> you
> >      > today. It's not a good situation when changing the period of the
> >     iflib
> >      > timer breaks something in the driver...
> >      >
> >      > - Eric
> >      >
> >      > On Sun, Jan 9, 2022 at 8:19 PM Alexander Motin <mav@freebsd.org
> >     <mailto:mav@freebsd.org>
> >      > <mailto:mav@freebsd.org <mailto:mav@freebsd.org>>> wrote:
> >      >
> >      >     Hi Eric,
> >      >
> >      >     In 90bc1cf65778 I've done what looked like a trivial
> >     optimization.  But
> >      >     just after committing it I've noticed weird race it created
> >     between
> >      >     iflib_timer() and ixl_admin_timer() timers.  I see ixl(4)
> calls
> >      >     iflib_admin_intr_deferred() every 0.5s, which calls
> >     _task_fn_admin(),
> >      >     which every time stops and restart txq->ift_timer callout (AKA
> >      >     iflib_timer()), which actually has exactly the same period of
> >     0.5s.  It
> >      >     seems before my change iflib_timer() managed to run once for
> >     all TX
> >      >     queues before being stopped, but after the change due to
> >     additional
> >      >     jitter many of callouts are getting stopped before firing.
> >      >
> >      >     Could you please sched some light for me on what is going on
> >     there?
> >      >     That race between two callouts looks like potential bug to
> >     me, working
> >      >     by some coinncedence, especially considering iflib_timer()
> >     period is
> >      >     tunable.
> >      >
> >      >     Thanks in advance.
> >      >
> >      >     --
> >      >     Alexander Motin
> >      >
> >
> >     --
> >     Alexander Motin
> >
> > ------------------------------------------------------------------------
> > *Intel Technology Poland sp. z o.o.
> > * ul. Słowackiego 173 | 80-298 Gdańsk | Sąd Rejonowy Gdańsk Północ | VII
> > Wydział Gospodarczy Krajowego Rejestru Sądowego - KRS 101882 | NIP
> > 957-07-52-316 | Kapitał zakładowy 200.000 PLN.
> >
> > Ta wiadomość wraz z załącznikami jest przeznaczona dla określonego
> > adresata i może zawierać informacje poufne. W razie przypadkowego
> > otrzymania tej wiadomości, prosimy o powiadomienie nadawcy oraz trwałe
> > jej usunięcie; jakiekolwiek przeglądanie lub rozpowszechnianie jest
> > zabronione.
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). If you are not the intended
> > recipient, please contact the sender and delete all copies; any review
> > or distribution by others is strictly prohibited.
> >
>
> --
> Alexander Motin
>
>