Interrupts, Interrupted, Part II

From: Elliott Mitchell <ehem+freebsd_at_m5p.com>
Date: Mon, 28 Nov 2022 20:27:35 UTC
As part of trying to get a project into FreeBSD yet another issue showed
up.  There has been a push to get this functioning on top of INTRNG.
This should be doable, but a problem arose.  Due to this being a very
dynamic PIC (hotplug PICs may be difficult in hardware, they're easy in
software) it seems best to register interrupts during allocation and then
fully release them during release.

Problem is the call sequence:
intr_isrc_deregister() -> (for non-IPI interrupts)
        isrc_release_counters() ->
                panic("%s: not implemented", __func__)

The inverse of isrc_release_counters() is isrc_setup_counters().  These
two are for setup/teardown of the "intrnames" and "intrcnt" arrays.
Initially I was thinking "intrnames" and "intrcnt" were likely to be
deeply intertwined into the FreeBSD kernel, they've been around since the
mid-1990s.  Yet upon examination, only two places use these two global
variables.  First, they are used by watchdog_fire() for reporting
interrupts in case it triggers.  Second, the functions sysctl_intrnames()
and sysctl_intrcnt() use them for their sysctls (which are then used by
`vmstat -i`).

That is a *really* short list of uses.  With the default parameters the
"intrnames" array will tend to be roughly ~2MB and "intrcnt" would be
roughly 128KB.  That isn't huge, but an average system is only going to
use perhaps 15% of that.

The only benefit I see of having "intrnames" and "intrcnt" is they make
those 2 sections of source code simpler.  Yet by consuming noticeable
amounts of memory, they make other sections slower. If they saw heavy
usage this might be worthwhile, but I doubt they see enough for that to
justify their existence.

In light of this I've created D36610 for their removal.  Portions of my
initial implementation are likely to be rejected, but I like the idea.
Notably this has a net removal of almost 100 lines of source code per
architecture.  THAT seems the characteristic of something whose time has
passed.


One spot likely to be rejected is the watchdog_fire() replacement
implementation.  I had been thinking the architecture interrupt tables
were a good source of replacement data, but I'm now doubtful of this.  In
particular the functionality proposed in D36609 doesn't seem quite right.
I now suspect it may be better to add a function to sys/kern/kern_intr.c
for reporting this, but I'm unsure what to name it.

The other spot is the "hw.intrnames" and "hw.intrcnt" sysctl()s really
need work.  Much of removed duplicated source code from the architectures
appears to be directed towards maintaining compatibility with the
existing sysctl()s.  The difficult reimplementation also suggests these
need a careful reimplementation.  Their behavior hints at their long
history, but their behavior is also rather poor for real use.

I suggest there really should instead be a "hw.interrupts" sysctl() which
provides the interrupt name, counter and stray count as triplets.  This
would ensure consistency between the two, avoiding the potential for an
interrupt being added or removed between reading "hw.intrnames" and
"hw.intrcnt".


The format of the "hw.intrnames" speaks to its history.  At some point
pre-2003 it\0was\0a\0NUL\0separated\0list\0of\0strings.  When interrupt
names became changeable packing the strings became a problem.  As such
they were padded with spaces which allowed the old approach of finding
the next interrupt name using `strlen()` to continue working, even though
the entries were actually fixed-length.

Problem is, since 2003 `vmstat -i` has never been updated to work with
the current situation.  As such, trying to figure out the whys behind
everything has been difficult.  Then there was at least one lurking bug
just to make things more interesting.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445