Re: git: 91f44749c6fe - main - ifnet: make if_index global

From: Gleb Smirnoff <>
Date: Tue, 01 Feb 2022 21:00:16 UTC

On Tue, Feb 01, 2022 at 12:46:57AM +0100, Marko Zec wrote:
M> > What exactly is FRR?
M> > If we see there is an expectation from some
M> > software that index shall start at 1, I will rethink this change. So
M> > far there is no such evidence.
M> The implicit API contract for the past 40 years or so was that lo0
M> is always present exactly at ifindex 1.  Your change broke that contract
M> for all vnets except for vnet0.  Saying that there's no evidence
M> software outside our src tree takes that API contract for granted, while
M> claiming you have no idea what FRR is, doesn't sound reasurring.

As you said, the API contract was implicit. If we see evidence that
something is broken by ifindex of lo0 not being 1 we have something
to talk about.

M> > M> > The holes in interface indexes were always allowed.  
M> > M> 
M> > M> True, the holes were formally allowed but were never huge, and now
M> > M> they will be guaranteed with #vnets > 1.  Again, that's not
M> > M> something 3rd party tools would expect.  
M> > 
M> > Well "huge" or "small" doesn't make difference for algorithms. Holes
M> > are allowed.
M> > 
M> > M> What could be the repercussions of ifindex globalization to 16-bit
M> > M> size of ifru_index in struct ifreq?
M> > M> What can prevent a vnet to inflate the global ifindex space,
M> > M> causing the linear scan in sysctl_ifcount() to become a global
M> > M> problem?  
M> > 
M> > We don't think that virtualization of if_index was there to save
M> > if_index space, was it?  I think the reason was just cause it was
M> > easier.
M> That's a non-answer to the question I asked, so let me restate it: any
M> vnet can now DoS vnet0 (and / or any / all other vnets) via if_index
M> exhaustion, so what exactly did we gain in exchange for having that new
M> attack vector wide open?

Well, I don't buy any argument of vnets being serious resource containment
and security feature that makes the host truly immutable to any malicious
action inside vnet.

The example of that being not true is very close: before this change
or after I back it out, any root in vnet can use dummynet, queue a
packet, remove interface and panic the kernel.

I don't buy this argument in particular, cause before my change, or after
I back it out, a malicious VNET can still allocate ifnets until kernel
enters endless loop in "for (idx = 1; idx <= V_if_index; idx++)", or if
the u_short overflow is fixed there, then kernel would run out of memory,
since ifindex_alloc() never fails. Oh, yeah, it can't overflow vnet0 if_index,
though! So I don't buy that I just made "a new attack vector wide open".

M> And no, if_index virtualization was not done because it was "easier"
M> (than what?), it was done in order to make all vnets equal, and to
M> ensure no state leaks through if_index space.  Your commit violated
M> both of those fundamental VNET concepts.
M> > M> Why do we now need to check wheter curvnet != NULL in
M> > M> ifnet_byindex(), i.e. are there callers to ifnet_byindex() where
M> > M> curvnet is not set?  If so, that smells like a bug in the caller?  
M> > 
M> > Good point. Maybe an assertion should be put there.
M> Now I really don't get it, if you don't expect curvnet to be NULL in
M> ifnet_byindex(), why can't you resolve ifindex in per-VNET ifnet table?
M> And why do you check for curvnet != NULL then?

I agree that curvnet != NULL should be converted to assertion that
curvnet is set here.

M> > M> Or, is it the other way around, that ifnet_byindex() is now
M> > M> intended to become a silent workaround for callers which lack
M> > M> curvnet context?  If that is so, it is indeed becoming a major
M> > M> VNET KPI feature (though I'd call it a bug), and as such, it
M> > M> should have been clearly stated in both the commit message, and as
M> > M> a comment in the ifnet_byindex() code.
M> > M> 
M> > M> Finally, the commit message does not reveal any clues what this
M> > M> change actually tries to accomplish.  The claim "Now that ifindex
M> > M> is static to if.c we can unvirtualize it." tell us nothing about
M> > M> the true intent.  
M> > 
M> > Well, the intent is revealed in the following commits, that use index
M> > as the only true stable & unique identifier of an interface except its
M> > pointer.
M> Nowhere is it explained _why_ exactly you need this extra true stable &
M> unique identifier besides ifnet ptr?  If there's a mbuf lingering
M> somewhere which points to an ifnet long after that ifnet is gone (such
M> as when a USB dongle suddenly gets unpluged), that's a bug in the queue
M> / code which holds such a mbuf / reference, and that has to be resolved
M> right there where it happens.  How is ifindex globalization going to
M> help us with disappearing interfaces (USB, vlan, ng_eiface etc.) besides
M> making if_vmove() of epairs less prone to panics, which seems to be in
M> your current focus?

There are multiple places where mbufs with rcvif pointer are stored beyond
epoch. I was making a generic KPI to safely restore those pointers.

M> > I have had a previous version, that was maintaining two indexes a
M> > virtual and non-virtual:
M> > 
M> >
M> > 
M> > If you ask me of a perfect version of this design, I would say same
M> > thing I already said in other topics: if_vmove shall not exist.
M> If if_vmove() did not exist, how would we move real ifnets (such as
M> em / bge / igb), or vlan cloners attached to those, from vnet0 to
M> another vnet, and vice versa?

Instead of if_vmove() there should have been a KPI that would allow fully
detach and free an interface from the driver and then allocate and attach
a different instance again in a different VNET.

Gleb Smirnoff