Re: bhyve nvlist constness bug

From: Mark Johnston <markj_at_freebsd.org>
Date: Sun, 06 Nov 2022 20:10:34 UTC
On Fri, Nov 04, 2022 at 11:22:55AM +0100, Corvin Khne wrote:
> > On 10/23/22 11:18 AM, Mark Johnston wrote:
> >> I'm going through compiler warnings in the bhyve code with the aim of
> >> bumping WARNS, since the current setting hides bugs.  There's one in the
> >> config code that looks a bit tricky to resolve and I was hoping for some
> >> guidance on the right way to do that.
> >>
> >> The basic problem is _lookup_config_node() may return an nvlist via
> >> nvlist_get_nvlist(), but nvlist_get_nvlist() returns a const nvlist_t
> >> *, so _lookup_config_node() is discarding the const qualifier.  And
> >> indeed, some callers will modify the returned node.  The use of
> >> nvlist_move_nvlist() in _lookup_config_node() has a similar problem:
> >> nvlist_move_* "consumes" the passed value and is not supposed to be
> >> mutated after.
> >>
> >> I'm pretty sure this is actually a non-issue with our nvlist
> >> implementation; when adding an nvlist value to an nvlist, it doesn't
> >> store anything except for the pointer itself, so you can mutate it
> >> safely.  Note, this is not true for all value types, specifically
> >> arrays.  However, there are multiple nvlist implementations out there,
> >> and ours might change such that bhyve's config code becomes incorrect.
> >>
> >> The bug seems annoying to fix because consumers can do this:
> >>
> >>       nvlist_t *nvl = find_config_node(path);
> >>       set_config_value_node(nvl, "foo", "bar");
> >>
> >> So, if we naively changed find_config_node() to return a copy of the
> >> nvlist at "path", set_config_value_node() would have to somehow figure
> >> out where in the config tree to insert the updated nvlist, but it
> >> doesn't have the path anymore.
> >>
> >> To fix it properly could change find_config_node() to return some opaque
> >> type which contains the source path of the node so that we can DTRT when
> >> mutating the node.  IMO it's better if the config node type is opaque to
> >> consumers.  Or, make each nvlist node store its source path by storing
> >> it in the nvlist itself.  e.g., the nvlist node describing the device
> >> model at "pci.0.1.2" would have a "__path" key with value "pci.0.1.2",
> >> so that set_config_value_node() can figure out where in the tree to
> >> replace an existing copy of "pci.0.1.2".  Or is there a simpler way to
> >> fix this that I missed?
> >>
> >> Any thoughts?  I'm happy to implement the suggestions above but didn't
> >> want to do that work without some buy-in.
> > The config stuff uses nvlist on the premise that it is a useful way to
> > store the data.  However, it is true that nvlists seem to not be as
> > useful for storing mutable data.  Rather, they are useful as a way to
> > pass structured data across domain boundaries (e.g. serialized over
> > a socket, or between kernel and userland).  If nvlist doesn't work well
> > I'd rather just change the data structures we use in bhyve rather than
> > add hidden nodes with path names, etc.  To that end, if nvlist is too
> > limiting then we can do what my first intution was which was to turn
> > bhyve into a C++ program and use something like std::map<> to represent
> > the tree instead.
> >
> > --
> > John Baldwin
> 
> 
> I do agree with John. It looks like nvlist isn't made for mutable data. 
> So, when using nvlist we have to make sure that we don't modify the data 
> which is already stored in it or we have to use nvlist_take, modify the 
> data and write it back with nvlist_move/nvlist_add. This requires that 
> we patch the consumers to follow these rules.

The consumers don't need to be patched with my "__path" proposal above
since they use config.c accessors to get and set config values.  Though,
I admit I didn't implement it though so might be missing something.

> I don't know how much work would be required to turn bhyve into a C++ 
> program. If it's doable, I vote for this approach.

Ok, I'll try that.  The first step is probably to make config nodes an
opaque type; today all consumers have to pass nvlist_t *s around even
though they use accessors in config.c for everything.

Since my immediate goal was to raise WARNS for bhyve, and I'm quite
close to that, I posted a patch which simply disarms the warning for the
time being: https://reviews.freebsd.org/D37293