Re: bhyve nvlist constness bug

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Thu, 27 Oct 2022 23:44:05 UTC
On 10/23/22 11:18 AM, Mark Johnston wrote:
> I'm going through compiler warnings in the bhyve code with the aim of
> bumping WARNS, since the current setting hides bugs.  There's one in the
> config code that looks a bit tricky to resolve and I was hoping for some
> guidance on the right way to do that.
> 
> The basic problem is _lookup_config_node() may return an nvlist via
> nvlist_get_nvlist(), but nvlist_get_nvlist() returns a const nvlist_t
> *, so _lookup_config_node() is discarding the const qualifier.  And
> indeed, some callers will modify the returned node.  The use of
> nvlist_move_nvlist() in _lookup_config_node() has a similar problem:
> nvlist_move_* "consumes" the passed value and is not supposed to be
> mutated after.
> 
> I'm pretty sure this is actually a non-issue with our nvlist
> implementation; when adding an nvlist value to an nvlist, it doesn't
> store anything except for the pointer itself, so you can mutate it
> safely.  Note, this is not true for all value types, specifically
> arrays.  However, there are multiple nvlist implementations out there,
> and ours might change such that bhyve's config code becomes incorrect.
> 
> The bug seems annoying to fix because consumers can do this:
> 
>      nvlist_t *nvl = find_config_node(path);
>      set_config_value_node(nvl, "foo", "bar");
> 
> So, if we naively changed find_config_node() to return a copy of the
> nvlist at "path", set_config_value_node() would have to somehow figure
> out where in the config tree to insert the updated nvlist, but it
> doesn't have the path anymore.
> 
> To fix it properly could change find_config_node() to return some opaque
> type which contains the source path of the node so that we can DTRT when
> mutating the node.  IMO it's better if the config node type is opaque to
> consumers.  Or, make each nvlist node store its source path by storing
> it in the nvlist itself.  e.g., the nvlist node describing the device
> model at "pci.0.1.2" would have a "__path" key with value "pci.0.1.2",
> so that set_config_value_node() can figure out where in the tree to
> replace an existing copy of "pci.0.1.2".  Or is there a simpler way to
> fix this that I missed?
> 
> Any thoughts?  I'm happy to implement the suggestions above but didn't
> want to do that work without some buy-in.

The config stuff uses nvlist on the premise that it is a useful way to
store the data.  However, it is true that nvlists seem to not be as
useful for storing mutable data.  Rather, they are useful as a way to
pass structured data across domain boundaries (e.g. serialized over
a socket, or between kernel and userland).  If nvlist doesn't work well
I'd rather just change the data structures we use in bhyve rather than
add hidden nodes with path names, etc.  To that end, if nvlist is too
limiting then we can do what my first intution was which was to turn
bhyve into a C++ program and use something like std::map<> to represent
the tree instead.

-- 
John Baldwin