Re: bhyve nvlist constness bug

From: Corvin_Köhne <corvink_at_FreeBSD.org>
Date: Fri, 04 Nov 2022 10:22:55 UTC
> On 10/23/22 11:18 AM, Mark Johnston wrote:
>> I'm going through compiler warnings in the bhyve code with the aim of
>> bumping WARNS, since the current setting hides bugs.  There's one in the
>> config code that looks a bit tricky to resolve and I was hoping for some
>> guidance on the right way to do that.
>>
>> The basic problem is _lookup_config_node() may return an nvlist via
>> nvlist_get_nvlist(), but nvlist_get_nvlist() returns a const nvlist_t
>> *, so _lookup_config_node() is discarding the const qualifier.  And
>> indeed, some callers will modify the returned node.  The use of
>> nvlist_move_nvlist() in _lookup_config_node() has a similar problem:
>> nvlist_move_* "consumes" the passed value and is not supposed to be
>> mutated after.
>>
>> I'm pretty sure this is actually a non-issue with our nvlist
>> implementation; when adding an nvlist value to an nvlist, it doesn't
>> store anything except for the pointer itself, so you can mutate it
>> safely.  Note, this is not true for all value types, specifically
>> arrays.  However, there are multiple nvlist implementations out there,
>> and ours might change such that bhyve's config code becomes incorrect.
>>
>> The bug seems annoying to fix because consumers can do this:
>>
>>       nvlist_t *nvl = find_config_node(path);
>>       set_config_value_node(nvl, "foo", "bar");
>>
>> So, if we naively changed find_config_node() to return a copy of the
>> nvlist at "path", set_config_value_node() would have to somehow figure
>> out where in the config tree to insert the updated nvlist, but it
>> doesn't have the path anymore.
>>
>> To fix it properly could change find_config_node() to return some opaque
>> type which contains the source path of the node so that we can DTRT when
>> mutating the node.  IMO it's better if the config node type is opaque to
>> consumers.  Or, make each nvlist node store its source path by storing
>> it in the nvlist itself.  e.g., the nvlist node describing the device
>> model at "pci.0.1.2" would have a "__path" key with value "pci.0.1.2",
>> so that set_config_value_node() can figure out where in the tree to
>> replace an existing copy of "pci.0.1.2".  Or is there a simpler way to
>> fix this that I missed?
>>
>> Any thoughts?  I'm happy to implement the suggestions above but didn't
>> want to do that work without some buy-in.
> The config stuff uses nvlist on the premise that it is a useful way to
> store the data.  However, it is true that nvlists seem to not be as
> useful for storing mutable data.  Rather, they are useful as a way to
> pass structured data across domain boundaries (e.g. serialized over
> a socket, or between kernel and userland).  If nvlist doesn't work well
> I'd rather just change the data structures we use in bhyve rather than
> add hidden nodes with path names, etc.  To that end, if nvlist is too
> limiting then we can do what my first intution was which was to turn
> bhyve into a C++ program and use something like std::map<> to represent
> the tree instead.
>
> --
> John Baldwin


I do agree with John. It looks like nvlist isn't made for mutable data. 
So, when using nvlist we have to make sure that we don't modify the data 
which is already stored in it or we have to use nvlist_take, modify the 
data and write it back with nvlist_move/nvlist_add. This requires that 
we patch the consumers to follow these rules.

I don't know how much work would be required to turn bhyve into a C++ 
program. If it's doable, I vote for this approach.


-- 
Best regards,
Corvin