General purpose library for name/value pairs.

Pawel Jakub Dawidek pjd at FreeBSD.org
Fri Jul 5 19:52:41 UTC 2013


On Fri, Jul 05, 2013 at 07:07:53AM +0000, Poul-Henning Kamp wrote:
> In message <20130704215329.GG1402 at garage.freebsd.pl>, Pawel Jakub Dawidek write
> s:
> 
> >Currently we have plenty of random methods to pass data around:
> >- nmount(2) implements its own name/value pair approach,
> >- GEOM uses XML to pass data to userland,
> >- various sysctls and ioctls use binary structures to export data to
> >  userland.
> 
> You even overlooked a number of other marshalling implementations,
> NETGRAPH for instance.
> 
> As should be obvious from the above list, where I'm guilty of most
> of it, I think this is a fundamentally a good and overdue idea.
> 
> I am not entirely convinced that one-size-fits-all is possible, but
> a name/value list is basically what I ended up with both in nmount()
> and GEOM::gctl.
> 
> Exporting complex state with XML is very hard to beat when it comes
> to code complexity in the kernel.
> 
> For one thing, you'd need nested n/v lists which is where I decided
> in GEOM that XML was the way to go.

I do support nesting, but with a limit to avoid too deep recursion
during nvlist_pack/unpack.

I haven't said that, but my plan is not to replace existing mechanisms,
ie. XML in GEOM or even nmount(2). This would be hard as we would need
to retain backward compatibility anyway. My main goal is to have
something for use with new code. If someone will be willing to replace
existing formats, that's fine be me, but not my main goal.

> Also, exporting performance counters via mmap(2) (see: devstat) is
> far superior to anything else, because it means no-syscall high-speed
> monitoring is possible.

Yes, libnv is not for use in very-performance-sensitive cases.

> ifconfig(8) and netstat(1) could really use a total rewrite along
> those lines.
> 
> So, yes I think your n/v idea has a lot of merits, but it's not
> going to be The Final Solution, and you should not force it on
> problems where we have better solutions.

I'm not planning to:)

> Implementation issues:
> 
> Are duplicate names allowed, and if so, do they keep stable order ?

Duplicates are allowed and the ordering is stable unless one of those
flags is specified:

	/* Duplicated names are not allowed. */
	#define	NV_FLAG_UNIQUE_NAME		0x01
	/* Duplicated names of the same type are not allowed. */
	#define	NV_FLAG_UNIQUE_NAME_TYPE	0x02

To be honest I'd much prefer not to support duplicates, because arrays
of values are supported as well as nesting. Not supporting duplicates
would simplify implementation a bit.

> Otherwise we will need some other API-convenient form of array-simulation,
> for instance for variable number of slices in GEOM slicers etc.

The API does support arrays. All nvlist_add_<type>() functions:

	void nvlist_add_nvpair(nvlist_t *nvl, nvpair_t *nvp);
	void nvlist_add_null(nvlist_t *nvl, const char *name);
	void nvlist_add_bool(nvlist_t *nvl, const char *name, bool value);
	void nvlist_add_int8(nvlist_t *nvl, const char *name, int8_t value);
	void nvlist_add_uint8(nvlist_t *nvl, const char *name, uint8_t value);
	void nvlist_add_int16(nvlist_t *nvl, const char *name, int16_t value);
	void nvlist_add_uint16(nvlist_t *nvl, const char *name, uint16_t value);
	void nvlist_add_int32(nvlist_t *nvl, const char *name, int32_t value);
	void nvlist_add_uint32(nvlist_t *nvl, const char *name, uint32_t value);
	void nvlist_add_int64(nvlist_t *nvl, const char *name, int64_t value);
	void nvlist_add_uint64(nvlist_t *nvl, const char *name, uint64_t value);
	void nvlist_add_string(nvlist_t *nvl, const char *name, const char *value);
	void nvlist_add_stringf(nvlist_t *nvl, const char *name, const char *valuefmt, ...) __printflike(3, 4);
	void nvlist_add_stringv(nvlist_t *nvl, const char *name, const char *valuefmt, va_list valueap) __printflike(3, 0);
	void nvlist_add_nvlist(nvlist_t *nvl, const char *name, const nvlist_t *value);
	void nvlist_add_descriptor(nvlist_t *nvl, const char *name, int value);
	void nvlist_add_bool_array(nvlist_t *nvl, const char *name, const bool *value, size_t nitems);
	void nvlist_add_int8_array(nvlist_t *nvl, const char *name, const int8_t *value, size_t nitems);
	void nvlist_add_uint8_array(nvlist_t *nvl, const char *name, const uint8_t *value, size_t nitems);
	void nvlist_add_int16_array(nvlist_t *nvl, const char *name, const int16_t *value, size_t nitems);
	void nvlist_add_uint16_array(nvlist_t *nvl, const char *name, const uint16_t *value, size_t nitems);
	void nvlist_add_int32_array(nvlist_t *nvl, const char *name, const int32_t *value, size_t nitems);
	void nvlist_add_uint32_array(nvlist_t *nvl, const char *name, const uint32_t *value, size_t nitems);
	void nvlist_add_int64_array(nvlist_t *nvl, const char *name, const int64_t *value, size_t nitems);
	void nvlist_add_uint64_array(nvlist_t *nvl, const char *name, const uint64_t *value, size_t nitems);
	void nvlist_add_string_array(nvlist_t *nvl, const char *name, const char * const *value, size_t nitems);
	void nvlist_add_nvlist_array(nvlist_t *nvl, const char *name, const nvlist_t * const *value, size_t nitems);
	void nvlist_add_descriptor_array(nvlist_t *nvl, const char *name, const int *value, size_t nitems);

Because it is easy to use format strings for names and because the library
support nesting, one could also do the following:

	nvlist_t *nvl, *slice;
	int i;

	nvl = nvlist_create(0);
	nvlist_add_uint32(nvl, "nslices", nslices);
	for (i = 0; i < nslices; i++) {
		slice = nvlist_create(0);
		nvlist_add_uint64(slice, "offset", slices[i]->offset);
		nvlist_add_uint64(slice, "size", slices[i]->size);
		nvlist_add_string(slice, "label", slices[i]->label);
		nvlist_addf_nvlist(nvl, slice, "slice%d", i);
	}

> >Note that we neither check if allocation succeeded nor individual
> >additions. The library is designed so that write-like operations can
> >gracefully handle previous failures. 
> 
> Yes, this is important, otherwise the error handling gets too maddening.

Right. sbuf(9) was inspiration for that:)

> Also remember a function for debugging which renders a nvlist_t (into
> a sbuf ?)

Currently I have:

	void nvlist_dump(const nvlist_t *nvl, int fd);
	void nvlist_fdump(const nvlist_t *nvl, FILE *out);

Not depending on sbuf could encourage wider addoption, so I'd prefer not
to do that. Converting it to JSON as simple string might be good idea.

> >So, to add elements to the list one can use either nvlist_add_<type>()
> >or nvlist_move_<type>() functions. To get the values one also have two
> >choices: nvlist_get_<type>() and nvlist_take_<type>() - the former just
> >returns the value (but the value is still part of the nvlist) and the
> >latter removes associated nvpair from nvlist and returns its value.
> 
> Do consider having these functions in a variant with a default
> argument, so that people don't have to wrap each optioanl n/v pair
> in an if().

Using default value to report a problem is too error-prone for my taste
and not intuitive. I was considering it (I even have earlier nv
implementation in src/sbin/hastd/nv.c that does that), but I decided
against it. For some types it is trivial, ie. if nvlist_get_string()
returns NULL we know there is a problem, but if nvlist_get_int32() returns
0 it is hard to say if it is a bug or not. In Solaris/IllumOS there is
additional set of functions that return an error and use additional
function argument for the result, for example we could have:

	int32_t nvlist_get_int32(const nvlist_t *nvl, const char *name);
	int nvlist_cget_int32(const nvlist_t *nvl, const char *name, int32_t *valuep);

I was considering this as well, but there is a lot of functions as it is
now and I'm not sure I want to add ~100 more (nvlist_cget_<type>() and
nvlist_ctake_<type>() for every type multiplied by format string version
multiplied by va_list version).

> One of the things you will need in the kernel is to check that sets
> of nv's contain what they should, before continuing processing.
> 
> One particular thing to detect is extra, unwanted, elements,
> which at least represent a programmer mistake and which may become
> a security risk down the road, if they suddenly gets used.
> 
> I would do this globally, by insisting that (kernel-)code cleans
> the nv lists it is passed, and then in syscall-return check that all
> nv-lists are in fact empty.
> 
> Leaving it to programmers is more risky, most people will not
> grasp the need to be that careful for the long term, but in
> that case you will need functions like:
> 
> 	int nvlist_accept_only(nvlist_t *, const char *, ...);
> 
> which complains if there are any "strange" nv pairs.
> (see above for arrays, it should support that.)
> 
> The complementary function, is also needed:
> 
> 	int nvlist_has_all_of(nvlist_t *, const char *, ...);

That's interesting idea. I do that in my current consumers of this API by
walking nvlist_t in a loop and checking for extra nvpairs. Having more
reusable mechanism for that would be nice. What you propose is not enough,
as we need three informations about nvpair (in most cases):
1. name
2. type
3. required/optional

So we could allow to create static const arrays with nvlist specification
that we could compare against, eg.

	struct nvpair_spec {
		const char	*name;
		int		 type;
		bool		 required;
	};
	#define	NVPAIR_SPEC_SENTINEL	{ NULL, NV_TYPE_NONE, false }

	static const struct nvpair_spec geom_provider_spec[] = {
		{ "name",	NV_TYPE_STRING,	true },
		{ "mediasize",	NV_TYPE_UINT64,	true },
		{ "sectorsize",	NV_TYPE_UINT64,	true },
		{ "stripesize",	NV_TYPE_UINT64,	false },
		NVPAIR_SPEC_SENTINEL
	};

	/* nvlist_check() could take buffer for detailed error message */
	if (nvlist_check(nvl, geom_provider_spec) < 0)
		err(1, "nvlist doesn't match specification");

> Finally, this kind of complex-data API causes errors which errno
> does not have the expressive power to communicate:
> 
> 	"foo" not allowed when "bar" also specified
> 
> Some years ago I proposed an API extension to allow the kernel/libs
> to return "detailed error strings" and I think this is a better
> idea than trying to shoe-horn errorstrings into individual API call.
> 
> Something like:
> 
> 	int error_buffer(char *, size_t len)
> 
> Registers a char[] where kernel/libs can dump error strings.
> 
> Thereafter, whenever a library or kernel call fail, you *MAY* have
> additional details in that bufffer:
> 
> 	static char myerrbuf[1024];
> 	[...]
> 	assert(0 == error_buffer(myerrbuf, sizeof myerrbuf));
> 	strcpy(myerrbuf, "(No details available)";
> 	[...]
> 	nvl = nvlist_recv(sock);
> 	if (nvl == NULL)
> 		err(1, "nvlist_recv() failed: %s", myerrbuf);
> 	[...]
> 
> This is just to show the principle, it should be thought through,
> for instance maybe strerror() should know about and report the
> buffer contents, to get instant benefit without code changes ?

Fine idea, but doesn't seem to directly related to libnv. I can imagine
implementing it in libnv internally, so that we not only keep internal
error number, but also internal error message that can be obtained with
nvlist_errmsg(). If nvl=NULL is passed it will just return "No memory" or
similar. It would be especially useful for the nvlist_check() above, but
also to tell which nvlist_add_<type>() exactly failed, etc. I like it.

There is only a problem with translating those messages, which we keep
avoiding.

> And now for a really nasty question:
> 
> Have you considered marshalling the n/v lists as JSON strings ?
> 
> It would be pretty dang efficient just to stuff it into an sbuf
> and it would make transport into the kernel a breeze.
> 
> Nobody says the kernel has to support anything but that exact
> JSON subset generated by your library functions, so parsing it
> it not too bad.

No, I haven't considered it, but currently with:

	void *nvlist_pack(const nvlist_t *nvl, size_t *sizep);
	nvlist_t *nvlist_unpack(const void *buf, size_t size);

you can easly pack it to a binary blob and pass to the kernel.

Thanks for your input.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://mobter.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20130705/f5c2fe4d/attachment.sig>


More information about the freebsd-arch mailing list