Suggestion of higher level libxo API (in addition to current one)

Sat Dec 13 11:59:29 UTC 2014

Am 10.12.2014 um 20:35 schrieb Phil Shafer:
> Konstantin Belousov writes:
>> If it is required that the name be identical, it makes more sense
>> to require some handle to be passed to functions instead of the string.
>> I.e. either there should be some atom-like calls which convert string
>> to identifier, or first call should return handle passed to consequent
>> calls.
> 
> The library supports "do the right thing" mode, where it records
> the open tag value and doesn't need the user to provide it:
> 
> http://juniper.github.io/libxo/libxo-manual.html#dtrt-mode
> 
> I could enable this by default and use it when NULL is passed for
> an instance name.

Hi Phil,

I've thought a lot about this over the last days. Based on my
experience with the conversion of pciconf, which was not written
with XML or JSON output in mind and which outputs quite complex
data structures, I have started to wonder whether a different
concept might be easier to use for the programmer. I think it
is possible to provide a different API that can co-exist with
the current functions.

First what I think the current libxo API provides:

The API is very low-level in the sense, that every call generates
(at most) one XML or JSON element, and it generates it the moment
the call is made.

xo_open_list:
	writes JSON label and starts array with "[{",
	no output for XML

xo_open_instance:
	ignored for JSON,
	starts sub-tree within given element for XML

xo_open_container:
	writes label and opening "{",
	starts sub-tree within given element for XML

xo_emit:
	writes label and value for XML,
	creates element with given value

I'm not describing the xo_close_* functions, but they just emit the
syntactically required closing tags or brackets/braces.

Use of this API leads to the following code to print error information
(which is suppressed in the output, if there are no errors to report,
which is the normal case).

---------------------------------------------------------------------
static int errors;

static void
print_header(const char *header)
{
        if (errors++ == 0)
                xo_open_list("error-category");
        xo_open_instance("error-category");
        xo_emit("{:category/%14s} = ", header);
}

static void
print_bits(const char *header, struct bit_table *table, uint32_t mask)
{
        int first;

        first = 1;
        for (; table->desc != NULL; table++)
                if (mask & table->mask) {
                        if (first) {
                                print_header(header);
                                xo_open_list("detected-error");
                                first = 0;
                        } else
                                xo_emit("                 ");
                        xo_open_instance("detected-error");
                        xo_emit("{:description/%s}\n", table->desc);
                        xo_close_instance("error");
                        mask &= ~table->mask;
                }
        if (!first)
                xo_close_list("detected-error");
        if (mask != 0) {
                if (first) {
                        print_header(header);
                        first = 0;
                } else
                        xo_emit("                 ");
                xo_emit("Unknown: {:unknown-errors-bitmask/0x%08x}\n",
			mask);
        }
        if (!first)
                xo_close_instance("error-category");
        if (errors != 0)
                xo_close_list("error-category");
}
---------------------------------------------------------------------

Maybe I'm doing things wrong, but I had to use two state variables to
control the correct printing of opening and closing tags ("first" was
in the non-libxo version, "errors" was added just for libxo).

I have to be careful to explicitly record state and to close any of
the containers/lists that I open.

This could have been much easier to write, if the libxo API was more
abstract ("higher level") and printing of XML elements was decoupled
from xo_XXX calls.

The semantics of xo_open_{list,instance,container} is, that a new level
in the tree structure is created, and everything following that call is
in the new level, until the corresponding xo_close_*() is called.

I'd rather want to describe the structure in a different way, for which
I'm using an example syntax:

xo_start_instance(instance_name):

	Creates a new instance *at the same level* as the previous one,
	the previous one is automatically closed (if any is open). If
	this is the first list/array item, an xo_open_list with the same
	name as that of the instance is implied.

	This call is thus equivalent to (pseudo-code using Python style
	indenting):

		if IN_SOME_CONTAINER()
			xo_close_container_d()
		if NOT_IN_LIST(instance_name)
			if IN_SOME_LIST()
				xo_close_list_d()
			xo_open_list(instance_name)
		if INSTANCE_IS_OPEN(instance_name)
			xo_close_instance(instance_name)
		xo_open_instance(instance_name)

xo_start_container(container_name):

	Starts a new container at the same level as the previous one,
	closing any currently open container or list.

		if IN_SOME_LIST()
			if IN_SOME_INSTANCE()
				xo_close_instance_d()
			xo_close_list_d()
		if IN_SOME_CONTAINER()
			xo_close_container_d()
		xo_open_container()

xo_start_level(debug_name):

	Starts a new sub-tree (which is implicitly done by xo_open_*).
	No output is generated, except that a previously open container
	or instance is implicitly closed.
	The previous state is kept on a stack and will become visible
	after the corresponding xo_end_level() is called. The debug_name
	is optional and just there to be checked to match the one passed
	to a corresponding xo_finish_level(debug_name).

		if IN_SOME_LIST() or IN_SOME_CONTAINER()
			close_*
		xo_push_state()

xo_finish_level():

	Close all open instances and containers and finish the sub-tree,
	restore the state before the previous xo_start_level().

		if IN_SOME_LIST() or IN_SOME_CONTAINER()
			close_*
		xo_pop_state()

The following call sequences should generate identical output, with
indentation reflecting "pretty" XML:

	xo_open_container("group1");
		xo_open_container("subgroup1");
			xo_emit("{:label1/value}");
			xo_emit("{:label2/value}");
		xo_close_container("subgroup1");
		xo_open_list("array1")
			xo_open_instance("array1");
				xo_emit("{:array1-label1/value}");
				xo_emit("{:array1-label2/value}");
			xo_close_instance("array1");
			xo_open_instance("array1");
				xo_emit("{:array1-label1/value}");
				xo_emit("{:array1-label2/value}");
			xo_close_instance("array1");
		xo_close_list("array1");
		xo_emit("{:label3/value}");
	xo_close_container("group1");

<group1>
	<subgroup1>
		<label1>value</label1>
		<label2>value</label2>
	</subgroup1>
	<array1>
		<array1-label1>value</<array1-label1>
		<array1-label2>value</<array1-label2>
	<array1>
	<array1>
		<array1-label1>value</<array1-label1>
		<array1-label2>value</<array1-label2>
	<array1>
	<label3>value</label3>
</group1>

With the xo_start_* functions I'd use the following for the same
output:

	xo_start_container("group1");
		xo_start_group("");	// (*1*)	
		xo_start_container("subgroup1");
			xo_emit("{:label1/value}");
			xo_emit("{:label2/value}");
		xo_start_instance("array1");
				xo_emit("{:array1-label1/value}");
				xo_emit("{:array1-label2/value}");
		xo_start_instance("array1");
				xo_emit("{:array1-label1/value}");
				xo_emit("{:array1-label2/value}");
		xo_finish_group("");
		xo_emit("{:label3/value}");

(*1*) If no new "group" was started, then the next container would be
at the same level in the tree, i.e. <group1> would be closed before
<subgroup1> is opened.

All the xo_close can be implied, and I do not have to remember, if an
opening tag has been written. The closing tags ore braces are always
generated as required.

This syntax can express everything that is formally correct with the
xo_open/xo_close calls. E.g. it automatically places xo_open_list()
around instances - and the names are required to match for the well
formed cases anyway.

The only "complication" is, that I'd need to put xo_start_group()
and xo_finish_group() calls around the output for a sub-tree, else
the generated data structure will be "flat".

There is also no risk of run-away data structures, due to the omitting
of closing tags. This can easily be taken care of by start/finish_group.

If there is a subroutine that emits XML/JSON data into a sub-tree, then
just wrap the commands in this sub-routine in xo_start_group(func_name)
and xo_stop_group(func_name) - the arguments are not used to generate
any output, just to generate debug output if XOF_WARN is set.

The calls I suggest are more similar to the JSON output generated than
to the XML output (which is similar structured to xo_open/close_*).

I am convinced, that the libxo API that I propose simplifies use of
libxo. It is higher level in the sense, that it is more decoupled from
the generated output. In fact, it allows for semantic extensions, that
could further simplify use of libxo:

1) If there is no call to xo_emit() within a container or instance,
   the tags for the container/instance could be omitted. That would
   allow to always call xo_start_container, before testing for any
   data to actually put into that container. In the example code from
   pciconf that I quoted above, that could remove the need for the
   helper function, if xo_optional_emit() was introduced as a variant
   of xo_emit(), that is only written if any regular xo_emit() is also
   printed into the same container.

2) I have seen cases, where list instances are created out of order
   by the non-libxo code, and to adapt that code to libxo, the loop
   has to be completely rewritten (leading to different output than
   before for the XO_TEXT case).
   This could be fixed with xo_start_instance(), if all the instances
   were collected between xo_start_group() to xo_finish_group() and
   all the JSON array structures were only written with all collected
   instances when the xo_close_container() implied by xo_finish_group()
   is called. This would allow the following code:

	xo_start_instance("array1");
		xo_emit(array1-label/value1);
	xo_start_instance("array2");
		xo_emit(array2-label/value2);
	xo_start_instance("array1");
		xo_emit(array1-label/value3);

   leading to the following JSON output:

	array1 [
		{ array1-label: value1 }
		{ array1-label: value3 }
	],
	array2 [
		{ array2-label: value2 }
	]

   This is of course also possible with xo_open/xo_close, but as soon
   you start to dissociate these calls from the output (by delaying the
   output), you could as well use the xo_start_* variants I suggest ...

Anyway, except for the badly chosen name "xo_optional_emit()" (which as
described above is for data, that should be omitted if there is no
xo_emit() within the same container/instance, I think this is a workable
API for libxo, which just spares the programmer from working as if he
was directly writing XML output (with the addition of xo_open_list(),
which in fact is ignored for XML and just required for JSON).

The two APIs could be mixed with each other, if the semantics of
xo_open_* and xo_start_* were aligned. E.g. any xo_open_list()
after a xo_start_container/instance would just be interpreted as

	xo_start_group();
	xo_start_list();
	xo_finish_group();

and similar for containers (instances are only allowed inside lists
and are thus covered by the wrapping of xo_start_list() shown above).

It is obvious that it is not possible to specify xo_open*() via xo_start
and xo_start via xo_open without generating cyclic dependencies, but the
real implementation could be either way (with emulation of xo_open via
xo_start being the simpler case, since it only requires the wrapping
with xo_start_group).

Here is the example code from above with the xo_start API:

---------------------------------------------------------------------
static void
print_bits(const char *header, struct bit_table *table, uint32_t mask)
{
        int first;

	xo_start_group("print_bits1");	// just a debug label
	xo_start_instance("error-category");

        first = 1;
        for (; table->desc != NULL; table++)
                if (mask & table->mask) {
                        if (first) {
			        xo_emit("{:category/%14s} = ", header);
				xo_start_group("print_bits2");
                                first = 0;
                        } else
                                xo_emit("                 ");
                        xo_start_instance("detected-error");
                        xo_emit("{:description/%s}\n", table->desc);
                        mask &= ~table->mask;
                }
        if (mask != 0) {
                if (first) {
		        xo_emit("{:category/%14s} = ", header);
			xo_start_group("print_bits2");
                        first = 0;
                } else
                        xo_emit("                 ");
                xo_emit("Unknown: {:unknown-errors-bitmask/0x%08x}\n",
			mask);
        }
        if (!first)
		xo_finish_group("print_bits2");
	xo_finish_group("print_bits1");
}
---------------------------------------------------------------------

(While doing the conversion, I noticed that there might be a bug in the
call sequence of my code quoted in the first example with xo_open API,
which I had not spotted before.)

The above example is not only shorter, the diff against the non-libxo
version is also significantly smaller and easier to understand and
review.

I'm open to further discussion and I understand, that it is important to
not void effort put in code that uses xo_open*(). But I'd really love
to be able to use a higher-level API and I think what I suggested fits
the bill ;-)

Best regards, STefan