libctf & ddb

Tue Aug 12 15:07:57 UTC 2014

Hi all,

I am _very_ sorry that I forgot to write these weekly reports for the
past few weeks. To make this right, here is the work that I have
done:

I introduced a public API that is generated by C macros, so that the
generic get_something() function is written only once and not N times.
All API functions return an integer that is either CTF_OK or one of
the CTF_E_* constants. If a function is intended to provide some
value, the value is always passed back through the last argument. All
types like ctf_typedef, ctf_enum_entry, ctf_float, ..., are typedefs
of the appropriate structs and are intended to be used as opaque
types. This gives us implementation freedom in the future (we may
decide that names are not stored as char*, but rather as an index into
a string table or such). All programs that use libctf (ctfdump,
ctfstats, and ctfquery (see below)) now use this API.
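
To illustrate the macro idea, here is a minimal sketch of how one
getter template could be stamped out for several fields. It is not the
actual libctf code: the struct layout, the CTF_E_NULL code, the
CTF_DEFINE_GETTER macro and the function names are all assumptions
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codes: the report only names CTF_OK and the CTF_E_* family. */
#define CTF_OK     0
#define CTF_E_NULL 1

/* Hypothetical layout; callers are expected to treat the type as opaque. */
struct ctf_typedef {
	const char *name;
	unsigned int id;
};
typedef struct ctf_typedef ctf_typedef;

/* Generates one getter. The value is handed back through the last argument. */
#define CTF_DEFINE_GETTER(FUNC, TYPE, FIELD, OUT_TYPE) \
	int FUNC(const TYPE *object, OUT_TYPE *out)    \
	{                                              \
		if (object == NULL || out == NULL)     \
			return CTF_E_NULL;             \
		*out = object->FIELD;                  \
		return CTF_OK;                         \
	}

CTF_DEFINE_GETTER(ctf_typedef_get_name, ctf_typedef, name, const char *)
CTF_DEFINE_GETTER(ctf_typedef_get_id, ctf_typedef, id, unsigned int)

/* Usage: check the status code first, then read the out-parameter. */
void getter_demo(void)
{
	ctf_typedef t = { "size_t", 7 };
	const char *name = NULL;

	assert(ctf_typedef_get_name(&t, &name) == CTF_OK);
	assert(strcmp(name, "size_t") == 0);
	assert(ctf_typedef_get_name(NULL, &name) == CTF_E_NULL);
}
```

Since callers only ever go through the getters, the field behind
"name" could later become a string table index without touching them.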

The library is pollution free: all non-static functions carry the
_ctf prefix, which keeps the possibility of a name collision close to
zero.
Currently almost every data structure and its members, every
algorithm and every function has manually written documentation.

The ctfdump program serves a very similar purpose to the CDDL
licensed one. I had an idea that the "Machine-readable utilities"
project could provide (with some help, of course) its multi-format
support. I have seen some repositories [1] that tried to do this, so
there may be an audience for such a feature. Showing the ability to
cooperate would also be very nice.

The ctfstats program computes and emits CTF data statistics
(similar to the old ctfdump -S).

The ctfquery program serves as an intermediate implementation
before this code is put into DDB. The program takes a type name as
input and looks it up in the type database. After a successful
match, it presents the data type in an appropriate manner: for a
typedef it resolves the possible typedef chain down to the basic
type, for a struct it prints all its members (and if a member is
another struct, it prints that one with additional indentation). The
only thing missing is the memory access that would make this
completely DDB-like, but almost all of this code is usable in DDB.
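
The two presentation rules can be sketched roughly as follows. This
is only an illustration: the tiny type model (struct type, struct
member, enum kind) and both function names are hypothetical and do not
come from libctf, and every type is assumed to have a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum kind { K_INT, K_TYPEDEF, K_STRUCT };

struct type;

struct member {
	const char *name;
	const struct type *type;
};

/* Hypothetical in-memory model of a CTF type. */
struct type {
	enum kind kind;
	const char *name;
	const struct type *ref;        /* K_TYPEDEF: the aliased type */
	const struct member *members;  /* K_STRUCT: member array */
	size_t member_count;
};

/* Follow a typedef chain until a non-typedef type is reached. */
const struct type *resolve_typedef(const struct type *t)
{
	while (t != NULL && t->kind == K_TYPEDEF)
		t = t->ref;
	return t;
}

/* Print the members of a struct; nested structs get two more spaces. */
void print_struct(FILE *out, const struct type *s, int depth)
{
	size_t i;

	assert(out != NULL && s != NULL && s->kind == K_STRUCT);
	for (i = 0; i < s->member_count; i++) {
		const struct type *mt = resolve_typedef(s->members[i].type);

		if (mt == NULL)
			continue;
		fprintf(out, "%*s%s %s\n", depth * 2, "", mt->name,
		    s->members[i].name);
		if (mt->kind == K_STRUCT)
			print_struct(out, mt, depth + 1);
	}
}
```

The DDB version would additionally read each member's value from
memory at the point where this sketch prints only its type and name.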

The type lookup by name uses a very naive/simple approach: an O(n)
traversal of all types that reports success on the first hit. I have
done no real benchmarking, since the search appears instantaneous (I
even tried simulating a high workload and the loop ran just as
swiftly). In case this is not good enough, there are several possible
algorithmic improvements: building a trie, which would bring the
lookup down to O(length of the longest type name), and splitting the
types into buckets (struct bucket, union bucket, enum bucket), so
that when the user requests e.g. "struct dpt_ccb", we can safely
search the struct bucket only. The bucketing can obviously be combined
with any underlying data structure, be it simple linked lists or
tries. The search also works when the user omits the
struct/union/enum part.
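
A minimal sketch of such a first-hit linear lookup, including the
optional struct/union/enum part of the query, might look like this.
The type_entry model and the function names are hypothetical; the
kind filter stands in for the bucket idea and is not a real bucket
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { KIND_STRUCT, KIND_UNION, KIND_ENUM, KIND_OTHER };

/* Hypothetical flat type database entry. */
struct type_entry {
	enum kind kind;
	const char *name;
};

/*
 * Strip a leading "struct ", "union " or "enum " from the query.
 * Returns 1 and sets *kind if a keyword was present, 0 otherwise.
 */
static int split_query(const char *query, enum kind *kind, const char **name)
{
	static const struct {
		const char *prefix;
		enum kind kind;
	} keywords[] = {
		{ "struct ", KIND_STRUCT },
		{ "union ", KIND_UNION },
		{ "enum ", KIND_ENUM }
	};
	size_t i;

	for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
		size_t length = strlen(keywords[i].prefix);

		if (strncmp(query, keywords[i].prefix, length) == 0) {
			*kind = keywords[i].kind;
			*name = query + length;
			return 1;
		}
	}
	*name = query;
	return 0;
}

/* O(n) scan over all types; the first matching entry wins. */
const struct type_entry *lookup_type(const struct type_entry *types,
    size_t count, const char *query)
{
	enum kind wanted = KIND_OTHER;
	const char *name;
	int has_kind;
	size_t i;

	assert(types != NULL && query != NULL);
	has_kind = split_query(query, &wanted, &name);
	for (i = 0; i < count; i++) {
		if (has_kind && types[i].kind != wanted)
			continue;
		if (strcmp(types[i].name, name) == 0)
			return &types[i];
	}
	return NULL;
}
```

With real buckets the kind test would disappear from the loop, because
only the matching bucket would be scanned at all.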

One thing that deserves separate attention is the ctfquery feature
that guesses the data structure type: linked list, binary tree, n-ary
tree, and all the queue(3) and tree(3) data types. The struct in
question is tested for the presence of the characteristic members
(for example, being a queue(3) SLIST means that the struct contains
an anonymous struct that has only one member, a pointer to the parent
struct, and that member has to be named "sle_next"). This was maybe
the most enjoyable coding of this project so far. The usefulness of
this feature is that once we discover that a struct is a linked list,
we are able to print it more intelligently (see my proposal for
this). Visualisation of other data structures will not be done in
this project, but I am open to future suggestions (although
visualising a red-black tree on an 80x25 terminal might be ... well,
challenging).
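
The SLIST test described above can be sketched like this. The type
model and the is_slist_node() name are hypothetical and only serve the
illustration; an anonymous struct is modelled as a struct whose name
is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum type_kind { T_INT, T_POINTER, T_STRUCT };

struct type;

struct member {
	const char *name;
	const struct type *type;
};

/* Hypothetical in-memory model of a CTF type. */
struct type {
	enum type_kind kind;
	const char *name;              /* NULL for an anonymous struct */
	const struct type *target;     /* T_POINTER: pointed-to type */
	const struct member *members;  /* T_STRUCT: member array */
	size_t member_count;
};

/*
 * Returns 1 if the struct looks like a queue(3) SLIST element: it has a
 * member that is an anonymous struct with exactly one member, named
 * "sle_next", which is a pointer back to the struct itself.
 */
int is_slist_node(const struct type *s)
{
	size_t i;

	assert(s != NULL);
	if (s->kind != T_STRUCT)
		return 0;
	for (i = 0; i < s->member_count; i++) {
		const struct type *entry = s->members[i].type;
		const struct member *next;

		if (entry->kind != T_STRUCT || entry->name != NULL ||
		    entry->member_count != 1)
			continue;
		next = &entry->members[0];
		if (strcmp(next->name, "sle_next") != 0)
			continue;
		if (next->type->kind == T_POINTER && next->type->target == s)
			return 1;
	}
	return 0;
}
```

The other queue(3) and tree(3) shapes could presumably be recognised
the same way, each with its own set of expected member names.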

Type to string conversion: if the CTF data in question is, for
example, pointer -> const -> struct dpt_ccb, it gets resolved to
"const dpt_ccb*". This is used in the ctfdump and ctfquery programs.
There are still some crazy scenarios that need to be taken care of,
but the majority of the types are converted correctly.
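
As an illustration of the conversion, here is a small recursive sketch
that handles only base types, pointers and const. The type model and
function names are hypothetical, and arrays, function pointers and
volatile (some of the "crazy scenarios") are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { K_BASE, K_POINTER, K_CONST };

/* Hypothetical model: a chain of references ending in a named base type. */
struct type {
	enum kind kind;
	const char *name;        /* K_BASE only */
	const struct type *ref;  /* K_POINTER and K_CONST */
};

/* Append s to buf; returns -1 if it does not fit, 0 otherwise. */
static int append(char *buf, size_t size, const char *s)
{
	size_t used = strlen(buf);
	size_t length = strlen(s);

	if (used + length + 1 > size)
		return -1;
	memcpy(buf + used, s, length + 1);
	return 0;
}

static int render(const struct type *t, char *buf, size_t size)
{
	if (t == NULL)
		return -1;
	switch (t->kind) {
	case K_BASE:
		return append(buf, size, t->name);
	case K_POINTER:
		if (render(t->ref, buf, size) != 0)
			return -1;
		return append(buf, size, "*");
	case K_CONST:
		/* A const pointer is written after the star: "int* const". */
		if (t->ref != NULL && t->ref->kind == K_POINTER) {
			if (render(t->ref, buf, size) != 0)
				return -1;
			return append(buf, size, " const");
		}
		if (append(buf, size, "const ") != 0)
			return -1;
		return render(t->ref, buf, size);
	}
	return -1;
}

/* Writes the C-like spelling of t into buf; returns 0 on success. */
int type_to_string(const struct type *t, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	return render(t, buf, size);
}
```

For the chain pointer -> const -> dpt_ccb this yields "const
dpt_ccb*", the same spelling as in the example above.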

libctf has undergone some linting and valgrinding, which uncovered
some nasty hidden memory leaks and potential bugs that are now fixed.

One of the bugs took me 4 days to fix: improper handling of large
struct members. The mistake was hidden under three layers of logic
and I must admit that I was pretty happy after I finally found and
fixed it.

I wrote the DDB code that parses the arguments of the command (this
was a bit tricky thanks to the lack of documentation and weird
naming).

Right now I am fighting a huge problem: while writing the proposal
last winter, I was able to use the linker_ctf_get function to
obtain the CTF data of the kernel file in kernel space.
Unfortunately, the same code on the same installation (and on clean
10.0 and 9.2 installations) now crashes very badly, and the problem
seems to be the vn_open call in linker_elf_ctf_get in the
/usr/sys/kern/kern_ctf.c file. I tried to call the vn_open function
directly in my modified DDB code and it crashes too. The exact call
(taken directly from the kern_ctf.c file) is in [2].
I am looking forward to any ideas about this problem :)

My plans for the next few days:
I need to adapt the libctf allocation routines to work in kernel space
too, so I need to #ifdef all uses of malloc(3) to switch to malloc(9),
plus make some minor changes to strdup, strcpy and such. This should
not pose any problem. There is no need to use zlib in the kernel-space
version of the library, as the linker_ctf_get() function returns
already decompressed CTF data. Small changes need to be made in the
libctf loading code: right now we can only take a file name and read
all the ELF sections ourselves, but the linker_ctf_get() function
already does this step, so we can omit it too. To summarize, the next
baby step is to be able to print all CTF types inside DDB, and then
just copy/paste the ctfquery code and add some usability/user
experience functionality like DDB modifiers for hexadecimal output
and such.
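
The allocation switch could look roughly like the sketch below. The
CTF_MALLOC, CTF_FREE and CTF_ASSERT macros, the M_CTF malloc type and
ctf_strdup() are hypothetical names. The kernel branch is untested,
and the choice of M_NOWAIT is an assumption (a debugger context
presumably must not sleep).

```c
#ifdef _KERNEL
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/libkern.h>

/* Hypothetical malloc(9) type for all libctf allocations. */
static MALLOC_DEFINE(M_CTF, "ctf", "libctf data");

/* Assumption: M_NOWAIT, since the caller may not be allowed to sleep. */
#define CTF_MALLOC(size) malloc((size), M_CTF, M_NOWAIT)
#define CTF_FREE(ptr)    free((ptr), M_CTF)
#define CTF_ASSERT(cond) KASSERT((cond), ("libctf: assertion failed"))
#else
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CTF_MALLOC(size) malloc(size)
#define CTF_FREE(ptr)    free(ptr)
#define CTF_ASSERT(cond) assert(cond)
#endif

/* strdup replacement built on the wrappers; returns NULL on failure. */
char *
ctf_strdup(const char *source)
{
	size_t size;
	char *copy;

	CTF_ASSERT(source != NULL);
	size = strlen(source) + 1;
	copy = CTF_MALLOC(size);
	if (copy != NULL)
		memcpy(copy, source, size);
	return copy;
}
```

The rest of the library would then call only the CTF_* wrappers, so
the #ifdef stays in one place.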

Maybe I forgot some things or details, so if I think of more
additions later, I will write them here. Again, I am sorry for the
delay; please do not get the impression of a lazy attitude or that I
have not been working on the project.

Best,
Daniel

[1] https://github.com/rmustacc/ctf2json
[2] http://pastebin.com/gxG55vHn