CBOR (Was: My experiences with Rust)
- In reply to: Isaac (.ike) Levy: "Re: My experiences with Rust"
Date: Mon, 25 Aug 2025 18:32:10 UTC
On Sat, 23 Aug 2025 09:03:45 -0400
"Isaac (.ike) Levy" <ike@blackskyresearch.net> wrote:
> > On Aug 23, 2025, at 7:13 AM, Anthony Pankov <anthony.pankov@yahoo.com>
> > wrote: On 23 August 2025, 1:32:25 you wrote:
> > ..
> >> For output-only data from kernel TO BE LOGGED, text-only format would be
> >> strictly wanted to read/process using oldies-but-goodies tools like
> >> less, grep, awk, sed and any other thing to handle texts.
> >
> > I have a conceptual question. How can one reliably recognize events
> > encoded in a log string? There is no catalog of all the mutations a
> > log string can take. 'grep "\d+ dropped"' will do the job well for
> > "200 dropped" until the log string takes the form "dropped 100" or
> > "150\s\sdropped". In the latter cases the event will be missed.
>
> Great question. I'm not savvy to that '\d+' (egrep thing?), but the problem
> you presented hits right at the core of this "text v binary output"
> conversation.
>
> Certainly, utilities may return data whose structure is not ideal for every
> case (even as "structure"), but depending on what needs to be accomplished,
> there are 50 years of UNIX utilities which can solve a diverse range of
> problems users may face.
(Material) progress is when the same thing can be done with less effort
(fewer resources), or when with the same effort (resources) you can do more
than before.
Let's see how this applies to 50 years of UNIX utilities:
> Expanding the case you provided, lets say "myprogram" returns the following
> "structured" output lines:
>
> $ myprogram
> foo dropped 1/2 foo
> bar sortof double! dropped bar
> baz dropped 2/2 baz
> bang sortof last bang
> $
>
> One could simply get every "dropped" word event via:
>
> $ myprogram | grep dropped
> foo dropped 1/2 foo
> bar sortof double! dropped bar
> baz dropped 2/2 baz
> $
...first of all, this output is incomprehensible. I do not see any structure
in it (a detailed manual would be needed).
> But, if one wanted to discard the extra 'sortof' line, one could use any
> number of tools to match just the 2nd field,
>
> $ myprogram | awk '$2 == "dropped" { print $0 }'
> foo dropped 1/2 foo
> baz dropped 2/2 baz
> $
>
> And when dealing with large output, "text editors" of odd kinds become
> useful, (wc(1) is a "text editor", right?)
>
> $ myprogram | awk '$2 == "dropped" { print $0 }' | wc -l
> 2
> $ myprogram | grep dropped | wc -l
> 3
> $
>
> Or, perhaps we wanted a quick progress meter to interactively monitor some
> situation in the output:
>
> $ while [ 1 ] ; do myprogram | awk '$2 == "dropped" { print $0 }' | wc -l ; date ; sleep 5 ; done
> 2
> Sat Aug 23 08:46:34 EDT 2025
> 2
> Sat Aug 23 08:46:39 EDT 2025
> 2
> Sat Aug 23 08:46:44 EDT 2025
> ^C%
> $
>
> We just quickly made the "text editor" into a sort of tail(1) like
> interactive output.
>
> These examples are fairly simplistic, but, seeing as I'm just some
> knucklehead user, sharing stuff I do all the time, I hope it demonstrates
> the conceptual foundation for what a "text editor" can be.
Now this demonstrates exactly how outdated the classic "raw text stream"
approach is: even for a very simple case, for one particular program, the
user has to write code. And it's *every* user! Too much effort is wasted -
just multiply the number of users by the number of programs.
In contrast, with JSON, when your field is named "dropped", you just filter
on its value and output what you want. Example:
$ echo '[{"name":"JSON", "good":true, "value":9},
{"name":"XML", "good":false, "value": 3}]' \
| jq '.[] | select(.value > 4) | .name'
"JSON"
And the fields don't have to appear in any particular order, unlike awk's
positional $2.
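To make the order point concrete, here is the same name-based selection in
plain Python using only the standard json module (an illustrative sketch,
not part of the original thread), with the second record's keys deliberately
shuffled - something that would break any positional awk/cut pipeline:

```python
import json

# Two records with the same fields in different key order: access by
# field *name* still works, where a positional $2 would not.
raw = '''[{"name": "JSON", "good": true, "value": 9},
          {"value": 3, "good": false, "name": "XML"}]'''

records = json.loads(raw)
# Same filter as the jq one-liner: keep records with value > 4.
good = [r["name"] for r in records if r["value"] > 4]
print(good)  # ['JSON']
```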
> > I feel that there is, in principle, not enough information to solve
> > the problem. But I don't think that structured data is the only cure.
>
> Indeed there is no ideal for the structure of the data output. (My example
> above has the "dropped" problem...)
>
> Yet, forcing binary decoders or specific tooling into the workstream
> introduces a significant barrier for users to fetch the output they want
> (even just to try to understand it).
Not at all. With CBOR, just pipe through a diagnostic-notation tool and you
get text, which in the simplest case is JSON - processable by any tool well
suited to the job.
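To sketch what "diagnostic notation" means, here is a deliberately minimal
CBOR encoder and diagnostic printer in pure Python. It covers only small
unsigned ints, short text strings, and maps from RFC 8949's major types;
a real deployment would use a full tool (e.g. the cbor2 package), this just
shows a binary log record round-tripping to readable text:

```python
def encode(obj):
    """Encode a small int, short str, or dict to CBOR bytes (RFC 8949)."""
    if isinstance(obj, int) and 0 <= obj < 24:
        return bytes([0x00 | obj])               # major type 0: unsigned int
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        assert len(data) < 24, "sketch handles short strings only"
        return bytes([0x60 | len(data)]) + data  # major type 3: text string
    if isinstance(obj, dict):
        assert len(obj) < 24, "sketch handles small maps only"
        out = bytes([0xA0 | len(obj)])           # major type 5: map
        for k, v in obj.items():
            out += encode(k) + encode(v)
        return out
    raise TypeError("unsupported type: %r" % (obj,))

def diagnostic(data):
    """Render CBOR bytes as diagnostic notation (JSON-like text)."""
    def item(pos):
        major, info = data[pos] >> 5, data[pos] & 0x1F
        if major == 0:                           # unsigned int
            return str(info), pos + 1
        if major == 3:                           # text string
            end = pos + 1 + info
            return '"%s"' % data[pos + 1:end].decode("utf-8"), end
        if major == 5:                           # map
            pairs, p = [], pos + 1
            for _ in range(info):
                k, p = item(p)
                v, p = item(p)
                pairs.append(k + ": " + v)
            return "{" + ", ".join(pairs) + "}", p
        raise ValueError("major type %d not in this sketch" % major)
    text, _ = item(0)
    return text

blob = encode({"event": "dropped", "count": 2})
print(blob.hex())        # the compact binary form
print(diagnostic(blob))  # {"event": "dropped", "count": 2}
```

The binary form is 22 bytes here; the point is that the text view is one
function call away, after which grep, awk and jq apply as usual.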
> "Write programs to handle text streams, because that is a universal
> interface." Full stop.
Again, nowadays universal interfaces are structured.
> Just output text, and users can figure out what they want to do with it.
> There's no way to predict what they'll want or need to do, or what their
> capabilities or toolsets are. That's UNIX's long-term success right there.
In your examples, users not "can" but MUST figure it out and write code for
every particular formatting case. That's no progress since the invention of
the pipe.
--
WBR, @nuclight