svn commit: r341803 - head/libexec/rc

Tue Dec 11 19:58:09 UTC 2018

On Tue, Dec 11, 2018 at 10:04 AM Warner Losh <imp at bsdimp.com> wrote:
> On Tue, Dec 11, 2018, 9:55 AM John Baldwin <jhb at freebsd.org> wrote:
>>
>> The 'read' builtin in sh can't use buffering, so it is always going to be slow
>
> It can't use it because of pipes. The example from the part of this that was on IRC was basically:
>
> foo | (read bar; baz)
>
> Which reads one line into the bar variable and then sends the rest to the baz command.

It can't trivially, but it's not impossible.  sh could play games and
buffer its own use of stdin, and then open a fresh pipe for stdin of
subsequent non-builtins, writing out unused portions of the buffer.[A]
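
A rough sketch of what (A) could look like, not how sh(1) is
structured today.  The names read_line_and_restream() and restream()
are made up for illustration, and it assumes the first line fits in
one 4096-byte chunk.  The shell reads a whole chunk in one read(2),
keeps the first line, and a forked pump process feeds the overshoot
plus the rest of the original input into a fresh pipe:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: stream the n leftover bytes in `left`, followed by
 * whatever is still unread on `in`, into a fresh pipe.  Returns the read
 * end of that pipe, or -1 on error.  A forked pump does the copying.
 */
static int
restream(int in, const char *left, size_t n)
{
	int p[2];
	char buf[4096];
	ssize_t r;
	pid_t pid;

	if (pipe(p) == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(p[0]);
		close(p[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Pump: leftover first, then the rest of the original input. */
		close(p[0]);
		if (n > 0 && write(p[1], left, n) != (ssize_t)n)
			_exit(1);
		while ((r = read(in, buf, sizeof(buf))) > 0)
			if (write(p[1], buf, (size_t)r) != r)
				_exit(1);
		_exit(r == 0 ? 0 : 1);
	}
	close(p[1]);		/* the parent keeps only the read end */
	return (p[0]);
}

/*
 * Hypothetical buffered 'read': one read(2) for a whole chunk, copy the
 * first line (without its '\n') into `line`, and return a descriptor that
 * yields everything after that line.  Simplification: the first line is
 * assumed to fit in the first chunk.
 */
static int
read_line_and_restream(int in, char *line, size_t cap)
{
	char buf[4096];
	char *nl;
	size_t len, used;
	ssize_t n;

	assert(line != NULL && cap > 0);
	n = read(in, buf, sizeof(buf));	/* one syscall, may overshoot */
	if (n <= 0)
		return (-1);
	nl = memchr(buf, '\n', (size_t)n);
	len = nl != NULL ? (size_t)(nl - buf) : (size_t)n;
	used = nl != NULL ? len + 1 : len;
	if (len >= cap)
		len = cap - 1;		/* truncate to the caller's buffer */
	memcpy(line, buf, len);
	line[len] = '\0';
	return (restream(in, buf + used, (size_t)n - used));
}
```

A subsequent non-builtin would get the returned descriptor as its
stdin.  The extra process and the extra copy of every remaining byte
are the price of this approach.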

Some other alternatives that would require kernel support but are
things we've talked about doing in the kernel before anyway:

* If we had something like eBPF programs attached to IO, maybe sh's
read built-in could push a small eBPF program into the kernel that
determined how many bytes could be read from the pipe in a single
syscall without reading too far.  It's fairly trivial.  Simply
returning a number of bytes up to and including the first '\n' would
be a fine, if sometimes conservative, amount.  (Input lines can be
continued with a trailing backslash, except in -r mode, but as a
first-cut approximation, reading-until-newline is probably good
enough.)[B]

* Heck, even just a read_until_newline(2) syscall would work and
probably be more broadly useful than just sh(1).  I don't think it
passes the sniff test — not general enough, and probably not something
you want beginners stumbling across instead of fgets(3) — but it'd be
fine, and sh(1) is not the only pipe-abusing program that cares about
reading ASCII text lines without overconsuming input.[C]

* If we had something like Linux's tee(2) system call (which is as it
sounds — tee(1) for pipes), sh(1)'s read built-in could tee(2) for
buffering without impacting stdin, and read(2) stdin only when it knew
how many bytes were consumed (or when the pipe buffer became full).[D]
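
For reference, a minimal sketch of the tee(2) idea as it can be
written on Linux today.  FreeBSD has no such call, so this is
Linux-only.  The function name read_line_tee() and the caller-owned
scratch pipe are assumptions for illustration.  It peeks at the input
by duplicating it into the scratch pipe, finds the newline in the
copy, and only then consumes exactly that many bytes from the real
input:

```c
#define _GNU_SOURCE		/* for tee(2) on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one line (including its '\n') from the pipe
 * `in` without consuming anything past it.  `scratch` is an empty pipe(2)
 * pair owned by the caller; `cap` should not exceed its capacity.
 * Returns the number of bytes stored in `buf`, 0 on EOF, or -1 on error.
 */
static ssize_t
read_line_tee(int in, int scratch[2], char *buf, size_t cap)
{
	ssize_t peeked, got;
	size_t want;
	char *nl;

	assert(buf != NULL && cap > 0);
	/* Duplicate up to cap bytes into scratch; `in` is not consumed. */
	peeked = tee(in, scratch[1], cap, 0);
	if (peeked <= 0)
		return (peeked);
	/* Drain the copy so that scratch is empty for the next call. */
	got = read(scratch[0], buf, (size_t)peeked);
	if (got != peeked)
		return (-1);
	nl = memchr(buf, '\n', (size_t)peeked);
	want = nl != NULL ? (size_t)(nl - buf) + 1 : (size_t)peeked;
	/* Consume exactly the line (or the partial data) from the input. */
	return (read(in, buf, want));
}
```

This costs three syscalls per line regardless of line length, versus
one read(2) per byte for the unbuffered approach.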

I suspect (C) would be the easiest to implement correctly, followed by
(D).  (B) requires some architectural design and bikeshedding, and
the details on the kernel side are tricky.  (A) would be a little
tricky and probably require extensive changes to sh(1) itself, which
is a risk to the base system.  But it would not impact the kernel.
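
To make the semantics of (C) concrete, here is a userland stand-in
for the hypothetical read_until_newline(2).  The name
read_until_newline_emul() is invented for this sketch.  It returns
bytes up to and including the first '\n' and never consumes past it,
but it pays one read(2) per byte to do so, which is the cost a real
syscall would avoid:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Userland emulation of a hypothetical read_until_newline(2).
 * Stores at most `cap` bytes in `buf`, stopping after the first '\n'.
 * Returns the number of bytes stored, 0 on EOF, or -1 on error.
 */
static ssize_t
read_until_newline_emul(int fd, char *buf, size_t cap)
{
	size_t n = 0;
	ssize_t r;

	assert(buf != NULL && cap > 0);
	while (n < cap) {
		r = read(fd, buf + n, 1);	/* one syscall per byte */
		if (r == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		if (r == 0)
			break;			/* EOF */
		if (buf[n++] == '\n')
			break;			/* stop right after the newline */
	}
	return ((ssize_t)n);
}
```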

Is there any interest in a tee(2)-like syscall?

Thanks,
Conrad