Extending sbufs with a drain

Wed Sep 8 15:43:42 UTC 2010

On Tue, Sep 7, 2010 at 12:40 PM, Poul-Henning Kamp <phk at phk.freebsd.dk> wrote:
> In message <AANLkTinReHpgm8Bt8ya=PY-3Mz6w0FwcGq-psrjiesKv at mail.gmail.com>, mdf@
> FreeBSD.org writes:
>
> Not so fast...
>
> The first patch (0001...) I am basically OK with.  I question the
> choice of EOVERFLOW, which is generally for fixed size data structures.
> I think that should be changed to ENOMEM, pretty much throughout.
>
> I'm not sure I see much point in the second patch (0002...) but
> whatever.
>
> The third patch (0003...) Is not ok with me, for a number of reasons.
>
> First, I am not even sure we should add this to sbuf in the first
> place.
>
> Both userland and kernel printf's have half-assed support for random
> drain functions already.  On the other hand, those functions are
> PITA to use and mostly undocumented hacks, so we should get something
> better constructed.

This was an attempt to construct something better.  It may not be in a
final form, but these drains seem pretty useful to me.

What is the half-assed support you mention?  Is that the func callback
for kvprintf(9) and the general nature of a FILE object for printf(3)
?

> Sbufs were designed to provide overflow-safe string operations on
> buffers (static or dynamic) with latching error detection.
> They were not designed to become a generalized I/O stream facility.
>
> By adding drain functionality to sbufs, we get significant overlap
> with stdio's FILE, but not anywhere close to being able to replace
> those.

To me, since FILE is a C standard object, it is what it is.  These
sbuf drains can be used to implement FILE buffering if desired, and
apart from my difficulty understanding the user-space implementation
of printf(3), that was my intent.

Regardless of what sbufs were originally designed to do, this seems
like a useful enhancement to me that does not take away the ability to
use them for overflow detection.  An sbuf without a drain marked
SBUF_FIXED will still perform in this manner.  But now there are more
options.

> If we decide to add drain function support to sbufs, I have issues
> with the suggested patches:
>
> For one thing, sbuf is not a kernel facility, it is a general purpose
> string editing library which is also used outside FreeBSD.   If
> we add the drain facility it should work in both userland and kernel.

Given that sys/sbuf.h is included in 100+ kernel files and 4
user-space files, and the man page is in section 9, and the support
was originally kernel-only and only later was libsbuf added, I think
of sbuf(9) as a kernel facility that happens to be useable in
user-space as well.  The patch I provided doesn't have drains for
user-space simply because the existing printf(3) implementation did
not seem like it was workable as-is, and would require some
modifications to work as intended.  I don't have the time now to look
into that, and I'm completely unfamiliar with any details of the
user-space printf(3) implementation except what I learned in an hour
of looking over the code.

As for being used outside FreeBSD, I was unaware of this.  Who is
using it, and what kinds of code are we pulling from the other
implementations?

> The drain function does not, and should not have to know that sbufs
> are involved

Why not?  This is an assertion without any proof.  Allowing the drain
to receive the sbuf pointer as an argument allows for future
flexibility I haven't even imagined yet.

> it should have a prototype of:
>
>        int sbuf_drain(void *private, const char *ptr, int len)
>
> Return value:
>        >= 0    # bytes drained
>        -1      error, errno relevante
>        -2      error, errno not relevant

This prototype and return scheme does not work in the kernel where
there is no errno.  Note that in the existing implementation,
sbuf_extend() will return -1 and not set errno in the case where
SBUF_AUTOEXTEND is not set.  sbuf_setpos() will return -1 but not ser
errno to ERANGE when the position argument is out of range.  Similarly
for most of the -1 return codes for sbuf; errno is never explicitly
set in the code and only rarely implicitly set, e.g. by a failed
malloc(3).

We could go a Linux route and a positive value means the number of
bytes drained and negative is the -errno, but that does not seem to be
a FreeBSD style.  Given this, I chose instead to have an
sbuf_drained() function so the drain can explicitly indicate if it has
drained any bytes.

> Note: The the private argument should be a void*, not an uintptr_t

I seem to recall that casting a pointer to uintptr_t and back is safe
according to the C standard, but casting an integral value to a void *
and back is not.  Thus my choice of uintptr_t.  If this is incorrect,
then it doesn't matter and I'm just as happy to use a void *.

> This means that the autoextend can not be implemented in terms of
> a drain, but that looked plain wrong to me any way.

Implementing autoextend as a drain shows off the flexibility of the
drain function, among other things.  It also simplifies the sbuf code
in that it doesn't need to add special cases for s_drain_func == NULL
and call sbuf_extend in those cases.

> The next thing is when do we call the drain function.  If we add
> drain callbacks, we should do it right and offer all three canonical
> options:
>
>        For every char
>        When buffer is full.
>        On line boundaries (ie: one or more lines)

Two of these three are available in my patch.  To be called for every
char, initialize with a buffer of length 2.  Adding a call on line
boundaries is a simple patch that can be added when desired.  That
functionality can be simulated by having the drain function check for
the '\n' and only draining that much.  Line boundaries will never be
perfect since a line can always exceed whatever static sized buffer is
used.

> sbuf_flush() is unecessary, sbuf_finish() should be used, because
> that is the only way you can be sure the error code is available
> for checking afterwards.

With the existing sbuf_finish() implementation, it was unusable.
Mostly because of the SBUF_FINISHED state change.  There is no reason
off-hand that I can think of to have the sbuf track whether it's
really "done"; that is up to the user of the service who should know
whether all the data has been printed to the sbuf.  More below...

> sbuf_unfinished_data() is just plain wrong and shall not be added, ever.

This was required because of the assert_sbuf_state(SBUF_FINISHED)
assert in sbuf_data.  If sbuf_data() always NUL-terminated and didn't
assert on the sbuf state, then sbuf_unfinished_data() would be
unneeded and I'd prefer this.

> The internals of sbufs are strictly private and should stay private.
>
> Sbuf does not make any promises about how the string is stored
> internally until you sbuf_finish() the buffer.

sbuf did make this promise.  Since sbuf_data and sbuf_len are public
accessor functions, it seems reasonable to use them in the drain
callback.  I'm not entirely opposed to passing in a char *buf and int
len as well to the drain, but it seemed redundant given the existence
of sbuf_len() and sbuf_data().  The current implementation makes no
guarantees about how the string is stored until sbuf_finish(), but
this can be changed so that sbuf_data does guarantee exactly what the
prototype offers: a pointer to a NUL-terminated, contiguous sequence
of bytes that represents the contents of the sbuf, and the invariant
can become that the state of the data isn't guaranteed until a call to
sbuf_data().

So to summarize: you have made some objections but in general I don't
feel that you have provided sufficient reasoning behind those
objections.  So I would like to continue this conversation and work
out something that is acceptable to both of us (and the silent
gallery).

Thanks,
matthew