Extending sbufs with a drain

Wed Sep 8 17:10:08 UTC 2010

On Wed, Sep 8, 2010 at 9:29 AM, Poul-Henning Kamp <phk at phk.freebsd.dk> wrote:
> In message <AANLkTikerOYdEyT6jt7rAN3JP_N_oofq7ZXc4NptA4zP at mail.gmail.com>, mdf@
> FreeBSD.org writes:
>
>>Regardless of what sbufs were originally designed to do, this seems
>>like a useful enhancement to me that does not take away the ability to
>>use them for overflow detection.  An sbuf without a drain marked
>>SBUF_FIXED will still perform in this manner.  But now there are more
>>options.
>
> I am not saying it is not useful, I am questioning if sbufs is the
> right vehicle for this usefulness.

I chose to put it in sbuf because it's essentially impossible to put
it in a structure that uses sbuf for its implementation.  In order to
know that draining is possible, one needs to know before the buffer
has overflowed.

If you think sbuf isn't the right vehicle (and I can understand that
argument) can you please suggest something else?  In the end I would
like a One True Printf to bind them all, and printf(), sprintf(),
fprintf(), log(), console(), etc., are all implemented using the One
True Printf.  Allowing the attach of a generic drain function aids
this goal, because the functions that print data structure internals
or whatever only need to know to print to an sbuf (or a struct format,
if you prefer), and the user who initialized the struct knows where
she intended to print.

The unbuffered kernel kvprintf(9) makes implementing this easy.  As I
said, I haven't figured out how to implement this with printf(3) but I
can at least see how to implement buffered print in userspace with
sbuf at the core.

>>> For one thing, sbuf is not a kernel facility, it is a general purpose
>>> string editing library which is also used outside FreeBSD.   If
>>> we add the drain facility it should work in both userland and kernel.
>>
>>Given that sys/sbuf.h is included in 100+ kernel files and 4
>>user-space files, [...]
>
> That is probably more an indication that our userland is largely
> unchanged from 1995, whereas as everybody loves to hack up the
> kernel :-)  The various people who have told me they adopted sbufs
> have been very happy with them in userland.
>
>>As for being used outside FreeBSD, I was unaware of this.  Who is
>>using it, and what kinds of code are we pulling from the other
>>implementations?
>
> As usual with the BSD license, we have no idea and no way to find out.
>
> One place I do know about is Varnish :-)
>
> My main point here is that you should not treat sbufs as a kernel
> facility, but see it as a generally desirable and useful way of
> doing safe string operations, and that if we make a major extension
> in functionality, it should be done both in userland and kernel.

I don't object to adding drains to userland.  I want to.  I merely
stated that, at the moment, given the implementation of sprintf(3)
through __vfprintf() and FAKE_FILE, it is not possible.  That
implementation would need to be changed and it is beyond the scope of
my current knowledge and time.  My current patch doesn't have it, but
that doesn't mean the patch is wrong.

I would assume that, if someone really wants this is userland before I
can get to this (and it is on my list of desired things to do) then
they can write the code first.

>>> The drain function does not, and should not have to know that sbufs
>>> are involved
>>
>>Why not?  This is an assertion without any proof.
>
> See later in my previous reply:  The contents of sbufs is private,
> nobody outside subr_sbuf.c should ever fondle the internals.  See
> further below.
>
>>to receive the sbuf pointer as an argument allows for future
>>flexibility I haven't even imagined yet.
>
> I'll let Gettys (The programmer, not the general) shoot that canard down:
>
>   3. The only thing worse than generalizing from one example
>      is generalizing from no examples at all.
>
>>> int sbuf_drain(void *private, const char *ptr, int len)
>>>
>>> Return value:
>>>  >= 0  # bytes drained
>>>        -1      error, errno relevante
>>>        -2      error, errno not relevant
>>
>>This prototype and return scheme does not work in the kernel where
>>there is no errno.
>
> Which bit of "errno not relevant" in the description of -2 did
> you fail to notice ?  :-)

Well, the specific error code is relevant, and I don't really want an
sbuf_set_error() function.  For example, if my drain function is
SYSCTL_OUT, I want to preserve the error code that returned.  If my
drain function is write(2) in userland, I want to preserve errno.  The
error code is very useful, and throwing it away for the kernel is
wrong.

>>Note that in the existing implementation,
>>sbuf_extend() will return -1 and not set errno in the case where
>>SBUF_AUTOEXTEND is not set.
>
> You were the one to involve errno's in this (patch 0001...) I
> don't see them having any relevance, since ENOMEM is the only
> one that's realistically relevant.

Not with a generic drain function.  sbuf_finish() / sbuf_flush()
should be returning the error code, or at least providing a way to
fetch it.

> Realistiscally you could have a drain function that dumps into
> a socket (userland or kernel) and it would be nice to report
> the ECONNRESET back.
>
> I was never really happy with des@' choice of the name sbuf_overflowed()
> I would have prefered sbuf_error() or sbuf_ok().   If we decide to
> introduce errnos, this gives us the chance to add those and advocate
> their use instead of sbuf_overflowed().
>
>>> Note: The the private argument should be a void*, not an uintptr_t
>>
>>I seem to recall that casting a pointer to uintptr_t and back is safe
>
> Yes, but why would you want to ?  Normally private arguments are typed
> "void *" as they more often than not point to some structure, to
> which you can cast a void* directly, whereas uintptr_t requires a
> explicit cast.

If my drain function is in userspace and writes to a file descriptor
I'd want it.

>>> This means that the autoextend can not be implemented in terms of
>>> a drain, but that looked plain wrong to me any way.
>>
>>Implementing autoextend as a drain shows off the flexibility of the
>>drain function, among other things.
>
> Uhm no, it does not.
>
> It results in the wrong prototype for the drain function, exposes
> the sbuf internals where they should not be fondled, and requires
> the addition of two extra functions which are unnecessary with the
> simpler prototype I suggested above.

Autoextend did not dictate the drain prototype.  The autoextend drain
*is* an sbuf internal function -- note its existence in subr_sbuf.c.
There may be other desired uses for a drain that don't just print to a
file/console/log, though those are the most likely.

>>Line boundaries will never be
>>perfect since a line can always exceed whatever static sized buffer is
>>used.
>
> Good point, forget about drain-policy then.
>
> The size 2 buffer for char by char requires good documentation and
> possibly a distinct #define.
>
>>> sbuf_flush() is unecessary, sbuf_finish() should be used, because
>>> that is the only way you can be sure the error code is available
>>> for checking afterwards.
>>
>>With the existing sbuf_finish() implementation, it was unusable.
>>Mostly because of the SBUF_FINISHED state change.  There is no reason
>>off-hand that I can think of to have the sbuf track whether it's
>>really "done"; that is up to the user of the service who should know
>>whether all the data has been printed to the sbuf.  More below...
>
> The critical feature of the sbuf API is that you do not need to
> check the error status on every call.  That is why UNIX grew the
> EPIPE hack: nobody checks the return value of printf().
>
> With sbufs no damage happens, no matter what you do, and you can
> (and should!) check the compound operation at the end with
> sbuf_overflowed().
>
> For a sbuf with a drain function, sbuf_finish() should flush the
> buffer, but it should probably not set the SBUF_FINISHED flag.  That
> would also catch confused calls to sbuf_data() with an assert.
>
> What is the semantics of sbuf_len() when you have a drain function ?
>
> Number of undrained bytes ?

Yes.  If you attach an explicit drain function, the expected use is
that you also don't care about the len, because you are flushing to a
file/pipe/console/log and the only thing to be done after all the
printing is to call sbuf_finish() or sbuf_flush() as I named it.

>>> sbuf_unfinished_data() is just plain wrong and shall not be added, ever.
>>
>>This was required because of the assert_sbuf_state(SBUF_FINISHED)
>
> No, this was required because you chose the wrong prototype for the
> drain function.
>
>>> The internals of sbufs are strictly private and should stay private.
>>>
>>> Sbuf does not make any promises about how the string is stored
>>> internally until you sbuf_finish() the buffer.
>>
>>sbuf did make this promise.  Since sbuf_data and sbuf_len are public
>>accessor functions [...]
>
> Sbuf made no such promise, sbuf_data() and sbuf_len() are there
> precisly to isolate the internals.

Sorry, my wording was confused.  I was meaning to say that, at the
present, sbuf does not make any promises.

> Amongst the original ideas was to have a multipart buffer to avoid
> multiple realloc()/memcpy operations on buffer extends, and only
> collapse the buffer at the end when the final size was known.  This
> optimization has so far not been necessary.  Your API change would
> make it impossible.

Yes, that's a potential change that in fact can still be done.  As I
said, the only change is that the use of sbuf_data() then guarantees
that the data is contiguous, instead of sbuf_finish() guaranteeing
this.  For an sbuf with an explicit drain, the buffer won't be
growing, but will be drained instead.

Thanks,
matthew