Re: sed in CURRENT fails in textproc/jq

From: Alexander Leidinger <Alexander_at_Leidinger.net>
Date: Mon, 11 Sep 2023 06:26:16 UTC
Am 2023-09-10 18:53, schrieb Robert Clausecker:
> Hi Warner,
> 
> Thank you for your response.
> 
> Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh:
>> On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker <fuz@fuz.su> wrote:
>> 
>> > Hi Warner,
>> >
>> > I have pushed a fix.  It should hopefully address those failing tests.
>> > The same issue should also affect memcmp(), but unlike for memchr(), it is
>> > illegal to pass a length to memcmp() that extends past the actual end of
>> > the buffer as memcmp() is permitted to examine the whole buffer regardless
>> > of where the first mismatch is.
>> >
>> > I am considering a change to improve the behaviour of memcmp() on such
>> > errorneous inputs.  There are two options: (a) I could change memcmp() the
>> > same way I fixed memchr() and have implausible buffer lengths behave as if
>> > the buffer goes to the end of the address space or (b) I could change
>> > memcmp() to crash loudly if it detects such a case.  I could also
>> > (c) leave memcmp() as is.  Which of these three choices is preferable?
>> >
>> 
>> What does the standard say? I'm highly skeptical that these corner 
>> cases are
>> UB behavior.
>> 
>> I'd like actual support for this statement, rather than your 
>> conjecture
>> that it's
>> illegal. Even if you can come up with that, preserving the old 
>> behavior is
>> my
>> first choice. Especially since many of these functions aren't well 
>> defined
>> by
>> a standard, but are extensions.
>> 
>> As for memchr,
>> https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html
>> has no such permission to examine 'the entire buffer at once' nor any
>> restirction
>> as to the length extending beyond the address space. I'm skeptical of 
>> your
>> reading
>> that it allows one to examine all of [b, b + len), so please explain 
>> where
>> the standard
>> supports reading past the first occurance.
> 
> memchr() in particular is specified to only examine the input until the
> matching character is found (ISO/IEC 9899:2011 § 7.24.5.1):
> 
> ***
> The memchr function locates the first occurrence of c (converted to an
> unsigned char) in the initial n characters (each interpreted as 
> unsigned
> char) of the object pointed to by s. The implementation shall behave as
> if it reads the characters sequentially and stops as soon as a matching
> character is found.
> ***
> 
> Therefore, it appears reasonable that calls with fake buffer lengths
> (e.g. SIZE_MAX, to read until a mismatch occurs) must be supported.
> However, memcmp() has no such language and the text explicitly states
> that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1):
> 
> ***
> The memcmp function compares the first n characters of the object
> pointed to by s1 to the first n characters of the object pointed to by 
> s2.
> ***
> 
> By omission, this seems to give license to e.g. implement memcmp() like
> timingsafe_memcmp() where it inspects all n characters of both buffers
> and only then gives a result.  So if n is longer than the actual buffer
> (e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be
> a crash due to crossing into an unmapped page).
> 
> Thus I have patched memchr() to behave correctly when length SIZE_MAX 
> is
> given (commit b2618b65).  My memcmp() suffers from similarly flawed
> logic and may need to be patched.  However, as the language I cited 
> above
> does not indicate that such usage needs to be supported for memcmp()
> (whereas it must be for memchr(), contrary to my assumptions), I was
> asking you for how to proceed with memcmp (hence choices (a)--(c)).

My 2ct:
What did the previous implementation of memcmp() do in this case?
  - If it was generous and behaved similar to the requirements of
    memchr(), POLA requires to have the same now too.
  - If it was crashing or silently going on (= lurking bugs in 3rd
    party code), we may have the possibility to do a coredump in case
    of running past the end of the buffer to prevent malicous use.
  - In general I go with the robustness principle, "be liberal what you
    accept, but strict in what you provide" = memcmp() should behave
    as if it is supported.

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF