Re: sed in CURRENT fails in textproc/jq

From: Robert Clausecker <fuz_at_fuz.su>
Date: Sun, 10 Sep 2023 16:53:46 UTC

Hi Warner,

Thank you for your response.

On Sun, Sep 10, 2023 at 09:53:03AM -0600, Warner Losh wrote:
> On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker <fuz@fuz.su> wrote:
> 
> > Hi Warner,
> >
> > I have pushed a fix.  It should hopefully address those failing tests.
> > The same issue should also affect memcmp(), but unlike for memchr(), it is
> > illegal to pass a length to memcmp() that extends past the actual end of
> > the buffer as memcmp() is permitted to examine the whole buffer regardless
> > of where the first mismatch is.
> >
> > I am considering a change to improve the behaviour of memcmp() on such
> > erroneous inputs.  There are two options: (a) I could change memcmp() the
> > same way I fixed memchr() and have implausible buffer lengths behave as if
> > the buffer goes to the end of the address space or (b) I could change
> > memcmp() to crash loudly if it detects such a case.  I could also
> > (c) leave memcmp() as is.  Which of these three choices is preferable?
> >
> 
> What does the standard say? I'm highly skeptical that these corner cases are
> undefined behavior.
>
> I'd like actual support for this statement, rather than your conjecture
> that it's illegal. Even if you can come up with that, preserving the old
> behavior is my first choice. Especially since many of these functions aren't
> well defined by a standard, but are extensions.
>
> As for memchr,
> https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html
> has no such permission to examine 'the entire buffer at once' nor any
> restriction as to the length extending beyond the address space. I'm
> skeptical of your reading that it allows one to examine all of
> [b, b + len), so please explain where the standard supports reading past
> the first occurrence.

memchr() in particular is specified to only examine the input until the
matching character is found (ISO/IEC 9899:2011 § 7.24.5.1):

***
The memchr function locates the first occurrence of c (converted to an
unsigned char) in the initial n characters (each interpreted as unsigned
char) of the object pointed to by s. The implementation shall behave as
if it reads the characters sequentially and stops as soon as a matching
character is found.
***

Therefore, it appears reasonable that calls with fake buffer lengths
(e.g. SIZE_MAX, to read until a match occurs) must be supported.
However, memcmp() has no such language and the text explicitly states
that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1):

***
The memcmp function compares the first n characters of the object
pointed to by s1 to the first n characters of the object pointed to by s2.
***

By omission, this seems to give license to e.g. implement memcmp() like
timingsafe_memcmp() where it inspects all n characters of both buffers
and only then gives a result.  So if n is longer than the actual buffer
(e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be
a crash due to crossing into an unmapped page).
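
To illustrate the difference between the two specifications, here is a
minimal C sketch.  It is not the libc code; memchr_seq() and memcmp_full()
are hypothetical names made up for this example.  The first function
stops at the first match, so a length of SIZE_MAX is harmless as long as
a match exists.  The second reads all n bytes of both buffers before it
returns, which the memcmp() wording appears to permit, so an oversized n
would make it run off the end of the buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference memchr(): reads bytes in order and returns at
 * the first match, so bytes after the match are never touched.
 */
static void *
memchr_seq(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	size_t i;

	assert(s != NULL);
	for (i = 0; i < n; i++)
		if (p[i] == (unsigned char)c)
			return ((void *)(p + i));
	return (NULL);
}

/*
 * Hypothetical memcmp() that inspects every one of the n bytes of both
 * buffers and only then reports the first difference it saw.
 */
static int
memcmp_full(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	int result = 0;
	size_t i;

	assert(s1 != NULL && s2 != NULL);
	for (i = 0; i < n; i++) {
		int diff = a[i] - b[i];	/* both bytes are always read */

		if (result == 0)
			result = diff;
	}
	return (result);
}
```

With these definitions, memchr_seq(buf, ':', SIZE_MAX) is fine for a
buffer that contains a colon, while memcmp_full() is only safe when n
does not exceed the size of either buffer.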

Thus I have patched memchr() to behave correctly when length SIZE_MAX is
given (commit b2618b65).  My memcmp() suffers from similarly flawed
logic and may need to be patched.  However, as the language I cited above
does not indicate that such usage needs to be supported for memcmp()
(whereas it must be for memchr(), contrary to my assumptions), I was
asking you how to proceed with memcmp() (hence choices (a)--(c)).
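
The overflow behind this can be sketched in C.  This is only an
illustration under assumptions, not the actual change, which lives in the
amd64 assembly.  end_naive() and end_saturating() are hypothetical names,
and the sketch assumes size_t is no wider than uintptr_t.  The naive
version computes buf + len and wraps around for a huge len; the
saturating version treats such a length as if the buffer extended to the
end of the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive end-of-buffer address: wraps around when len is huge. */
static uintptr_t
end_naive(uintptr_t start, size_t len)
{
	/* Unsigned arithmetic wraps modulo 2^N instead of trapping. */
	return (start + (uintptr_t)len);
}

/*
 * Saturating end-of-buffer address: if start + len wraps, clamp the
 * result to the highest address instead.
 */
static uintptr_t
end_saturating(uintptr_t start, size_t len)
{
	uintptr_t end;

	/* Assumption of this sketch: len fits in a uintptr_t. */
	assert(sizeof(size_t) <= sizeof(uintptr_t));
	end = start + (uintptr_t)len;
	return (end < start ? UINTPTR_MAX : end);
}
```

A scan loop bounded by end_naive(buf, SIZE_MAX) would see an end address
below buf and stop before examining anything, whereas one bounded by
end_saturating() keeps going until it finds a match.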

> Also, all the tests that started failing with your commit, available here,
> 
> https://ci.freebsd.org/job/FreeBSD-main-amd64-test/24136/
> 
> should be fixed at a bare minimum. Yes, there's a lot of other tests
> that are failing: those should be fixed or disabled also. Have these
> tests been fixed? They are showing up as failed still in the latest run
> 
> https://ci.freebsd.org/job/FreeBSD-main-amd64-test/24141/
>
> shows them still failing by my reading.

This run crashed kyua due to an issue that seems to be an instance of
bug #273481 and no final report was generated.  However, it seems like the
unit tests in question do not fail anymore.  Let's wait for build 24142.

> 
> Also, we're several weeks in, and we're still fixing basic, fundamental
> functions that show breakage. This indicates to me that proper care
> hasn't been taken to replace the critical functions in the tree. It also
> indicates to me that a stronger level of rigor is needed given that the
> problems are trickling in every few days to weeks. This suggests that
> the unusual step of backing all this work out of stable/14 and releng/14.0
> would be indicated until we can go through this process to the point
> that the new bugs have stopped (which given the decay rate seen so
> far suggests we won't be done before 14.0 is released). I know this is
> an enormous pain, but I think the track record so far supports backing
> these out of the release we're about to do. They just arrived too late
> for the normal 'cooking' process of FreeBSD to sufficiently cook them
> in time for 14.0.

I understand your concerns and will prepare commits to back out the SIMD
work from releng/14.0 and stable/14.  I apologise for my insufficient
testing.  I hope I can improve my testing for upcoming changes.

> Warner
> 
> > Yours,
> > Robert Clausecker
> >
> > On Sun, Sep 10, 2023 at 05:51:43AM -0600, Warner Losh wrote:
> > > On Sat, Sep 9, 2023, 10:51 PM Robert Clausecker <fuz@fuz.su> wrote:
> > >
> > > > Greetings,
> > > >
> > > > I apologise for the inconvenience.  The issue seems to boil down to
> > > > various places calling
> > > >
> > > >     memchr(buf, c, SIZE_MAX);
> > > >
> > > > which causes an overflow when my newly written memchr() computes buf +
> > > > len to find the end of the buffer.  A patch to alleviate this issue can
> > > > be found here:
> > > >
> > > > http://fuz.su/~fuz/freebsd/0001-lib-libc-amd64-string-memchr.S-fix-behaviour-with-ov.patch
> > > >
> > > > Please check if it does the trick for you.  If yes, I'll go ahead and
> > > > push it tomorrow-ish.
> > > >
> > >
> > >
> > > There are half a dozen or so kyua tests that are likely failing because of
> > > this or other reasons related to strings.  When you push this fix you'll
> > > get the list. They are hidden among about 80 or so networking tests that
> > > fail. I plan on disabling those tests soon if no one fixes them.
> > >
> > > Warner
> > >
> > > >
> > > > Yours,
> > > > Robert Clausecker
> > > >
> > > > On Sat, Sep 09, 2023 at 07:12:29PM +0200, Dag-Erling Smørgrav wrote:
> > > > > Antoine Brodin <antoine@freebsd.org> writes:
> > > > > > Yuri <yuri@freebsd.org> writes:
> > > > > > > > Either something has changed in sed(1) in CURRENT, or sed just fails
> > > > > > > > during the configure stage of textproc/jq:
> > > > > > >
> > > > > > > sed: No error: 0
> > > > > > > checking for sys/cygwin.h... eval: ${+...}: Bad substitution
> > > > > > This seems to be a recent issue (less than 5 days).
> > > > > > Hundreds of configure scripts now fail to run on 15-current due to
> > > > > > this sed failure: [...]
> > > > >
> > > > > > Try adding ARCHLEVEL=scalar to CONFIGURE_ENV on one of these.  If that
> > > > > > helps, yell at fuz@ :)
> > > > >
> > > > > DES
> > > > > --
> > > > > Dag-Erling Smørgrav - des@FreeBSD.org
> > > > >
> > > >
> > > > --
> > > > ()  ascii ribbon campaign - for an 8-bit clean world
> > > > /\  - against html email  - against proprietary attachments
> > > >
> > > >
> >
> > --
> > ()  ascii ribbon campaign - for an 8-bit clean world
> > /\  - against html email  - against proprietary attachments
> >

-- 
()  ascii ribbon campaign - for an 8-bit clean world 
/\  - against html email  - against proprietary attachments