Re: sed in CURRENT fails in textproc/jq
- In reply to: Robert Clausecker : "Re: sed in CURRENT fails in textproc/jq"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Mon, 11 Sep 2023 06:26:16 UTC
Am 2023-09-10 18:53, schrieb Robert Clausecker: > Hi Warner, > > Thank you for your response. > > Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh: >> On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker <fuz@fuz.su> wrote: >> >> > Hi Warner, >> > >> > I have pushed a fix. It should hopefully address those failing tests. >> > The same issue should also affect memcmp(), but unlike for memchr(), it is >> > illegal to pass a length to memcmp() that extends past the actual end of >> > the buffer as memcmp() is permitted to examine the whole buffer regardless >> > of where the first mismatch is. >> > >> > I am considering a change to improve the behaviour of memcmp() on such >> > errorneous inputs. There are two options: (a) I could change memcmp() the >> > same way I fixed memchr() and have implausible buffer lengths behave as if >> > the buffer goes to the end of the address space or (b) I could change >> > memcmp() to crash loudly if it detects such a case. I could also >> > (c) leave memcmp() as is. Which of these three choices is preferable? >> > >> >> What does the standard say? I'm highly skeptical that these corner >> cases are >> UB behavior. >> >> I'd like actual support for this statement, rather than your >> conjecture >> that it's >> illegal. Even if you can come up with that, preserving the old >> behavior is >> my >> first choice. Especially since many of these functions aren't well >> defined >> by >> a standard, but are extensions. >> >> As for memchr, >> https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html >> has no such permission to examine 'the entire buffer at once' nor any >> restirction >> as to the length extending beyond the address space. I'm skeptical of >> your >> reading >> that it allows one to examine all of [b, b + len), so please explain >> where >> the standard >> supports reading past the first occurance. > > memchr() in particular is specified to only examine the input until the > matching character is found (ISO/IEC 9899:2011 § 7.24.5.1): > > *** > The memchr function locates the first occurrence of c (converted to an > unsigned char) in the initial n characters (each interpreted as > unsigned > char) of the object pointed to by s. The implementation shall behave as > if it reads the characters sequentially and stops as soon as a matching > character is found. > *** > > Therefore, it appears reasonable that calls with fake buffer lengths > (e.g. SIZE_MAX, to read until a mismatch occurs) must be supported. > However, memcmp() has no such language and the text explicitly states > that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1): > > *** > The memcmp function compares the first n characters of the object > pointed to by s1 to the first n characters of the object pointed to by > s2. > *** > > By omission, this seems to give license to e.g. implement memcmp() like > timingsafe_memcmp() where it inspects all n characters of both buffers > and only then gives a result. So if n is longer than the actual buffer > (e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be > a crash due to crossing into an unmapped page). > > Thus I have patched memchr() to behave correctly when length SIZE_MAX > is > given (commit b2618b65). My memcmp() suffers from similarly flawed > logic and may need to be patched. However, as the language I cited > above > does not indicate that such usage needs to be supported for memcmp() > (whereas it must be for memchr(), contrary to my assumptions), I was > asking you for how to proceed with memcmp (hence choices (a)--(c)). My 2ct: What did the previous implementation of memcmp() do in this case? - If it was generous and behaved similar to the requirements of memchr(), POLA requires to have the same now too. - If it was crashing or silently going on (= lurking bugs in 3rd party code), we may have the possibility to do a coredump in case of running past the end of the buffer to prevent malicous use. - In general I go with the robustness principle, "be liberal what you accept, but strict in what you provide" = memcmp() should behave as if it is supported. Bye, Alexander. -- http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF