From nobody Mon Sep 11 06:26:16 2023 X-Original-To: current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RkcFk4mSZz4snrZ for ; Mon, 11 Sep 2023 06:27:22 +0000 (UTC) (envelope-from Alexander@Leidinger.net) Received: from mailgate.Leidinger.net (bastille.leidinger.net [89.238.82.207]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature ECDSA (P-256) client-digest SHA256) (Client CN "mailgate.leidinger.net", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RkcFk10Z0z3ff7 for ; Mon, 11 Sep 2023 06:27:22 +0000 (UTC) (envelope-from Alexander@Leidinger.net) Authentication-Results: mx1.freebsd.org; none List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=leidinger.net; s=outgoing-alex; t=1694413628; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=X7ARjl4FetcyT1uu76QnNv+CHpkSCdmNDGJo0fSy1gQ=; b=SlPj0Sis3swQVvXquJxHGSh2UoJIBMW3Y7KY+2Ylhcxu1HQALe7OSGGL1C6ewWGuxPbEYz kL1BDwZF08yp8YsXd+CMniM7qG5En25nXVolgcXU4mcr4B+EClJmaX640pfsF1vluggCHN VNdmVcR9f8YE6ysh/aJDY+p6d3VuQueS50P58AN1t3+jMJqFAjJehfAK4NZLK1XRVyV+88 BwSzWxQd7/lW2uEmVFuMxzjHDw+kfnUYGq/DDo62PQUlzgLVZZV8c1oJ7/SjlA+PrdygRQ uIzWqgKdWxy6lnEqZcabcG2AMwAsPBY9OPlqQsjm+iAxsBw1Ow7QT50HXLya5A== Date: Mon, 11 Sep 2023 08:26:16 +0200 From: Alexander Leidinger To: Robert Clausecker Cc: Warner Losh , current@freebsd.org Subject: Re: sed in CURRENT fails in textproc/jq In-Reply-To: References: <30a59ac0-28fd-f6ed-505c-9ef6d8e84fc3@tsoft.com> <86edj7qnia.fsf@ltc.des.no> Message-ID: <100f27b9bf542630377af4eae5e8c03c@Leidinger.net> X-Sender: Alexander@Leidinger.net Content-Type: multipart/signed; protocol="application/pgp-signature"; boundary="=_373938dfb81f1ff1d5af74c34b9d4759"; micalg=pgp-sha256 X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:34240, ipnet:89.238.64.0/18, country:DE] X-Rspamd-Queue-Id: 4RkcFk10Z0z3ff7 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --=_373938dfb81f1ff1d5af74c34b9d4759 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8; format=flowed Am 2023-09-10 18:53, schrieb Robert Clausecker: > Hi Warner, > > Thank you for your response. > > Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh: >> On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker wrote: >> >> > Hi Warner, >> > >> > I have pushed a fix. It should hopefully address those failing tests. >> > The same issue should also affect memcmp(), but unlike for memchr(), it is >> > illegal to pass a length to memcmp() that extends past the actual end of >> > the buffer as memcmp() is permitted to examine the whole buffer regardless >> > of where the first mismatch is. >> > >> > I am considering a change to improve the behaviour of memcmp() on such >> > errorneous inputs. There are two options: (a) I could change memcmp() the >> > same way I fixed memchr() and have implausible buffer lengths behave as if >> > the buffer goes to the end of the address space or (b) I could change >> > memcmp() to crash loudly if it detects such a case. I could also >> > (c) leave memcmp() as is. Which of these three choices is preferable? >> > >> >> What does the standard say? I'm highly skeptical that these corner >> cases are >> UB behavior. >> >> I'd like actual support for this statement, rather than your >> conjecture >> that it's >> illegal. Even if you can come up with that, preserving the old >> behavior is >> my >> first choice. Especially since many of these functions aren't well >> defined >> by >> a standard, but are extensions. >> >> As for memchr, >> https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html >> has no such permission to examine 'the entire buffer at once' nor any >> restirction >> as to the length extending beyond the address space. I'm skeptical of >> your >> reading >> that it allows one to examine all of [b, b + len), so please explain >> where >> the standard >> supports reading past the first occurance. > > memchr() in particular is specified to only examine the input until the > matching character is found (ISO/IEC 9899:2011 § 7.24.5.1): > > *** > The memchr function locates the first occurrence of c (converted to an > unsigned char) in the initial n characters (each interpreted as > unsigned > char) of the object pointed to by s. The implementation shall behave as > if it reads the characters sequentially and stops as soon as a matching > character is found. > *** > > Therefore, it appears reasonable that calls with fake buffer lengths > (e.g. SIZE_MAX, to read until a mismatch occurs) must be supported. > However, memcmp() has no such language and the text explicitly states > that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1): > > *** > The memcmp function compares the first n characters of the object > pointed to by s1 to the first n characters of the object pointed to by > s2. > *** > > By omission, this seems to give license to e.g. implement memcmp() like > timingsafe_memcmp() where it inspects all n characters of both buffers > and only then gives a result. So if n is longer than the actual buffer > (e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be > a crash due to crossing into an unmapped page). > > Thus I have patched memchr() to behave correctly when length SIZE_MAX > is > given (commit b2618b65). My memcmp() suffers from similarly flawed > logic and may need to be patched. However, as the language I cited > above > does not indicate that such usage needs to be supported for memcmp() > (whereas it must be for memchr(), contrary to my assumptions), I was > asking you for how to proceed with memcmp (hence choices (a)--(c)). My 2ct: What did the previous implementation of memcmp() do in this case? - If it was generous and behaved similar to the requirements of memchr(), POLA requires to have the same now too. - If it was crashing or silently going on (= lurking bugs in 3rd party code), we may have the possibility to do a coredump in case of running past the end of the buffer to prevent malicous use. - In general I go with the robustness principle, "be liberal what you accept, but strict in what you provide" = memcmp() should behave as if it is supported. Bye, Alexander. -- http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF --=_373938dfb81f1ff1d5af74c34b9d4759 Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc; size=833 Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEER9UlYXp1PSd08nWXEg2wmwP42IYFAmT+sxcACgkQEg2wmwP4 2Ib0rRAAgxhAKIefJK9TIYgnC0sIoCH9WJePhu5fV7NIP/dBPrqSrZGZcNpznDQN YSgF6iQ1JNhaKRlKUE5BKdigR2J4ZB/jNbMabloaa2EoMzi6+z6JuQNe7Q2Zxikj 2JWMVSktzWYIsQoLcr0TWIvXTkd0oJ9i28xDyrAFSgPXBwqMNl4miFq5/m5ZfF8q Co56E5y+xL66wXv2Htnltc22Gv959ajD+bVyTOgTMGbLIq8nC//LTyMQDV1LbjGR gMcRYeqMqcHUV+CB1FQBfGS/oOopxhJ9EbDdJvfnS7SSbFtkXxRntKuLFS+KOqi/ FRcmMFvI7FxSNebTa8l2jBADTpbV5q3B8CezqYggLmmCa8M9wo9iZJgeFSLsQf+F QrppvHLASMV55KQUVPuQJt02fqM1XKl8bHKcAznnxGo46tVSiYomHAzHAfDqKGaF FoQI1ok0FH8o7TOG5dcJW73CXL8mBrHoaBT/vjC+cCYslOcE4zD2ao9fpzYcfoL4 KEyORt9XJoRAA7OBtvUILHRGN0CgRrG9wDDb22i/Romvhg2mFK1iIT9kA1LyGeZR 3sfbqTVNdXqmeGf1uU6/YMUTefiPYmV7Uj1FFmOLwsI4LM/njNitslZB+ddkFPHZ zUdGl1sicdD+t6Cj7+Jr5/PE43coFpB3ca1kdnZVlayOtc4u+9A= =S4xN -----END PGP SIGNATURE----- --=_373938dfb81f1ff1d5af74c34b9d4759--