From nobody Wed May 19 14:59:46 2021 X-Original-To: bugs@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 5D2CB8452F7 for ; Wed, 19 May 2021 14:59:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Flbcy26Mrz3qm2 for ; Wed, 19 May 2021 14:59:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 30BF9149C0 for ; Wed, 19 May 2021 14:59:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 14JExk99080475 for ; Wed, 19 May 2021 14:59:46 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 14JExkUv080474 for bugs@FreeBSD.org; Wed, 19 May 2021 14:59:46 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 243229] awk length() function in base system produces an incorrect results for UTF-8 strings Date: Wed, 19 May 2021 14:59:46 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: 12.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: triaxx@NetBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated List-Id: Bug reports List-Archive: http://lists.freebsd.org/bugs List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-bugs@freebsd.org MIME-Version: 1.0 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D243229 Fr=C3=A9d=C3=A9ric Fauberteau changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |triaxx@NetBSD.org --- Comment #2 from Fr=C3=A9d=C3=A9ric Fauberteau --- I don't know if this issue is related to that bug report, but the following command prints 'bin': % echo "bin" | LANG=3Den_US awk '$1 ~ /^[\t -~]/ {print $0}' while this one prints nothing: echo "bin" | LANG=3Den_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}' The range from ' ' to '~' includes alphabetical characters when the locale = is not utf-8 but does not when the locale is utf-8. We can notice that '/^[\t -~]/' matches "bin" with C.UTF-8. --=20 You are receiving this mail because: You are the assignee for the bug.=