From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 223532] GNU egrep -i is terrible slow if utf-8 locale is enabled
Date: Wed, 02 Jun 2021 20:19:55 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

--- Comment #8 from Stefan Eßer ---
(In reply to Helge Oldach from comment #5)

My comment #4 referred to comment #3, which used BSD fgrep (despite the
title of the PR referring to GNU egrep).

I first compared fgrep with the C and UTF-8 locales and found that they had
about the same performance.

Adding -i in the UTF-8 case increased the run time from 0.03 seconds to 4.47
seconds (a factor of more than 100). With LANG=C the run time is 3.36
seconds, BTW.

The patch that I have attached speeds this case up to 0.09 seconds by using
an internal function instead of the regex library.

fgrep-FBSD meant fgrep-ORIG (sorry for the confusion). This is the binary as
built in -CURRENT without the patch.

WITH_INTERNAL_NOSPEC is not documented, except by a comment in the sources
(in util.c), which explains that this option exists for systems that lack
REG_NOSPEC or REG_LITERAL and specifically mentions libgnuregex.

In fact, this function has a bit more overhead than necessary. An optimized
variant of the strcasestr_l() function could be inlined in util.c, but I did
not try to measure the performance difference.
(The optimization would cache the locale instead of calling __getlocale()
and FIX_LOCALE for each invocation of strcasestr().)
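
The idea of hoisting per-call setup out of the search loop can be
illustrated with a small sketch. This is not the attached patch and not the
code in util.c: struct ci_matcher and both functions are hypothetical names
chosen for illustration. It also folds single bytes only, using tolower()
under the current locale, so it does not handle multibyte UTF-8 characters
the way a locale-aware strcasestr_l() would.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper type (not from grep's util.c): holds state that is
 * computed once per pattern instead of once per input line.
 */
struct ci_matcher {
	unsigned char fold[256];	/* cached single-byte case folding */
	const char *pat;		/* literal pattern */
	size_t patlen;			/* cached pattern length */
};

/* Build the folding table once, using the current locale's tolower(). */
static void
ci_matcher_init(struct ci_matcher *m, const char *pat)
{
	for (int c = 0; c < 256; c++)
		m->fold[c] = (unsigned char)tolower(c);
	m->pat = pat;
	m->patlen = strlen(pat);
}

/* Return a pointer to the first case-insensitive match in line, or NULL. */
static const char *
ci_matcher_find(const struct ci_matcher *m, const char *line)
{
	size_t linelen = strlen(line);

	for (size_t i = 0; i + m->patlen <= linelen; i++) {
		size_t j = 0;

		/* Compare folded bytes; no locale lookup happens here. */
		while (j < m->patlen &&
		    m->fold[(unsigned char)line[i + j]] ==
		    m->fold[(unsigned char)m->pat[j]])
			j++;
		if (j == m->patlen)
			return (line + i);
	}
	return (NULL);
}
```

A caller would run ci_matcher_init() once per pattern and ci_matcher_find()
once per input line, so the case-mapping lookup is paid a single time
rather than on every invocation.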