[Bug 223532] GNU egrep -i is terrible slow if utf-8 locale is enabled
Date: Wed, 02 Jun 2021 20:19:55 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

--- Comment #8 from Stefan Eßer <se@FreeBSD.org> ---
(In reply to Helge Oldach from comment #5)

My comment #4 referred to comment #3, which used BSD fgrep (despite the
title of the PR referring to GNU egrep).

I first compared fgrep with the C and UTF-8 locales and found that they had
about the same performance. Adding -i in the UTF-8 case increased the run
time from 0.03 seconds to 4.47 seconds (a factor of more than 100). With
LANG=C the run time is 3.36 seconds, BTW.

The patch that I have attached speeds this case up to 0.09 seconds by using
an internal function instead of the regex library.

fgrep-FBSD meant fgrep-ORIG (sorry for the confusion). This is the binary as
built in -CURRENT without the patch.

WITH_INTERNAL_NOSPEC is documented only by a comment in the sources (in
util.c), which explains that this option exists for systems that lack
REG_NOSPEC or REG_LITERAL and specifically mentions libgnuregex.

In fact, this function has a bit more overhead than necessary. An optimized
variant of the strcasestr_l() function could be inlined in util.c, but I did
not try to measure the performance difference. (The optimization would cache
the locale instead of calling __getlocale() and FIX_LOCALE for each
invocation of strcasestr().)
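
For illustration only, here is a minimal sketch of the general idea behind
the speed-up: matching a fixed string case-insensitively with strcasestr()
instead of compiling and running a regular expression. This is not the
attached patch. The helper names line_matches_icase() and
count_matches_icase() are hypothetical and do not come from the grep
sources.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes strcasestr() on glibc; harmless on FreeBSD */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from grep's util.c): return true if `line`
 * contains the literal `pattern`, ignoring case. No regex is compiled
 * or executed; strcasestr() does the whole search.
 */
static bool
line_matches_icase(const char *line, const char *pattern)
{
	return (strcasestr(line, pattern) != NULL);
}

/*
 * Hypothetical helper: count how many of the `nlines` strings in `lines`
 * contain `pattern`, ignoring case.
 */
static size_t
count_matches_icase(const char *const *lines, size_t nlines,
    const char *pattern)
{
	size_t i, hits = 0;

	for (i = 0; i < nlines; i++) {
		if (line_matches_icase(lines[i], pattern))
			hits++;
	}
	return (hits);
}
```

A caller would pass each input line and the fixed pattern to
line_matches_icase(). The locale-caching optimization mentioned above is not
shown here; this sketch uses plain strcasestr() and so still pays the
per-call locale lookup.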