[Bug 223532] egrep -i is terrible slow if utf-8 locale is enabled

bugzilla-noreply at freebsd.org
Wed Nov 8 12:59:43 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

            Bug ID: 223532
           Summary: egrep -i is terrible slow if utf-8 locale is enabled
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: wosch at FreeBSD.org

egrep -i is terribly slow if the locale is set to UTF-8: a case-insensitive
search is 77 times slower than a case-sensitive one (8.47 s vs. 0.11 s real
time). In the C locale, both searches take about 0.10 s.
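The 77x figure is the ratio of the two UTF-8 real times in the measurements below. The following sketch shows that arithmetic using `ratio`, a hypothetical helper that is not part of the report and is built on awk(1):

```sh
#!/bin/sh
# ratio SLOW FAST: hypothetical helper that prints SLOW / FAST rounded to
# the nearest integer, i.e. how many times slower the first run was.
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", slow / fast }'
}

ratio 8.47 0.11   # UTF-8 locale, egrep -ic vs. egrep -c: prints 77
ratio 0.10 0.10   # C locale, egrep -ic vs. egrep -c: prints 1
```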


How to repeat:

First, we create a ~100 MB text file (400 concatenated copies of the tcsh(1)
manual page):
for i in $(seq 1 20); do man tcsh; done > /tmp/tcsh20
for i in $(seq 1 20); do cat /tmp/tcsh20; done > /tmp/tcsh400

$ du -hs /tmp/tcsh400
 99M    /tmp/tcsh400


# case-sensitive search, UTF-8 locale
LANG=en_CA.UTF-8 time egrep -c foobar /tmp/tcsh400
0
        0.11 real         0.06 user         0.04 sys


# case-insensitive search, UTF-8 locale: terribly slow
LANG=en_CA.UTF-8 time egrep -ic foobar /tmp/tcsh400
0
        8.47 real         8.42 user         0.04 sys


# case-sensitive search, C (ASCII) locale
LANG=C time egrep -c foobar /tmp/tcsh400
0
        0.10 real         0.06 user         0.03 sys


# case-insensitive search, C (ASCII) locale
LANG=C time egrep -ic foobar /tmp/tcsh400
0
        0.10 real         0.07 user         0.03 sys
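The following POSIX sh sketch reruns all four measurements in one go. It is not from the original report, and several details are assumptions:

- The function name `time_egrep_matrix` and the default path.
- A `/usr/bin/time` that accepts the POSIX `-p` output format.
- The use of `LC_ALL` instead of `LANG`. `LC_ALL` overrides `LANG` and any `LC_*` variables already set, so a stray `LC_ALL` or `LC_CTYPE` in the environment cannot silently change the locale under test.

```sh
#!/bin/sh
# Sketch: time egrep -c and egrep -ic under the C and en_CA.UTF-8 locales.
# Assumes the test file built above (default /tmp/tcsh400), an installed
# en_CA.UTF-8 locale, and a /usr/bin/time that accepts -p.

time_egrep_matrix() {
    file=${1:-/tmp/tcsh400}
    pattern=foobar
    for locale in C en_CA.UTF-8; do
        for flags in -c -ic; do
            # LC_ALL overrides LANG and LC_*, so egrep runs in $locale.
            # time -p reports "real N.NN" on stderr; egrep's count (stdout)
            # is discarded and awk extracts the real (wall-clock) seconds.
            secs=$( { LC_ALL=$locale /usr/bin/time -p \
                        egrep $flags "$pattern" "$file" >/dev/null; } 2>&1 |
                    awk '$1 == "real" { print $2 }')
            printf '%-12s egrep %-4s real %s\n' "$locale" "$flags" "${secs:-n/a}"
        done
    done
}

time_egrep_matrix "$@"
```

Invoked with no arguments, it times /tmp/tcsh400. A timing that cannot be measured is shown as `n/a`, for example when /usr/bin/time is not installed.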

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the freebsd-bugs mailing list