bin/71367: regex multibyte support is really slow
Kuang-che Wu
kcwu at csie.org
Sat Sep 4 02:40:10 PDT 2004
>Number: 71367
>Category: bin
>Synopsis: regex multibyte support is really slow
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Sep 04 09:40:09 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator: Kuang-che Wu
>Release: FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD kcwu.homeip.net 6.0-CURRENT FreeBSD 6.0-CURRENT #0: Sat Sep 4 05:33:38 CST 2004 root at kcwu.homeip.net:/usr/obj/usr/src/sys/DESKTOP i386
CPU: AMD Athlon(tm) XP 2000+ (1665.59-MHz 686-class CPU)
>Description:
In a UTF-8 locale, compiling and matching the pattern [[:alnum:]]
with the flags REG_EXTENDED|REG_ICASE is unacceptably slow.
>How-To-Repeat:
$ cc -O -pipe re.c -o re
$ time ./re
7.65 real 7.51 user 0.06 sys
#include <stdio.h>
#include <locale.h>
#include <regex.h>

int main(void)
{
    regex_t re;
    char string[1024]={
#define WORD 0xe6,0x85,0xa2 /* UTF-8 character */
        WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
        WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
        0
    };

    if(setlocale(LC_CTYPE,"zh_TW.UTF-8")==NULL)
        return 1;
    if(regcomp(&re,"[[:alnum:]]",REG_EXTENDED|REG_ICASE)!=0)
        return 2;
    if(regexec(&re,string,0,NULL,0)==0)
        printf("matched\n");
    return 0;
}
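
To narrow down which combination of locale and flags triggers the slowdown, the test case can be turned into a small timing helper. The following is only a sketch and not part of the original report: the name time_match, the use of clock(), and the choice to time regcomp() and regexec() together are all assumptions made for illustration. It builds the same 20-character UTF-8 subject string as the program above.

```c
#include <locale.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the report): time one regcomp() plus
 * regexec() of [[:alnum:]] against the report's subject string under
 * the given locale and compile flags.
 *
 * Returns elapsed CPU seconds, or -1.0 if the locale is unavailable
 * or the pattern fails to compile.  Note that setlocale() changes the
 * process-wide LC_CTYPE setting.
 */
static double
time_match(const char *locale, int cflags)
{
    static const char word[] = "\xe6\x85\xa2"; /* same bytes as WORD */
    char subject[1024] = "";
    regex_t re;
    clock_t start, end;
    int i;

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1.0;

    /* 20 copies of the 3-byte character, as in the test case. */
    for (i = 0; i < 20; i++)
        strcat(subject, word);

    start = clock();
    if (regcomp(&re, "[[:alnum:]]", cflags) != 0)
        return -1.0;
    (void)regexec(&re, subject, 0, NULL, 0);
    end = clock();
    regfree(&re);

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_match("C", REG_EXTENDED|REG_ICASE), time_match("zh_TW.UTF-8", REG_EXTENDED) and time_match("zh_TW.UTF-8", REG_EXTENDED|REG_ICASE) should show whether the cost comes from the UTF-8 locale alone or only from its combination with REG_ICASE. The helper does not separate compile time from match time; timing the two calls individually would be the next step.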
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: