bin/71367: regex multibyte support is really slow

Kuang-che Wu kcwu at csie.org
Sat Sep 4 02:40:10 PDT 2004


>Number:         71367
>Category:       bin
>Synopsis:       regex multibyte support is really slow
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 04 09:40:09 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Kuang-che Wu
>Release:        FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD kcwu.homeip.net 6.0-CURRENT FreeBSD 6.0-CURRENT #0: Sat Sep 4 05:33:38 CST 2004 root at kcwu.homeip.net:/usr/obj/usr/src/sys/DESKTOP i386

CPU: AMD Athlon(tm) XP 2000+ (1665.59-MHz 686-class CPU)

	
>Description:
	regex in a UTF-8 locale
	+ flags REG_EXTENDED|REG_ICASE
	+ pattern [[:alnum:]]
	= unacceptably slow
	
>How-To-Repeat:
	$ cc -O -pipe   re.c  -o re
	$ time ./re
	        7.65 real         7.51 user         0.06 sys

#include <stdio.h>
#include <locale.h>
#include <regex.h>

int main(void)
{
  regex_t re;
  char string[1024]={
#define WORD 0xe6,0x85,0xa2 /* UTF-8 character */
    WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
    WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
    0
  };

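  /* Select the multibyte locale; bail out if it is not installed. */
  if(setlocale(LC_CTYPE,"zh_TW.UTF-8")==NULL)
    return 1;

  /* Case-insensitive match of a character class against the UTF-8 string. */
  if(regcomp(&re,"[[:alnum:]]",REG_EXTENDED|REG_ICASE)!=0)
    return 2;
  if(regexec(&re,string,0,NULL,0)==0)
    printf("matched\n");
  regfree(&re);

  return 0;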
}
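
	To narrow down where the time goes, the test above could be split so
	that regcomp() and regexec() are timed separately and the same pattern
	is tried under more than one locale. The following is only a sketch:
	time_match() and its return codes are hypothetical names introduced
	here for illustration, not part of re.c or of libc, and clock() gives
	CPU time at coarse resolution only.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the original report): compile and run
 * one pattern under the given LC_CTYPE locale, timing each step with clock().
 * Returns 0 on a match, 1 if regexec() reports anything else,
 * -1 if the locale is unavailable, -2 if regcomp() fails.
 */
int
time_match(const char *locale, const char *pattern, const char *subject,
    int cflags, double *comp_s, double *exec_s)
{
	regex_t re;
	clock_t t0, t1, t2;
	int rc;

	assert(comp_s != NULL && exec_s != NULL);

	if (setlocale(LC_CTYPE, locale) == NULL)
		return -1;

	t0 = clock();
	if (regcomp(&re, pattern, cflags) != 0)
		return -2;
	t1 = clock();
	rc = regexec(&re, subject, 0, NULL, 0);
	t2 = clock();
	regfree(&re);

	/* CPU seconds spent compiling and executing, respectively. */
	*comp_s = (double)(t1 - t0) / CLOCKS_PER_SEC;
	*exec_s = (double)(t2 - t1) / CLOCKS_PER_SEC;
	return rc == 0 ? 0 : 1;
}
```

	Calling it once with "C" and once with "zh_TW.UTF-8", using the same
	pattern, flags and string as re.c, should indicate whether the cost is
	in compilation or matching and whether it is tied to the multibyte
	locale.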
	
>Fix:

	


>Release-Note:
>Audit-Trail:
>Unformatted:

