ports/71790: devel/icu2: add koi8-u converter to standard data library

Andriy Gapon avg at icyb.net.ua
Thu Sep 16 10:40:26 UTC 2004


>Number:         71790
>Category:       ports
>Synopsis:       devel/icu2: add koi8-u converter to standard data library
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 16 10:40:24 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Andriy Gapon
>Release:        FreeBSD 5.2.1-RELEASE-p9 i386
>Organization:
>Environment:
System: FreeBSD 5.2.1-RELEASE-p9 i386
icu2-2.8
>Description:
FreeBSD supports locales that use the KOI8-U charset, but the ICU standard
data library does not include a KOI8-U converter. As a result, some applications
that use ICU do not work properly out of the box in environments with the KOI8-U charset.
>How-To-Repeat:
For example:
1. Build gtk-gnutella with ICU support (I believe this is the default and that the package is built this way).
2. Set your CHARSET environment variable to KOI8-U.
3. Start the program and enter any search query (in ASCII).
4. Observe that no search results appear, no matter how long you wait.
(This happens because the conversion from KOI8-U to Unicode silently fails and gtk-gnutella
simply ignores the query.)
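
The failure mode described above can be illustrated with a small sketch. This is
not gtk-gnutella or ICU code; it is a Python analogy that uses Python's own codec
registry in place of ICU's converter lookup. The function names to_unicode and
submit_query, and the returned status strings, are hypothetical.

```python
import codecs


def to_unicode(raw: bytes, charset: str):
    """Return the decoded text, or None if no converter exists for charset."""
    try:
        codecs.lookup(charset)
    except LookupError:
        # The "silent failure" case: no exception reaches the caller.
        return None
    return raw.decode(charset)


def submit_query(raw: bytes, charset: str) -> str:
    """Hypothetical caller that ignores a query it could not convert."""
    text = to_unicode(raw, charset)
    if text is None:
        return "dropped"
    return "sent: " + text


# An ASCII query converts fine when the converter is known ...
print(submit_query(b"freebsd iso", "koi8_u"))
# ... and is dropped without any error when the converter is missing.
print(submit_query(b"freebsd iso", "no-such-charset"))
```

Even a pure ASCII query is lost in the second call, because the lookup of the
converter fails before any bytes are examined.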
>Fix:
The patch below adds a KOI8-U converter to the ICU standard data library.
The .ucm file is borrowed from Perl.
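
One way to sanity-check the borrowed mapping is to compare it against an
independent KOI8-U table. The sketch below is not part of the patch. It parses
a few CHARMAP lines (copied from the .ucm file, without the leading "+" of the
diff) and compares them with Python's built-in koi8_u codec, which is assumed
here to be a suitable reference. The helper names parse_ucm and mismatches are
hypothetical.

```python
import re

# A few lines copied from the koi8-u.ucm CHARMAP in the patch.
UCM_SAMPLE = r"""
<U0041> \x41 |0 # LATIN CAPITAL LETTER A
<U0454> \xA4 |0 # CYRILLIC SMALL LETTER UKRAINIAN IE
<U0456> \xA6 |0 # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
<U0457> \xA7 |0 # CYRILLIC SMALL LETTER YI
<U0491> \xAD |0 # CYRILLIC SMALL LETTER GHE WITH UPTURN
<U0490> \xBD |0 # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
<U0430> \xC1 |0 # CYRILLIC SMALL LETTER A
"""

# Matches "<UXXXX> \xNN |0"; only round-trip ("|0") entries are considered.
LINE_RE = re.compile(r"^<U([0-9A-F]{4})>\s+\\x([0-9A-F]{2})\s+\|0")


def parse_ucm(text):
    """Map each byte value to the Unicode code point listed for it."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            table[int(m.group(2), 16)] = int(m.group(1), 16)
    return table


def mismatches(table, codec="koi8_u"):
    """Return the byte values whose mapping disagrees with the reference codec."""
    return [b for b, cp in sorted(table.items())
            if bytes([b]).decode(codec) != chr(cp)]


table = parse_ucm(UCM_SAMPLE)
# An empty list means the sampled entries agree with the reference codec.
print(mismatches(table))
```

To check the whole table, the same functions could be fed the full contents of
koi8-u.ucm instead of the sample string.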

--- patch-koi8-u begins here ---
--- source/data/mappings/koi8-u.ucm.orig	Thu Sep 16 13:14:47 2004
+++ source/data/mappings/koi8-u.ucm	Wed Sep 15 18:49:14 2004
@@ -0,0 +1,272 @@
+#
+# $Id: koi8-u.ucm,v 2.0 2004/05/16 20:55:26 dankogai Exp $
+#
+# Written $Id: koi8-u.ucm,v 2.0 2004/05/16 20:55:26 dankogai Exp $
+# ./compile -n koi8-u -o Encode/koi8-u.ucm Encode/koi8-u.enc
+
+<code_set_name> "koi8-u"
+<char_name_mask>              "AXXXX"
+<mb_cur_min> 1
+<mb_cur_max> 1
+<uconv_class>                 "SBCS"
+<subchar>                     \x1A
+<icu:charsetFamily>           "ASCII"
+
+CHARMAP
+<U0000> \x00 |0 # <control>
+<U0001> \x01 |0 # <control>
+<U0002> \x02 |0 # <control>
+<U0003> \x03 |0 # <control>
+<U0004> \x04 |0 # <control>
+<U0005> \x05 |0 # <control>
+<U0006> \x06 |0 # <control>
+<U0007> \x07 |0 # <control>
+<U0008> \x08 |0 # <control>
+<U0009> \x09 |0 # <control>
+<U000A> \x0A |0 # <control>
+<U000B> \x0B |0 # <control>
+<U000C> \x0C |0 # <control>
+<U000D> \x0D |0 # <control>
+<U000E> \x0E |0 # <control>
+<U000F> \x0F |0 # <control>
+<U0010> \x10 |0 # <control>
+<U0011> \x11 |0 # <control>
+<U0012> \x12 |0 # <control>
+<U0013> \x13 |0 # <control>
+<U0014> \x14 |0 # <control>
+<U0015> \x15 |0 # <control>
+<U0016> \x16 |0 # <control>
+<U0017> \x17 |0 # <control>
+<U0018> \x18 |0 # <control>
+<U0019> \x19 |0 # <control>
+<U001A> \x1A |0 # <control>
+<U001B> \x1B |0 # <control>
+<U001C> \x1C |0 # <control>
+<U001D> \x1D |0 # <control>
+<U001E> \x1E |0 # <control>
+<U001F> \x1F |0 # <control>
+<U0020> \x20 |0 # SPACE
+<U0021> \x21 |0 # EXCLAMATION MARK
+<U0022> \x22 |0 # QUOTATION MARK
+<U0023> \x23 |0 # NUMBER SIGN
+<U0024> \x24 |0 # DOLLAR SIGN
+<U0025> \x25 |0 # PERCENT SIGN
+<U0026> \x26 |0 # AMPERSAND
+<U0027> \x27 |0 # APOSTROPHE
+<U0028> \x28 |0 # LEFT PARENTHESIS
+<U0029> \x29 |0 # RIGHT PARENTHESIS
+<U002A> \x2A |0 # ASTERISK
+<U002B> \x2B |0 # PLUS SIGN
+<U002C> \x2C |0 # COMMA
+<U002D> \x2D |0 # HYPHEN-MINUS
+<U002E> \x2E |0 # FULL STOP
+<U002F> \x2F |0 # SOLIDUS
+<U0030> \x30 |0 # DIGIT ZERO
+<U0031> \x31 |0 # DIGIT ONE
+<U0032> \x32 |0 # DIGIT TWO
+<U0033> \x33 |0 # DIGIT THREE
+<U0034> \x34 |0 # DIGIT FOUR
+<U0035> \x35 |0 # DIGIT FIVE
+<U0036> \x36 |0 # DIGIT SIX
+<U0037> \x37 |0 # DIGIT SEVEN
+<U0038> \x38 |0 # DIGIT EIGHT
+<U0039> \x39 |0 # DIGIT NINE
+<U003A> \x3A |0 # COLON
+<U003B> \x3B |0 # SEMICOLON
+<U003C> \x3C |0 # LESS-THAN SIGN
+<U003D> \x3D |0 # EQUALS SIGN
+<U003E> \x3E |0 # GREATER-THAN SIGN
+<U003F> \x3F |0 # QUESTION MARK
+<U0040> \x40 |0 # COMMERCIAL AT
+<U0041> \x41 |0 # LATIN CAPITAL LETTER A
+<U0042> \x42 |0 # LATIN CAPITAL LETTER B
+<U0043> \x43 |0 # LATIN CAPITAL LETTER C
+<U0044> \x44 |0 # LATIN CAPITAL LETTER D
+<U0045> \x45 |0 # LATIN CAPITAL LETTER E
+<U0046> \x46 |0 # LATIN CAPITAL LETTER F
+<U0047> \x47 |0 # LATIN CAPITAL LETTER G
+<U0048> \x48 |0 # LATIN CAPITAL LETTER H
+<U0049> \x49 |0 # LATIN CAPITAL LETTER I
+<U004A> \x4A |0 # LATIN CAPITAL LETTER J
+<U004B> \x4B |0 # LATIN CAPITAL LETTER K
+<U004C> \x4C |0 # LATIN CAPITAL LETTER L
+<U004D> \x4D |0 # LATIN CAPITAL LETTER M
+<U004E> \x4E |0 # LATIN CAPITAL LETTER N
+<U004F> \x4F |0 # LATIN CAPITAL LETTER O
+<U0050> \x50 |0 # LATIN CAPITAL LETTER P
+<U0051> \x51 |0 # LATIN CAPITAL LETTER Q
+<U0052> \x52 |0 # LATIN CAPITAL LETTER R
+<U0053> \x53 |0 # LATIN CAPITAL LETTER S
+<U0054> \x54 |0 # LATIN CAPITAL LETTER T
+<U0055> \x55 |0 # LATIN CAPITAL LETTER U
+<U0056> \x56 |0 # LATIN CAPITAL LETTER V
+<U0057> \x57 |0 # LATIN CAPITAL LETTER W
+<U0058> \x58 |0 # LATIN CAPITAL LETTER X
+<U0059> \x59 |0 # LATIN CAPITAL LETTER Y
+<U005A> \x5A |0 # LATIN CAPITAL LETTER Z
+<U005B> \x5B |0 # LEFT SQUARE BRACKET
+<U005C> \x5C |0 # REVERSE SOLIDUS
+<U005D> \x5D |0 # RIGHT SQUARE BRACKET
+<U005E> \x5E |0 # CIRCUMFLEX ACCENT
+<U005F> \x5F |0 # LOW LINE
+<U0060> \x60 |0 # GRAVE ACCENT
+<U0061> \x61 |0 # LATIN SMALL LETTER A
+<U0062> \x62 |0 # LATIN SMALL LETTER B
+<U0063> \x63 |0 # LATIN SMALL LETTER C
+<U0064> \x64 |0 # LATIN SMALL LETTER D
+<U0065> \x65 |0 # LATIN SMALL LETTER E
+<U0066> \x66 |0 # LATIN SMALL LETTER F
+<U0067> \x67 |0 # LATIN SMALL LETTER G
+<U0068> \x68 |0 # LATIN SMALL LETTER H
+<U0069> \x69 |0 # LATIN SMALL LETTER I
+<U006A> \x6A |0 # LATIN SMALL LETTER J
+<U006B> \x6B |0 # LATIN SMALL LETTER K
+<U006C> \x6C |0 # LATIN SMALL LETTER L
+<U006D> \x6D |0 # LATIN SMALL LETTER M
+<U006E> \x6E |0 # LATIN SMALL LETTER N
+<U006F> \x6F |0 # LATIN SMALL LETTER O
+<U0070> \x70 |0 # LATIN SMALL LETTER P
+<U0071> \x71 |0 # LATIN SMALL LETTER Q
+<U0072> \x72 |0 # LATIN SMALL LETTER R
+<U0073> \x73 |0 # LATIN SMALL LETTER S
+<U0074> \x74 |0 # LATIN SMALL LETTER T
+<U0075> \x75 |0 # LATIN SMALL LETTER U
+<U0076> \x76 |0 # LATIN SMALL LETTER V
+<U0077> \x77 |0 # LATIN SMALL LETTER W
+<U0078> \x78 |0 # LATIN SMALL LETTER X
+<U0079> \x79 |0 # LATIN SMALL LETTER Y
+<U007A> \x7A |0 # LATIN SMALL LETTER Z
+<U007B> \x7B |0 # LEFT CURLY BRACKET
+<U007C> \x7C |0 # VERTICAL LINE
+<U007D> \x7D |0 # RIGHT CURLY BRACKET
+<U007E> \x7E |0 # TILDE
+<U007F> \x7F |0 # <control>
+<U2500> \x80 |0 # BOX DRAWINGS LIGHT HORIZONTAL
+<U2502> \x81 |0 # BOX DRAWINGS LIGHT VERTICAL
+<U250C> \x82 |0 # BOX DRAWINGS LIGHT DOWN AND RIGHT
+<U2510> \x83 |0 # BOX DRAWINGS LIGHT DOWN AND LEFT
+<U2514> \x84 |0 # BOX DRAWINGS LIGHT UP AND RIGHT
+<U2518> \x85 |0 # BOX DRAWINGS LIGHT UP AND LEFT
+<U251C> \x86 |0 # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
+<U2524> \x87 |0 # BOX DRAWINGS LIGHT VERTICAL AND LEFT
+<U252C> \x88 |0 # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
+<U2534> \x89 |0 # BOX DRAWINGS LIGHT UP AND HORIZONTAL
+<U253C> \x8A |0 # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
+<U2580> \x8B |0 # UPPER HALF BLOCK
+<U2584> \x8C |0 # LOWER HALF BLOCK
+<U2588> \x8D |0 # FULL BLOCK
+<U258C> \x8E |0 # LEFT HALF BLOCK
+<U2590> \x8F |0 # RIGHT HALF BLOCK
+<U2591> \x90 |0 # LIGHT SHADE
+<U2592> \x91 |0 # MEDIUM SHADE
+<U2593> \x92 |0 # DARK SHADE
+<U2320> \x93 |0 # TOP HALF INTEGRAL
+<U25A0> \x94 |0 # BLACK SQUARE
+<U2022> \x95 |0 # BULLET
+<U221A> \x96 |0 # SQUARE ROOT
+<U2248> \x97 |0 # ALMOST EQUAL TO
+<U2264> \x98 |0 # LESS-THAN OR EQUAL TO
+<U2265> \x99 |0 # GREATER-THAN OR EQUAL TO
+<U00A0> \x9A |0 # NO-BREAK SPACE
+<U2321> \x9B |0 # BOTTOM HALF INTEGRAL
+<U00B0> \x9C |0 # DEGREE SIGN
+<U00B2> \x9D |0 # SUPERSCRIPT TWO
+<U00B7> \x9E |0 # MIDDLE DOT
+<U00F7> \x9F |0 # DIVISION SIGN
+<U2550> \xA0 |0 # BOX DRAWINGS DOUBLE HORIZONTAL
+<U2551> \xA1 |0 # BOX DRAWINGS DOUBLE VERTICAL
+<U2552> \xA2 |0 # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
+<U0451> \xA3 |0 # CYRILLIC SMALL LETTER IO
+<U0454> \xA4 |0 # CYRILLIC SMALL LETTER UKRAINIAN IE
+<U2554> \xA5 |0 # BOX DRAWINGS DOUBLE DOWN AND RIGHT
+<U0456> \xA6 |0 # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0457> \xA7 |0 # CYRILLIC SMALL LETTER YI
+<U2557> \xA8 |0 # BOX DRAWINGS DOUBLE DOWN AND LEFT
+<U2558> \xA9 |0 # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
+<U2559> \xAA |0 # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
+<U255A> \xAB |0 # BOX DRAWINGS DOUBLE UP AND RIGHT
+<U255B> \xAC |0 # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
+<U0491> \xAD |0 # CYRILLIC SMALL LETTER GHE WITH UPTURN
+<U255D> \xAE |0 # BOX DRAWINGS DOUBLE UP AND LEFT
+<U255E> \xAF |0 # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
+<U255F> \xB0 |0 # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
+<U2560> \xB1 |0 # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
+<U2561> \xB2 |0 # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
+<U0401> \xB3 |0 # CYRILLIC CAPITAL LETTER IO
+<U0404> \xB4 |0 # CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U2563> \xB5 |0 # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
+<U0406> \xB6 |0 # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0407> \xB7 |0 # CYRILLIC CAPITAL LETTER YI
+<U2566> \xB8 |0 # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
+<U2567> \xB9 |0 # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
+<U2568> \xBA |0 # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
+<U2569> \xBB |0 # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
+<U256A> \xBC |0 # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
+<U0490> \xBD |0 # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+<U256C> \xBE |0 # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
+<U00A9> \xBF |0 # COPYRIGHT SIGN
+<U044E> \xC0 |0 # CYRILLIC SMALL LETTER YU
+<U0430> \xC1 |0 # CYRILLIC SMALL LETTER A
+<U0431> \xC2 |0 # CYRILLIC SMALL LETTER BE
+<U0446> \xC3 |0 # CYRILLIC SMALL LETTER TSE
+<U0434> \xC4 |0 # CYRILLIC SMALL LETTER DE
+<U0435> \xC5 |0 # CYRILLIC SMALL LETTER IE
+<U0444> \xC6 |0 # CYRILLIC SMALL LETTER EF
+<U0433> \xC7 |0 # CYRILLIC SMALL LETTER GHE
+<U0445> \xC8 |0 # CYRILLIC SMALL LETTER HA
+<U0438> \xC9 |0 # CYRILLIC SMALL LETTER I
+<U0439> \xCA |0 # CYRILLIC SMALL LETTER SHORT I
+<U043A> \xCB |0 # CYRILLIC SMALL LETTER KA
+<U043B> \xCC |0 # CYRILLIC SMALL LETTER EL
+<U043C> \xCD |0 # CYRILLIC SMALL LETTER EM
+<U043D> \xCE |0 # CYRILLIC SMALL LETTER EN
+<U043E> \xCF |0 # CYRILLIC SMALL LETTER O
+<U043F> \xD0 |0 # CYRILLIC SMALL LETTER PE
+<U044F> \xD1 |0 # CYRILLIC SMALL LETTER YA
+<U0440> \xD2 |0 # CYRILLIC SMALL LETTER ER
+<U0441> \xD3 |0 # CYRILLIC SMALL LETTER ES
+<U0442> \xD4 |0 # CYRILLIC SMALL LETTER TE
+<U0443> \xD5 |0 # CYRILLIC SMALL LETTER U
+<U0436> \xD6 |0 # CYRILLIC SMALL LETTER ZHE
+<U0432> \xD7 |0 # CYRILLIC SMALL LETTER VE
+<U044C> \xD8 |0 # CYRILLIC SMALL LETTER SOFT SIGN
+<U044B> \xD9 |0 # CYRILLIC SMALL LETTER YERU
+<U0437> \xDA |0 # CYRILLIC SMALL LETTER ZE
+<U0448> \xDB |0 # CYRILLIC SMALL LETTER SHA
+<U044D> \xDC |0 # CYRILLIC SMALL LETTER E
+<U0449> \xDD |0 # CYRILLIC SMALL LETTER SHCHA
+<U0447> \xDE |0 # CYRILLIC SMALL LETTER CHE
+<U044A> \xDF |0 # CYRILLIC SMALL LETTER HARD SIGN
+<U042E> \xE0 |0 # CYRILLIC CAPITAL LETTER YU
+<U0410> \xE1 |0 # CYRILLIC CAPITAL LETTER A
+<U0411> \xE2 |0 # CYRILLIC CAPITAL LETTER BE
+<U0426> \xE3 |0 # CYRILLIC CAPITAL LETTER TSE
+<U0414> \xE4 |0 # CYRILLIC CAPITAL LETTER DE
+<U0415> \xE5 |0 # CYRILLIC CAPITAL LETTER IE
+<U0424> \xE6 |0 # CYRILLIC CAPITAL LETTER EF
+<U0413> \xE7 |0 # CYRILLIC CAPITAL LETTER GHE
+<U0425> \xE8 |0 # CYRILLIC CAPITAL LETTER HA
+<U0418> \xE9 |0 # CYRILLIC CAPITAL LETTER I
+<U0419> \xEA |0 # CYRILLIC CAPITAL LETTER SHORT I
+<U041A> \xEB |0 # CYRILLIC CAPITAL LETTER KA
+<U041B> \xEC |0 # CYRILLIC CAPITAL LETTER EL
+<U041C> \xED |0 # CYRILLIC CAPITAL LETTER EM
+<U041D> \xEE |0 # CYRILLIC CAPITAL LETTER EN
+<U041E> \xEF |0 # CYRILLIC CAPITAL LETTER O
+<U041F> \xF0 |0 # CYRILLIC CAPITAL LETTER PE
+<U042F> \xF1 |0 # CYRILLIC CAPITAL LETTER YA
+<U0420> \xF2 |0 # CYRILLIC CAPITAL LETTER ER
+<U0421> \xF3 |0 # CYRILLIC CAPITAL LETTER ES
+<U0422> \xF4 |0 # CYRILLIC CAPITAL LETTER TE
+<U0423> \xF5 |0 # CYRILLIC CAPITAL LETTER U
+<U0416> \xF6 |0 # CYRILLIC CAPITAL LETTER ZHE
+<U0412> \xF7 |0 # CYRILLIC CAPITAL LETTER VE
+<U042C> \xF8 |0 # CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042B> \xF9 |0 # CYRILLIC CAPITAL LETTER YERU
+<U0417> \xFA |0 # CYRILLIC CAPITAL LETTER ZE
+<U0428> \xFB |0 # CYRILLIC CAPITAL LETTER SHA
+<U042D> \xFC |0 # CYRILLIC CAPITAL LETTER E
+<U0429> \xFD |0 # CYRILLIC CAPITAL LETTER SHCHA
+<U0427> \xFE |0 # CYRILLIC CAPITAL LETTER CHE
+<U042A> \xFF |0 # CYRILLIC CAPITAL LETTER HARD SIGN
+END CHARMAP
--- source/data/mappings/ucmlocal.mk.orig	Thu Sep 16 13:15:05 2004
+++ source/data/mappings/ucmlocal.mk	Wed Sep 15 18:28:51 2004
@@ -0,0 +1 @@
+UCM_SOURCE_LOCAL = koi8-u.ucm
--- source/data/mappings/convrtrs.txt.orig	Wed Sep 15 18:10:20 2004
+++ source/data/mappings/convrtrs.txt	Thu Sep 16 12:23:59 2004
@@ -646,6 +646,7 @@
 ibm-868_P100-1995 { UTR22* }    ibm-868 { IBM* } IBM868 { IANA* JAVA } CP868 { IANA MIME* JAVA* } 868 { JAVA } csIBM868 { IANA } cp-ar { IANA }          # PC Urdu
 ibm-869_P100-1995 { UTR22* }    ibm-869 { IBM* } IBM869 { IANA* WINDOWS JAVA } cp869 { IANA MIME* JAVA* } 869 { IANA JAVA } cp-gr { IANA JAVA } csIBM869 { IANA JAVA } windows-869 { WINDOWS* } # PC Greek (w/o euro update)
 ibm-878_P100-1996 { UTR22* }    ibm-878 { IBM* } KOI8-R { IANA* MIME* JAVA* } koi8 { JAVA } csKOI8R { IANA JAVA } cp878   # Russian internet
+koi8-u { MIME* JAVA* }          KOI8-RU { MIME JAVA } # Ukrainian KOI RFC2319
 ibm-901_P100-1999 { UTR22* }    ibm-901 { IBM* } # PC Baltic (w/ euro update), update of ibm-921
 ibm-902_P100-1999 { UTR22* }    ibm-902 { IBM* } # PC Estonian (w/ euro update), update of ibm-922
 ibm-922_P100-1999 { UTR22* }    ibm-922 { IBM* JAVA }   cp922 { MIME* JAVA* } 922 { JAVA } # PC Estonian (w/o euro update)
@@ -865,4 +866,4 @@
 #ibm-955                 jis-208 jisx-208    # Pure DBCS jisx-208
 
 #ibm-1159_P100-1999 { UTR22* }   ibm-1159 { IBM* }   # SBCS T-Ch Host. Euro update of ibm-28709. This is used in combination with another CCSID mapping.
-#ibm-9027_P100-1999 { UTR22* }   ibm-9027 { IBM* }   # DBCS T-Ch Host. Euro update of ibm-835. DBCS portion of ibm-1371.
\ No newline at end of file
+#ibm-9027_P100-1999 { UTR22* }   ibm-9027 { IBM* }   # DBCS T-Ch Host. Euro update of ibm-835. DBCS portion of ibm-1371.
--- patch-koi8-u ends here ---


>Release-Note:
>Audit-Trail:
>Unformatted:


