misc/100212: UTF-8 zero-width character patch
J.R. Oldroyd
fbsd at opal.com
Thu Jul 13 15:00:35 UTC 2006
>Number: 100212
>Category: misc
>Synopsis: UTF-8 zero-width character patch
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Thu Jul 13 15:00:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator: J.R. Oldroyd
>Release: FreeBSD 6.1-STABLE i386
>Organization:
>Environment:
System: FreeBSD linwhf.opal.com 6.1-STABLE FreeBSD 6.1-STABLE #1: Thu May 18 16:03:24 EDT 2006 xxx at linwhf.opal.com:/usr/obj/usr/src/sys/LINWHF i386
>Description:
This patch makes the so-called zero-width, non-spacing, or
overstriking characters in the UTF-8 locale definition exactly
that. At present these characters are assigned a width of 1,
which is wrong; they should have a width of 0.
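As a rough illustration of which code points are affected, the
SWIDTH0 ranges from the patch in the Fix section can be written as
a lookup table. This is only a sketch in Python for checking
individual code points by hand; the names SWIDTH0_RANGES and
is_zero_width are made up for this example and are not part of
mklocale or libc.

```python
# SWIDTH0 ranges copied from the patch (inclusive bounds).
# The table and function names are illustrative only.
SWIDTH0_RANGES = [
    (0x0300, 0x036F),    # Combining Diacritical Marks
    (0x0483, 0x0486),    # marks in the Cyrillic block
    (0x0488, 0x0489),
    (0x0E31, 0x0E31),    # marks in the Thai block
    (0x0E34, 0x0E3A),
    (0x0E47, 0x0E4E),
    (0x20D0, 0x20FF),    # Combining Diacritical Marks for Symbols
    (0x3099, 0x309A),    # marks in the Hiragana block
    (0xFE20, 0xFE2F),    # Combining Half Marks
    (0x1D165, 0x1D169),  # marks in the Musical Symbols block
    (0x1D17B, 0x1D182),
    (0x1D185, 0x1D18B),
    (0x1D1AA, 0x1D1AD),
]


def is_zero_width(codepoint):
    """Return True if the patch assigns SWIDTH0 to this code point."""
    return any(lo <= codepoint <= hi for lo, hi in SWIDTH0_RANGES)
```

For example, is_zero_width(0x0301) (combining acute accent) is
true, while is_zero_width(0x0E32) is false because the patch keeps
that Thai character in SWIDTH1.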
>How-To-Repeat:
Save this file:
http://opal.com/freebsd/unicode/utf8demo.txt
On an xterm, cat the file and examine the "Combining characters"
and the "Thai (UCS Level 2)" sections. Without the patch, the
non-spacing characters do not overstrike the previous character.
With the patch, they do.
This patch was posted to -current and has been downloaded and
reviewed many times since that posting:
http://lists.freebsd.org/pipermail/freebsd-current/2006-June/064218.html
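For a quicker check than the full demo file, a short script can
build a couple of test strings and report how many columns each
should occupy. The sketch below is an approximation in Python and
does not reproduce what libc or xterm do: it assumes that
characters in the Unicode categories Mn and Me take no column and
that everything else takes exactly one, so double-width characters
are not handled. The function names are hypothetical.

```python
import unicodedata


def approx_columns(text):
    """Count columns, treating Mn/Me characters as zero-width.

    Assumption: every other character occupies one column.
    """
    return sum(1 for ch in text
               if unicodedata.category(ch) not in ("Mn", "Me"))


def demo_lines():
    """Return one line per sample: a base character plus a combining mark."""
    samples = [
        ("e + U+0301", "e\u0301"),
        ("U+0E01 + U+0E31", "\u0e01\u0e31"),
    ]
    lines = []
    for label, text in samples:
        # len(text) is what a table with width 1 for every character gives.
        lines.append(
            f"{label}: {text}| width-1 table={len(text)}, "
            f"zero-width marks={approx_columns(text)}"
        )
    return lines
```

Printing the strings returned by demo_lines() in an xterm shows
each mark drawn over its base character; the trailing "|" makes
it easy to see whether the cursor advanced by one column or two.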
>Fix:
--- /usr/src/share/mklocale/UTF-8.src.orig Sat Mar 27 03:14:14 2004
+++ /usr/src/share/mklocale/UTF-8.src Mon Jun 26 23:15:34 2006
@@ -487,9 +487,9 @@
* U+0300 - U+036F : Combining Diacritical Marks
*/
-GRAPH 0x0300 - 0x034f 0x0360 - 0x036f
-PRINT 0x0300 - 0x034f 0x0360 - 0x036f
-SWIDTH1 0x0300 - 0x034f 0x0360 - 0x036f
+GRAPH 0x0300 - 0x036f
+PRINT 0x0300 - 0x036f
+SWIDTH0 0x0300 - 0x036f
MAPUPPER < 0x0345 0x0399 >
@@ -593,7 +593,8 @@
UPPER 0x04e2 0x04e4 0x04e6 0x04e8 0x04ea 0x04ec 0x04ee
UPPER 0x04f0 0x04f2 0x04f4 0x04f8
PRINT 0x0400 - 0x0486 0x0488 - 0x04ce 0x04d0 - 0x04f5 0x04f8 0x04f9
-SWIDTH1 0x0400 - 0x0486 0x0488 - 0x04ce 0x04d0 - 0x04f5 0x04f8 0x04f9
+SWIDTH1 0x0400 - 0x0482 0x048a - 0x04ce 0x04d0 - 0x04f5 0x04f8 0x04f9
+SWIDTH0 0x0483 - 0x0486 0x0488 - 0x0489
MAPUPPER < 0x0430 - 0x044f : 0x0410 >
MAPUPPER < 0x0450 - 0x045f : 0x0400 >
@@ -1016,7 +1017,8 @@
GRAPH 0x0e01 - 0x0e3a 0x0e3f - 0x0e5b
PUNCT 0x0e3f 0x0e4f 0x0e5a 0x0e5b
PRINT 0x0e01 - 0x0e3a 0x0e3f - 0x0e5b
-SWIDTH1 0x0e01 - 0x0e3a 0x0e3f - 0x0e5b
+SWIDTH0 0x0e31 0x0e34 - 0x0e3a 0x0e47 - 0x0e4e
+SWIDTH1 0x0e01 - 0x0e30 0x0e32 - 0x0e33 0x0e3f - 0x0e46 0x0e4f - 0x0e5b
/*
@@ -1647,9 +1649,9 @@
* U+20D0 - U+20FF : Combining Diacritical Marks for Symbols
*/
-GRAPH 0x20d0 - 0x20ea
-PRINT 0x20d0 - 0x20ea
-SWIDTH1 0x20d0 - 0x20ea
+GRAPH 0x20d0 - 0x20ff
+PRINT 0x20d0 - 0x20ff
+SWIDTH0 0x20d0 - 0x20ff
/*
@@ -1927,7 +1929,8 @@
PUNCT 0x309b 0x309c
PRINT 0x3041 - 0x3096 0x3099 - 0x309f
PHONOGRAM 0x3041 - 0x3096 0x309f
-SWIDTH2 0x3041 - 0x3096 0x3099 - 0x309f
+SWIDTH2 0x3041 - 0x3096 0x309b - 0x309f
+SWIDTH0 0x3099 - 0x309a
/*
@@ -2149,9 +2152,9 @@
* U+FE20 - U+FE2F : Combining Half Marks
*/
-GRAPH 0xfe20 - 0xfe23
-PRINT 0xfe20 - 0xfe23
-SWIDTH1 0xfe20 - 0xfe23
+GRAPH 0xfe20 - 0xfe2f
+PRINT 0xfe20 - 0xfe2f
+SWIDTH0 0xfe20 - 0xfe2f
/*
@@ -2272,7 +2275,8 @@
PUNCT 0x1d100 - 0x1d126 0x1d12a - 0x1d164 0x1d16a - 0x1d16c
PUNCT 0x1d183 0x1d184 0x1d18c - 0x1d1a9 0x1d1ae - 0x1d1dd
PRINT 0x1d100 - 0x1d126 0x1d12a - 0x1d172 0x1d17b - 0x1d1dd
-SWIDTH1 0x1d100 - 0x1d126 0x1d12a - 0x1d172 0x1d17b - 0x1d1dd
+SWIDTH1 0x1d100 - 0x1d126 0x1d12a - 0x1d164 0x1d16a - 0x1d172 0x1d183 0x1d184 0x1d18c - 0x1d1a9 0x1d1ae - 0x1d1dd
+SWIDTH0 0x1d165 - 0x1d169 0x1d17b - 0x1d182 0x1d185 - 0x1d18b 0x1d1aa - 0x1d1ad
/*
>Release-Note:
>Audit-Trail:
>Unformatted: