[Bug 275444] isprint() library function returns wrong when LC_CTYPE is ja_JP.SJIS (tcsh aborts by this)
Date: Thu, 30 Nov 2023 04:56:15 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275444
Bug ID: 275444
Summary: isprint() library function returns wrong when LC_CTYPE is ja_JP.SJIS (tcsh aborts by this)
Product: Base System
Version: 14.0-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: misc
Assignee: bugs@FreeBSD.org
Reporter: uratan@miomio.jp
Attachment #246681 mime type: text/plain
Created attachment 246681
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246681&action=edit
a test C code to confirm the problem
(I just found that the same problem was already reported in bug #264299, but I am reporting it anyway.)
When LC_CTYPE is ja_JP.SJIS, the isprint() library function returns
0 (not printable) for both '\' (0x5c) and '~' (0x7e).
Probably because of this, tcsh/csh aborts with a core dump
when running a simple "printenv" command.
See the test results below
(the test C code is attached to this report).
+-----------------------------------------------------
|% setenv LC_CTYPE ja_JP.eucJP
|% ./z-test-isprint
|isalnum('3') is 1
|isalnum('B') is 1
|isalnum('\') is 0
|isalnum('~') is 0
| isascii('3') is 1
| isascii('B') is 1
| isascii('\') is 1
| isascii('~') is 1
|isprint('3') is 1
|isprint('B') is 1
|isprint('\') is 1 <===
|isprint('~') is 1 <===
|% csh
|% printenv
-- omitted --
|EDITOR=vim
|LC_CTYPE=ja_JP.eucJP
|EXINIT=source ~/.exrc
|PAGER=jless
|% exit
|exit
|
|% setenv LC_CTYPE ja_JP.SJIS
|% ./z-test-isprint
|isalnum('3') is 1
|isalnum('B') is 1
|isalnum('\') is 0
|isalnum('~') is 0
| isascii('3') is 1
| isascii('B') is 1
| isascii('\') is 1
| isascii('~') is 1
|isprint('3') is 1
|isprint('B') is 1
|isprint('\') is 0 <===
|isprint('~') is 0 <===
|% csh
|% printenv
-- omitted --
|EDITOR=vim
|LC_CTYPE=ja_JP.SJIS
|Segmentation fault (core dumped) <===
|% ls
|csh.core z-test-isprint*
|typescript z-test-isprint.c
+-----------------------------------------------------
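The attached z-test-isprint.c is not reproduced in this message. For readers without the attachment, here is a minimal sketch of an equivalent check. It assumes the program calls setlocale(LC_CTYPE, "") so that the LC_CTYPE environment variable takes effect. The helpers show_classes() and count_unprintable_from_env() are hypothetical names and are not taken from the attachment.

```c
#define _XOPEN_SOURCE 700   /* expose isascii(), an XSI extension */
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: print the classification of one character in a
 * format similar to the transcript above, and return isprint()'s result
 * normalised to 0 or 1. */
int show_classes(int c)
{
    int printable = isprint(c) != 0;

    printf("isalnum('%c') is %d\n", c, isalnum(c) != 0);
    printf(" isascii('%c') is %d\n", c, isascii(c) != 0);
    printf("isprint('%c') is %d\n", c, printable);
    return printable;
}

/* Hypothetical helper: apply the locale named by the environment
 * (LC_ALL / LC_CTYPE / LANG) and count how many of the characters tested
 * in the report are classified as non-printable.  Returns -1 if the
 * locale cannot be set. */
int count_unprintable_from_env(void)
{
    static const char tested[] = { '3', 'B', '\\', '~' };
    int unprintable = 0;

    if (setlocale(LC_CTYPE, "") == NULL)
        return -1;
    for (size_t i = 0; i < sizeof tested; i++)
        if (!show_classes((unsigned char)tested[i]))
            unprintable++;
    return unprintable;
}
```

If main() calls count_unprintable_from_env() with LC_CTYPE=ja_JP.SJIS exported, it would be expected to return 2 on an affected system. Under ja_JP.eucJP it would be expected to return 0.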
- * - * -
Below, I describe the mechanism, referring to these files by the short names on the right:
/usr/share/locale/ja_JP.eucJP/LC_CTYPE --> F1.eucJP/LC_CTYPE
/usr/share/locale/ja_JP.SJIS/LC_CTYPE --> F2.SJIS/LC_CTYPE
/usr/src/share/ctypedef/ja_JP.eucJP.src --> F3.ja_JP.eucJP.src
/usr/src/tools/tools/locale/etc/final-maps/map.eucJP --> F4.map.eucJP
/usr/src/tools/tools/locale/etc/final-maps/map.SJIS --> F5.map.SJIS
/usr/src/tools/tools/locale/etc/final-maps/widths.txt --> F6.widths.txt
In /usr/src/share/ctypedef/,
F1.eucJP/LC_CTYPE is built from:
F6.widths.txt, F4.map.eucJP, F3.ja_JP.eucJP.src
and F2.SJIS/LC_CTYPE is built from:
F6.widths.txt, F5.map.SJIS, F3.ja_JP.eucJP.src
F4.map.eucJP has these mappings for the characters in question:
+-------------------------------
|<REVERSE_SOLIDUS> \x5c
|<TILDE> \x7e
+-------------------------------
and F5.map.SJIS has these:
+-------------------------------
|<YEN_SIGN> \x5c
|<OVERLINE> \x7e
+-------------------------------
(all other characters below \x7f have the same names in both maps)
F3.ja_JP.eucJP.src is reused to build both LC_CTYPE files.
It lists <REVERSE_SOLIDUS> and <TILDE> in several classes (including 'print'),
but does not list <YEN_SIGN> or <OVERLINE> in any class;
see the summary of F3.ja_JP.eucJP.src below.
+-------------------------------------------
1 |# Warning: Do not edit. This file is automatically extracted from the
2 |# tools in /usr/src/tools/tools/locale. The data is obtained from the
3 |# CLDR project, obtained from http://cldr.unicode.org/
4 |#
-----------------------------------------------------------------------------
5 |comment_char *
6 |escape_char /
7 |LC_CTYPE
8 |*************
9 |
10 |upper <A>;/
11 | <B>;/
240 |lower <a>;/
478 |alpha <CARON>;/
12872 |space <tab>;/
12880 |cntrl <NULL>;/
12914 |graph <EXCLAMATION_MARK>;/
12932 | <three>;/
12947 | <B>;/
12973 | <REVERSE_SOLIDUS>;/ <===
13007 | <TILDE>;/ <===
26017 |print <space>;/
26036 | <three>;/
26051 | <B>;/
26077 | <REVERSE_SOLIDUS>;/ <===
26111 | <TILDE>;/ <===
39122 |punct <EXCLAMATION_MARK>;/
39140 | <REVERSE_SOLIDUS>;/ <===
39207 |digit <zero>;/
39210 | <three>;/
39218 |xdigit <zero>;/
39221 | <three>;/
39229 | <B>;/
39241 |blank <tab>;/
39245 |toupper (<a>,<A>);/
39246 | (<b>,<B>);/
39474 |tolower (<A>,<a>);/
39475 | (<B>,<b>);/
39703 |END LC_CTYPE
+-------------------------------------------
So '\' and '~' are not classified as printable in F2.SJIS/LC_CTYPE,
and I think this is why isprint() returns the wrong result with LC_CTYPE=ja_JP.SJIS.
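The mechanism can be modelled in a few lines of C. This is a simplified sketch and not localedef's actual implementation. In the model, a charmap maps symbolic names to bytes, and a class list such as 'print' names its members symbolically. A byte ends up printable only if the name its charmap gives it appears in that list. The tables contain only the entries quoted above from F4.map.eucJP, F5.map.SJIS and the 'print' section of F3.ja_JP.eucJP.src. The type, table and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one "<NAME> \xNN" charmap line. */
struct charmap_entry {
    const char   *name;
    unsigned char code;
};

/* Excerpts of the two charmaps quoted in the report. */
static const struct charmap_entry map_eucjp[] = {
    { "<REVERSE_SOLIDUS>", 0x5c }, { "<TILDE>", 0x7e },
    { "<three>", 0x33 }, { "<B>", 0x42 },
};
static const struct charmap_entry map_sjis[] = {
    { "<YEN_SIGN>", 0x5c }, { "<OVERLINE>", 0x7e },
    { "<three>", 0x33 }, { "<B>", 0x42 },
};

/* Excerpt of the 'print' class in F3.ja_JP.eucJP.src, shared by both
 * locales. */
static const char *const print_class[] = {
    "<space>", "<three>", "<B>", "<REVERSE_SOLIDUS>", "<TILDE>",
};

/* Return 1 if byte `code` is printable under the given charmap: find the
 * symbolic name the charmap assigns to it, then look that name up in the
 * 'print' class.  A byte whose name is absent from the class is not
 * printable, which is what happens to 0x5c and 0x7e with map_sjis. */
int model_isprint(const struct charmap_entry *map, size_t n,
                  unsigned char code)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].code != code)
            continue;
        for (size_t j = 0; j < sizeof print_class / sizeof print_class[0]; j++)
            if (strcmp(map[i].name, print_class[j]) == 0)
                return 1;
        return 0;   /* name found in the charmap but not in 'print' */
    }
    return 0;       /* byte not defined in this charmap at all */
}
```

The model shows that the byte values are identical in both maps. Only the names differ, and the names alone decide class membership.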
- * - * -
WORKAROUND (quick hack)
Rebuild F2.SJIS/LC_CTYPE after renaming these character names in F5.map.SJIS:
<YEN_SIGN> --> <REVERSE_SOLIDUS>
<OVERLINE> --> <TILDE>
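The quick hack is a plain text substitution in F5.map.SJIS before LC_CTYPE is regenerated, and an editor or sed works just as well. As an illustration only, here is a hypothetical C helper, rename_charmap_line(), that applies the two renames to one charmap line. It assumes each line starts with the symbolic name, as in the excerpts above.

```c
#include <stdio.h>
#include <string.h>

/* Renames applied by the quick hack: SJIS names -> names used in
 * F3.ja_JP.eucJP.src. */
static const char *const renames[][2] = {
    { "<YEN_SIGN>", "<REVERSE_SOLIDUS>" },
    { "<OVERLINE>", "<TILDE>" },
};

/* Hypothetical helper: copy one charmap line into out, replacing a
 * leading <YEN_SIGN> or <OVERLINE> name.  Returns 1 if a rename was
 * applied, 0 if the line was copied unchanged, -1 if out is too small. */
int rename_charmap_line(const char *line, char *out, size_t outsz)
{
    for (size_t i = 0; i < sizeof renames / sizeof renames[0]; i++) {
        size_t len = strlen(renames[i][0]);
        if (strncmp(line, renames[i][0], len) == 0) {
            int n = snprintf(out, outsz, "%s%s", renames[i][1], line + len);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
        }
    }
    int n = snprintf(out, outsz, "%s", line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```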
WORKAROUND (proper)
Obtain a proper ja_JP.SJIS.src file from somewhere and use it
to build F2.SJIS/LC_CTYPE.
- * - * -
Confirming from the tcsh side
I also confirmed this problem from the tcsh side; see the code below.
It is the function xputchar() in /usr/src/contrib/tcsh/sh.print.c.
From line 167, xputchar() outputs a non-printable character in "\nnn" (octal) format.
+-----------------------------------------------------
144 |void
145 |xputchar(int c)
146 |{
147 | int atr;
148 |
149 | atr = c & ATTRIBUTES & TRIM;
150 | c &= CHAR | QUOTE;
151 | if (!output_raw && (c & QUOTE) == 0) {
152 | if (iscntrl(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
153 | if (c != '\t' && c != '\n'
154 |#ifdef COLORCAT
155 | && !(adrof(STRcolorcat) && c == CTL_ESC('\033'))
156 |#endif
157 | && (xlate_cr || c != '\r'))
158 | {
159 | xputchar('^' | atr);
160 | if (c == CTL_ESC('\177'))
161 | c = '?';
162 | else
163 | /* Note: for IS_ASCII, this compiles to: c = c | 0100 */
164 | c = CTL_ESC(ASC(c)|0100);
165 | }
166 | }
167 | else if (!isprint(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
168 | xputchar('\\' | atr);
169 | xputchar((((c >> 6) & 7) + '0') | atr);
170 | xputchar((((c >> 3) & 7) + '0') | atr);
171 | c = (c & 7) + '0';
172 | }
173 | (void) putraw(c | atr);
174 | }
175 | else {
176 | c &= TRIM;
177 | if (haderr ? (didfds ? is2atty : isdiagatty) :
178 | (didfds ? is1atty : isoutatty))
179 | SetAttributes(c | atr);
180 | (void) putpure(c);
181 | }
182 | if (lbuffed && (c & CHAR) == '\n')
183 | flush();
184 |}
+-----------------------------------------------------
The trigger was the '~' in my environment variable EXINIT,
which is (wrongly) detected as non-printable.
So xputchar() first outputs '\' by calling itself recursively.
In that nested call, the '\' is also wrongly detected as non-printable,
so it calls itself again, and again, and again...
The result is infinite recursion that continues until the stack overflows.
The '\' character should never be non-printable for xputchar().
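The recursion can be reproduced outside tcsh with a simplified model of the branch at line 167. This is a sketch, not tcsh code. It keeps only the octal-escape path, ignoring the iscntrl() branch, the attributes and the output back ends. The printability test is passed in as a function pointer so that a broken table can be simulated. A hypothetical MAX_DEPTH limit stands in for the real stack overflow. The exempt_backslash flag is also hypothetical; it illustrates the point above that '\' should never be escaped.

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* hypothetical bound standing in for the real stack */

typedef int (*printable_fn)(int c);

/* Printable ASCII, as classified in the C or ja_JP.eucJP locale. */
int ascii_isprint(int c) { return c >= 0x20 && c < 0x7f; }

/* Same, but with the broken ja_JP.SJIS classification of '\' and '~'. */
int sjis_isprint(int c) { return ascii_isprint(c) && c != '\\' && c != '~'; }

/* Simplified model of xputchar(): a non-printable ASCII character is
 * written as '\' plus three octal digits, and each of those characters
 * goes back through this same function, as in tcsh.  Output is appended
 * to buf (NUL-terminated).  Returns 0 on success or -1 once the recursion
 * exceeds MAX_DEPTH, where the real shell would overflow its stack. */
int model_putchar(int c, printable_fn printable, int exempt_backslash,
                  char *buf, size_t *len, size_t cap, int depth)
{
    if (depth > MAX_DEPTH)
        return -1;
    if (c < 0x80 && !printable(c) && !(exempt_backslash && c == '\\')) {
        if (model_putchar('\\', printable, exempt_backslash,
                          buf, len, cap, depth + 1) < 0 ||
            model_putchar(((c >> 6) & 7) + '0', printable, exempt_backslash,
                          buf, len, cap, depth + 1) < 0 ||
            model_putchar(((c >> 3) & 7) + '0', printable, exempt_backslash,
                          buf, len, cap, depth + 1) < 0)
            return -1;
        c = (c & 7) + '0';
    }
    if (*len + 1 < cap) {            /* leave room for the terminating NUL */
        buf[(*len)++] = (char)c;
        buf[*len] = '\0';
    }
    return 0;
}
```

With sjis_isprint and exempt_backslash set to 0, writing '~' never terminates on its own: each '\' requests another '\'. With exempt_backslash set, the same broken table still escapes '~', but the output stops at \176.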
- * - * -
p.s.
My quick hack is good enough for me for now...
It seems that the scope of character names such as <TILDE> is limited
to F3.ja_JP.eucJP.src, F4.map.eucJP and F5.map.SJIS,
and that the names are not included in F1.eucJP/LC_CTYPE
or F2.SJIS/LC_CTYPE.
If that is true, my quick hack may be a complete solution...
Also, I think it is very natural for the isXXXX() functions
in any ja_JP.* locale to return the same results for character codes
0x00 through 0x7f, regardless of the LC_CTYPE setting
(and regardless of how the character looks in a given font).
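The invariant proposed here could be checked mechanically. Here is a minimal sketch. It assumes both locales are installed and that comparing the standard <ctype.h> classes is enough. The function names are illustrative and do not refer to an existing FreeBSD test. The sketch records each byte's class bits under one locale, switches to the other, and counts the bytes 0x00 to 0x7f whose bits differ.

```c
#include <ctype.h>
#include <locale.h>

/* Bit mask of the <ctype.h> classes that apply to c in the current locale. */
static unsigned class_bits(int c)
{
    unsigned bits = 0;
    bits |= isalnum(c)  ? 1u << 0  : 0;
    bits |= isalpha(c)  ? 1u << 1  : 0;
    bits |= isblank(c)  ? 1u << 2  : 0;
    bits |= iscntrl(c)  ? 1u << 3  : 0;
    bits |= isdigit(c)  ? 1u << 4  : 0;
    bits |= isgraph(c)  ? 1u << 5  : 0;
    bits |= islower(c)  ? 1u << 6  : 0;
    bits |= isprint(c)  ? 1u << 7  : 0;
    bits |= ispunct(c)  ? 1u << 8  : 0;
    bits |= isspace(c)  ? 1u << 9  : 0;
    bits |= isupper(c)  ? 1u << 10 : 0;
    bits |= isxdigit(c) ? 1u << 11 : 0;
    return bits;
}

/* Illustrative check: count the ASCII codes 0x00-0x7f whose ctype classes
 * differ between locales a and b.  Returns -1 if either locale cannot be
 * set.  This changes the process's LC_CTYPE (left at b on success). */
int ascii_ctype_mismatches(const char *a, const char *b)
{
    unsigned first[0x80];
    int mismatches = 0;

    if (setlocale(LC_CTYPE, a) == NULL)
        return -1;
    for (int c = 0; c < 0x80; c++)
        first[c] = class_bits(c);
    if (setlocale(LC_CTYPE, b) == NULL)
        return -1;
    for (int c = 0; c < 0x80; c++)
        if (class_bits(c) != first[c])
            mismatches++;
    return mismatches;
}
```

On an affected system, ascii_ctype_mismatches("ja_JP.eucJP", "ja_JP.SJIS") would presumably report at least the two bytes 0x5c and 0x7e.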