bin/164317: [patch] sbin/write: add multibyte character support
Dmitry Marakasov
amdmi3 at FreeBSD.org
Thu Jan 19 20:20:09 UTC 2012
>Number: 164317
>Category: bin
>Synopsis: [patch] sbin/write: add multibyte character support
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Thu Jan 19 20:20:08 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator: Dmitry Marakasov
>Release: FreeBSD 9.0-RC2 amd64
>Organization:
>Environment:
System: FreeBSD hades.panopticon 9.0-RC2 FreeBSD 9.0-RC2 #0: Tue Nov 29 07:18:03 MSK 2011 root at hades.panopticon:/usr/work/usr/src/sys/HADES amd64
>Description:
Currently write(1) does not handle UTF-8 locales at all:
(this is Russian)
# echo "Проверка" | write amdmi3 pts/29
Message from amdmi3 at hades.panopticon on pts/29 at 23:31 ...
M-PM-^_M-QM-^@M-PM->M-PM-2M-PM-5M-QM-^@M-PM-:M-PM-0
EOF
The checks used in the character printing routine (((*s & 0x80) && *s < 0xA0)) seem to assume a specific encoding. CP866, for example, has letters in 0x80-0xA0, so the current code does not work correctly even for 8-bit locales.
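As a rough illustration of that check, the sketch below reimplements the pre-patch escaping rules against a buffer instead of stdout. The function name old_escape and the buffer-based interface are hypothetical and exist only for this example; the conditions themselves are copied from the current wr_fputs(). It assumes the default "C" locale, in which bytes at or above 0x80 are not printable.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: render s into out using the pre-patch
 * escaping rules of wr_fputs(). out must hold at least 6 bytes
 * for any input byte to be processed.
 */
static char *
old_escape(const unsigned char *s, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return (out);
	/* One input byte expands to at most 4 output bytes ("M-^X"). */
	for (; *s != '\0' && n + 5 < outsz; ++s) {
		unsigned char c = *s;

		if (c == '\n') {
			out[n++] = '\r';
		} else if (((c & 0x80) && c < 0xA0) ||
		    (!isprint(c) && !isspace(c) && c != '\a' && c != '\b')) {
			if (c & 0x80) {
				c &= 0x7F;	/* strip the high bit */
				out[n++] = 'M';
				out[n++] = '-';
			}
			if (iscntrl(c)) {
				c ^= 0x40;	/* map control to its caret letter */
				out[n++] = '^';
			}
		}
		out[n++] = (char)c;
	}
	out[n] = '\0';
	return (out);
}
```

Under these assumptions, the two UTF-8 bytes 0xD0 0x9F (the first letter of the Russian sample) come out as M-PM-^_, which matches the start of the garbled output shown above. A single CP866 byte such as 0x91 is escaped as M-^Q in any locale, because the first condition does not consult the locale at all.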
The utility is easily convertible to wchar_t, however, which should handle them all, and a patch for it is attached.
% (echo "Проверка"; echo "Some control characters: \b\t^[[D^[[C^[[A^[[B^[") | ./write amdmi3 pts/29
Message from amdmi3 at hades.panopticon on pts/29 at 23:43 ...
Проверка
Some control characters: <0x8> <0x1B>[D<0x1B>[C<0x1B>[A<0x1B>[B<0x1B>
EOF
How non-printable characters should be displayed is debatable. However, since one can assume neither that the locale is a UTF encoding nor that a wchar_t value is related to a code point, it would be inappropriate to use notations such as U+%X, \u%X or &#%d;, or to modify wchar_t values with bitwise operations. A notation like <0x%X>, on the other hand, is charset-agnostic and quite readable, so I think it is suitable here.
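The proposed notation can be sketched in isolation as follows. This is not the patch itself: the function name new_escape and the output buffer are hypothetical, chosen so the behaviour can be checked without a terminal, and the cast to unsigned long with %lX is an addition made here to keep the format string type-correct. The classification logic (iswprint() or iswspace() passes through, everything else becomes <0x...>) follows the patched wr_fputs().

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: render s into out (outlen wide characters)
 * using the <0x%X> notation for non-printable characters.
 * Output is truncated if it does not fit.
 */
static wchar_t *
new_escape(const wchar_t *s, wchar_t *out, size_t outlen)
{
	size_t n = 0;

	if (outlen == 0)
		return (out);
	for (; *s != L'\0'; ++s) {
		if (*s == L'\n') {
			/* Turn \n into \r\n. */
			if (n + 2 >= outlen)
				break;
			out[n++] = L'\r';
			out[n++] = L'\n';
		} else if (iswprint((wint_t)*s) || iswspace((wint_t)*s)) {
			if (n + 1 >= outlen)
				break;
			out[n++] = *s;
		} else {
			/* Charset-agnostic escape of the raw value. */
			int w = swprintf(out + n, outlen - n, L"<0x%lX>",
			    (unsigned long)*s);
			if (w < 0)
				break;
			n += (size_t)w;
		}
	}
	out[n] = L'\0';
	return (out);
}
```

In the default "C" locale, backspace and escape are neither printable nor whitespace, so they are rendered as <0x8> and <0x1B>, while a tab passes through unchanged because iswspace() accepts it. That is consistent with the sample output above.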
>How-To-Repeat:
>Fix:
Index: write.1
===================================================================
--- write.1 (revision 230334)
+++ write.1 (working copy)
@@ -107,7 +107,3 @@
terminal, not the receiver's (which
.Nm
has no way of knowing).
-.Pp
-The
-.Nm
-utility does not recognize multibyte characters.
Index: write.c
===================================================================
--- write.c (revision 230334)
+++ write.c (working copy)
@@ -60,12 +60,14 @@
#include <string.h>
#include <unistd.h>
#include <utmpx.h>
+#include <wchar.h>
+#include <wctype.h>
void done(int);
void do_write(char *, char *, uid_t);
static void usage(void);
int term_chk(char *, int *, time_t *, int);
-void wr_fputs(unsigned char *s);
+void wr_fputs(wchar_t *s);
void search_utmp(char *, char *, char *, uid_t);
int utmp_chk(char *, char *);
@@ -243,7 +245,8 @@
char *nows;
struct passwd *pwd;
time_t now;
- char path[MAXPATHLEN], host[MAXHOSTNAMELEN], line[512];
+ char path[MAXPATHLEN], host[MAXHOSTNAMELEN];
+ wchar_t line[512];
/* Determine our login name before we reopen() stdout */
if ((login = getlogin()) == NULL) {
@@ -269,7 +272,7 @@
(void)printf("\r\n\007\007\007Message from %s@%s on %s at %s ...\r\n",
login, host, mytty, nows + 11);
- while (fgets(line, sizeof(line), stdin) != NULL)
+ while (fgetws(line, sizeof(line)/sizeof(wchar_t), stdin) != NULL)
wr_fputs(line);
}
@@ -288,30 +291,20 @@
* turns \n into \r\n
*/
void
-wr_fputs(unsigned char *s)
+wr_fputs(wchar_t *s)
{
-#define PUTC(c) if (putchar(c) == EOF) err(1, NULL);
+#define PUTC(c) if (putwchar(c) == WEOF) err(1, NULL);
- for (; *s != '\0'; ++s) {
- if (*s == '\n') {
- PUTC('\r');
- } else if (((*s & 0x80) && *s < 0xA0) ||
- /* disable upper controls */
- (!isprint(*s) && !isspace(*s) &&
- *s != '\a' && *s != '\b')
- ) {
- if (*s & 0x80) {
- *s &= ~0x80;
- PUTC('M');
- PUTC('-');
- }
- if (iscntrl(*s)) {
- *s ^= 0x40;
- PUTC('^');
- }
+ for (; *s != L'\0'; ++s) {
+ if (*s == L'\n') {
+ PUTC(L'\r');
+ PUTC(L'\n');
+ } else if (iswprint(*s) || iswspace(*s)) {
+ PUTC(*s);
+ } else {
+ wprintf(L"<0x%X>", *s);
}
- PUTC(*s);
}
return;
#undef PUTC
>Release-Note:
>Audit-Trail:
>Unformatted: