Re: vt newcons mouse paste issue FIXED

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Fri, 24 Jun 2022 16:51:50 UTC

On Fri, 24 Jun 2022 17:29:26 +0200
Hans Petter Selasky <hps@selasky.org> wrote:

> Hi Tomoaki,
> 
> On 6/24/22 16:48, Hans Petter Selasky wrote:
> > IDEOGRAPHIC (Full-width) SPACE
> 
> According to this page:
> 
> https://jkorpela.fi/chars/spaces.html
> 
> There are multiple uni-code characters which are spaces. Should we 
> support them all?
> 
> --HPS

Nice page!

Maybe not all. My guess, based on the "Sample" and "Width of the character"
fields, is below. The first column means:

'Y': Should be treated as space / word separator
'N': Should NOT be treated as space / word separator
'U': Unknown to me. Needs a native speaker to determine.

Someone may have objections, but basically I've treated breakable
spaces as space characters. See also URL [1] below.

  Special cases:
    *Looking at the sample, U+1680 is shown as a dash, so I considered
     it 'N'.
    *I treated "QUAD" as just graphical (non-semantic) use, so I
     considered those 'N'.
    *I considered U+205F 'N' because, in mathematical usage, an
     unintended line break could cause fatal confusion.


  Code   Name of the character
Y U+0020 SPACE
N U+00A0 NO-BREAK SPACE
N U+1680 OGHAM SPACE MARK
Y U+180E MONGOLIAN VOWEL SEPARATOR
N U+2000 EN QUAD
N U+2001 EM QUAD
Y U+2002 EN SPACE (nut)
Y U+2003 EM SPACE (mutton)
Y U+2004 THREE-PER-EM SPACE (thick space)
Y U+2005 FOUR-PER-EM SPACE (mid space)
Y U+2006 SIX-PER-EM SPACE
N U+2007 FIGURE SPACE
Y U+2008 PUNCTUATION SPACE
Y U+2009 THIN SPACE
Y U+200A HAIR SPACE
Y U+200B ZERO WIDTH SPACE
N U+202F NARROW NO-BREAK SPACE
N U+205F MEDIUM MATHEMATICAL SPACE
Y U+3000 IDEOGRAPHIC SPACE
N U+FEFF ZERO WIDTH NO-BREAK SPACE
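
The 'Y' rows of the table could be expressed as a simple lookup set.
Below is a minimal Python sketch, only to illustrate the classification.
It is not the vt(4) code, and the names WORD_SEPARATORS,
is_word_separator and split_words are made up for this example.

```python
# Code points marked 'Y' in the table (illustrative only, not vt(4) code).
WORD_SEPARATORS = frozenset({
    0x0020,  # SPACE
    0x180E,  # MONGOLIAN VOWEL SEPARATOR
    0x2002,  # EN SPACE
    0x2003,  # EM SPACE
    0x2004,  # THREE-PER-EM SPACE
    0x2005,  # FOUR-PER-EM SPACE
    0x2006,  # SIX-PER-EM SPACE
    0x2008,  # PUNCTUATION SPACE
    0x2009,  # THIN SPACE
    0x200A,  # HAIR SPACE
    0x200B,  # ZERO WIDTH SPACE
    0x3000,  # IDEOGRAPHIC SPACE
})


def is_word_separator(ch):
    """Return True if the single character ch is marked 'Y' in the table."""
    return ord(ch) in WORD_SEPARATORS


def split_words(text):
    """Split text at 'Y' characters, keeping 'N' characters inside words."""
    words, current = [], []
    for ch in text:
        if is_word_separator(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

With this set, split_words("foo\u3000bar baz") splits at both the
ideographic space and the ASCII space, while "10\u00a0km" stays one word
because U+00A0 is marked 'N'.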

Maybe the best approach would be to look into how Unicode normalization
treats them. But we Japanese would want U+3000 treated as a space.
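
One way to check the normalization behaviour is Python's standard
unicodedata module. The sketch below assumes NFKC (compatibility
composition) is the form of interest, which is my assumption and not
something decided in this thread. The names CANDIDATES, folds_to_space
and report are made up for the example.

```python
import unicodedata

# The code points listed in the table.
CANDIDATES = [
    0x0020, 0x00A0, 0x1680, 0x180E, 0x2000, 0x2001, 0x2002, 0x2003,
    0x2004, 0x2005, 0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x200B,
    0x202F, 0x205F, 0x3000, 0xFEFF,
]


def folds_to_space(cp):
    """Return True if NFKC normalization maps code point cp to U+0020."""
    return unicodedata.normalize("NFKC", chr(cp)) == " "


def report():
    """Print the general category and the NFKC result for each candidate."""
    for cp in CANDIDATES:
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        print(f"U+{cp:04X} {name}: category={category} "
              f"nfkc_space={folds_to_space(cp)}")


if __name__ == "__main__":
    report()
```

NFKC maps U+3000 to U+0020, but it does the same for U+00A0, so
normalization alone does not separate breaking from non-breaking spaces
and cannot reproduce the Y/N column by itself.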


[1] https://en.wikipedia.org/wiki/Non-breaking_space

-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>