From nobody Mon Jun 27 08:18:08 2022
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id BD3758732AD;
	Mon, 27 Jun 2022 08:18:08 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (4096 bits) client-digest SHA256)
	(Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4LWgb43yN3z4dB8;
	Mon, 27 Jun 2022 08:18:08 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org;
	s=dkim; t=1656317888;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding;
	bh=RRaZvZF+T6T1HSA88jfmrp+ilUVsruyHY5xRfRkZygI=;
	b=n8R/WEusrb8Yx3R5cca2saFc6y4ZtFl0q+czzYTBxcrSA31OT9bH7M8gi21ciIqqKyntH2
	 jMBG03Mz/2sNic2+Qgd1/j3rnG2G3AJPfHyV2NXEMhmlqlAR2WT+8J5MeLQ14zfkSRy036
	 /uhPphBE5xtRMXd0ngrxZqoRs1qvGLcwmERGQl/Tf4bjRVRsd/LYgjrz9JMTl5FhQfBw+8
	 QHJwQVn103/oQzBsJXZwnaCjGGS+pT9D8O7hSam81IRIoGC2KRma52Qb7q8g9/synwVdWK
	 9ASW7+CS0QfIBpaY7rtZaH9ttcO5jg08DuUTmkhCP92LJ17/TjM9z52qMVJLqQ==
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 61D722486F;
	Mon, 27 Jun 2022 08:18:08 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
	by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 25R8I881093384;
	Mon, 27 Jun 2022 08:18:08 GMT
	(envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
	by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 25R8I8dh093383;
	Mon, 27 Jun 2022 08:18:08 GMT
	(envelope-from git)
Date: Mon, 27 Jun 2022 08:18:08 GMT
Message-Id: <202206270818.25R8I8dh093383@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-main@FreeBSD.org
From: Hans Petter Selasky
Subject: git: 9971e6aff1be - main - vt: Improve multi lingual word separation.
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere: dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: hselasky
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 9971e6aff1bef3d456172c41a3df3ce7266517cf
Auto-Submitted: auto-generated
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org;
	s=dkim; t=1656317888;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding;
	bh=RRaZvZF+T6T1HSA88jfmrp+ilUVsruyHY5xRfRkZygI=;
	b=Knl/1mUvqlElqpgLurgAyZeqvHmzDvMIn5vBud+ZFHjgxVE6NeXAQas1uZ9CTGdPXY/YSD
	 wByRtSuvGl7cniOia2BpOrxAJ/QoKCzjWmNQ3L24awBotVkrutgVW1rlG3fDeUVd6EKNV6
	 pYM40C54ZiAkuz144dzJvm1raQgw07UUObxkndLHTgw3fHf3aMwsU/hCXG6GGNsk6tyPvi
	 aSt97pbtDGk05WenX3Sf749f76WIi86JSBDLDDnkLF6IfmLcivuNx/8s/qjZp+cG8C6KoD
	 SvTmZaRcAsEutrveOCe/Wr80w+bX6ZgRVdviLTqwZL6WWC+0V00D8iumkBToQw==
ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1656317888; a=rsa-sha256; cv=none;
	b=wiejqnm92TsQvnuuhrQEM+6p+dL8c8DQCoGKz2BHkO2saTzdJ0NlkYuusE6DQ7CrkpTNfT
	 +/+ifRANcAG6tbnVUm2ZIeBoz0v+R0WfDVP2w+xXvPRllBVAi/rdVIJSCHoP6LVc+zkzRQ
	 Lk4zA9hxmcO9QkW3sfyKGDV0mQ1sMapjU4YCJjLLu3U55GAZgrydjahqHiBIJ6BNAc7u6A
	 YixPSJbcC2R5wZ7sIaswyhqtu0IwpXGdG7LhrroVhCd5ZCxM0d3WHyGp01yKC7mmNTRSw7
	 hOvEjiJB7yknFiS28Ow71hiDB01t54HYFsBbrgH7eL/4OYh+3Sp3d9HJjOarjA==
ARC-Authentication-Results: i=1; mx1.freebsd.org; none
X-ThisMailContainsUnwantedMimeParts: N

The branch main has been updated by hselasky:

URL: https://cgit.FreeBSD.org/src/commit/?id=9971e6aff1bef3d456172c41a3df3ce7266517cf

commit 9971e6aff1bef3d456172c41a3df3ce7266517cf
Author:     Hans Petter Selasky
AuthorDate: 2022-06-25 09:17:44 +0000
Commit:     Hans Petter Selasky
CommitDate: 2022-06-27 08:17:16 +0000

    vt: Improve multi lingual word separation.

    Suggested by:   Tomoaki AOKI
    Differential Revision:  https://reviews.freebsd.org/D35552
    PR:             263084
    MFC after:      1 week
    Sponsored by:   NVIDIA Networking
---
 sys/dev/vt/vt_buf.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/sys/dev/vt/vt_buf.c b/sys/dev/vt/vt_buf.c
index fa6c7c8fec5f..b83db85f1cdb 100644
--- a/sys/dev/vt/vt_buf.c
+++ b/sys/dev/vt/vt_buf.c
@@ -747,6 +747,29 @@ vtbuf_get_marked_len(struct vt_buf *vb)
 	return (sz * sizeof(term_char_t));
 }
 
+static bool
+tchar_is_word_separator(term_char_t ch)
+{
+	/* List of unicode word separator characters: */
+	switch (TCHAR_CHARACTER(ch)) {
+	case 0x0020: /* SPACE */
+	case 0x180E: /* MONGOLIAN VOWEL SEPARATOR */
+	case 0x2002: /* EN SPACE (nut) */
+	case 0x2003: /* EM SPACE (mutton) */
+	case 0x2004: /* THREE-PER-EM SPACE (thick space) */
+	case 0x2005: /* FOUR-PER-EM SPACE (mid space) */
+	case 0x2006: /* SIX-PER-EM SPACE */
+	case 0x2008: /* PUNCTUATION SPACE */
+	case 0x2009: /* THIN SPACE */
+	case 0x200A: /* HAIR SPACE */
+	case 0x200B: /* ZERO WIDTH SPACE */
+	case 0x3000: /* IDEOGRAPHIC SPACE */
+		return (true);
+	default:
+		return (false);
+	}
+}
+
 void
 vtbuf_extract_marked(struct vt_buf *vb, term_char_t *buf, int sz)
 {
@@ -779,7 +802,7 @@ vtbuf_extract_marked(struct vt_buf *vb, term_char_t *buf, int sz)
 			if (r != e.tp_row) {
 				/* Trim trailing word separators, if any. */
 				for (; i != j; i--) {
-					if (TCHAR_CHARACTER(buf[i - 1]) != ' ')
+					if (!tchar_is_word_separator(buf[i - 1]))
 						break;
 				}
 				/* Add newline character as expected by TTY. */
@@ -824,7 +847,7 @@ vtbuf_set_mark(struct vt_buf *vb, int type, int col, int row)
 		vtbuf_wth(vb, row);
 		r = vb->vb_rows[vb->vb_mark_start.tp_row];
 		for (i = col; i >= 0; i --) {
-			if (TCHAR_CHARACTER(r[i]) == ' ') {
+			if (tchar_is_word_separator(r[i])) {
 				vb->vb_mark_start.tp_col = i + 1;
 				break;
 			}
@@ -833,7 +856,7 @@ vtbuf_set_mark(struct vt_buf *vb, int type, int col, int row)
 		if (i == -1)
 			vb->vb_mark_start.tp_col = 0;
 		for (i = col; i < vb->vb_scr_size.tp_col; i++) {
-			if (TCHAR_CHARACTER(r[i]) == ' ') {
+			if (tchar_is_word_separator(r[i])) {
 				vb->vb_mark_end.tp_col = i;
 				break;
 			}
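
For readers who want to try the word-boundary logic from the diff above outside the kernel, the following is a minimal userland sketch, not the vt(4) code itself. It assumes bare uint32_t code points instead of term_char_t, so TCHAR_CHARACTER() is not needed. The names is_word_separator() and select_word() are hypothetical and exist only for this illustration. Setting the end to the row length when no separator is found to the right is also an assumption, since that part of vtbuf_set_mark() is not shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userland stand-in for tchar_is_word_separator().
 * Takes a bare Unicode code point; the list mirrors the one in the diff.
 */
static bool
is_word_separator(uint32_t cp)
{
	switch (cp) {
	case 0x0020: /* SPACE */
	case 0x180E: /* MONGOLIAN VOWEL SEPARATOR */
	case 0x2002: /* EN SPACE */
	case 0x2003: /* EM SPACE */
	case 0x2004: /* THREE-PER-EM SPACE */
	case 0x2005: /* FOUR-PER-EM SPACE */
	case 0x2006: /* SIX-PER-EM SPACE */
	case 0x2008: /* PUNCTUATION SPACE */
	case 0x2009: /* THIN SPACE */
	case 0x200A: /* HAIR SPACE */
	case 0x200B: /* ZERO WIDTH SPACE */
	case 0x3000: /* IDEOGRAPHIC SPACE */
		return (true);
	default:
		return (false);
	}
}

/*
 * Hypothetical helper: compute the half-open range [*start, *end) of the
 * word containing column col in a row of len code points. The caller is
 * assumed to pass a column that does not hold a separator.
 */
static void
select_word(const uint32_t *row, int len, int col, int *start, int *end)
{
	int i;

	assert(row != NULL && col >= 0 && col < len);

	/* Scan left; the word starts just after the nearest separator. */
	*start = 0;
	for (i = col; i >= 0; i--) {
		if (is_word_separator(row[i])) {
			*start = i + 1;
			break;
		}
	}

	/* Scan right; the word ends at the nearest separator (assumed: or at len). */
	*end = len;
	for (i = col; i < len; i++) {
		if (is_word_separator(row[i])) {
			*end = i;
			break;
		}
	}
}
```

With a row such as "foo", U+3000, "bar", U+2009, "baz", calling select_word() on a column inside "bar" yields only "bar". A comparison against ' ' alone, as in the removed lines of the diff, would have treated the whole row as a single word.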