From patchwork Tue Apr 15 19:17:50 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881624 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ECA9324503A; Tue, 15 Apr 2025 19:22:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; cv=none; b=bC0oxJewlAjOKY5qBjHRMyq7MPLv1qXAeHh79U4wuLoA8hJGzKHMcg22JAfkq+PP7aleCZkK1qIpMSGIYuGJ0/pifEuXIO14ITPWueKZ9BEDGoHJNdRTg7QjgL4Tl446gg04GIMZKEiuR28ALRvGbXTCXiT44JjI5I+8o9pHnMA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; c=relaxed/simple; bh=ft54p6bhTxDj+z0qJTAsmGd30EjVDhq/ynj+X5E4H/g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ciAIGGu+NP/s8fDN43wVT/rH9hkUIXEje+gRLLiVwN8bJPVpjF6Kg4plzmxxyxAEXZSOo6LhiUdgufSSGexq/FPyvPjs8WGKlNwN9FDYdwEHH/bV9ReQmfnmthhjYS7FYl8PjPRft//jaD/bu/beYKabc3mAY5Xw8WT8byXHdd8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=GSvR5vFq; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=Gygb56/N; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="GSvR5vFq"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Gygb56/N" Received: from phl-compute-05.internal (phl-compute-05.phl.internal [10.202.2.45]) by mailfhigh.phl.internal (Postfix) with ESMTP id 0BADE11401BE; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-05.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=zNG1YKUsmyDP/ojfVRvL6wcBsNKK7C86m9dSyKIqDh8=; b=G SvR5vFq7pJ6IVgUWVYyRCthVZYfp42xWizOdCXTZe4xJJFIHmt6mnh5nnJAY2CMM YNC/fdyCkzukzpEeNUMbJgOGx5yhYR1DeCG0xShc5oWZJ8E0yfXcsC0HJuj+tKqV CO/7EvzYtftM/bssMoNa+OL2pGiZYfg09gi84m8aTDHzm1eGWz/2vmtZ3N4vgtZZ 7g/TR3UBulotuf6onK2cLawtPTYNykuXdU7sD8AlS6DqENiaUDgvE5aOX3hfDbM+ 9tzVSDmfKDQm0nFYrjX9Tr5WhQiIjApU0vMsa9MwTqMN8Rlo3/yEBygdP3aNkpwn eYleefb2KNzsC8dxuhFbA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=z NG1YKUsmyDP/ojfVRvL6wcBsNKK7C86m9dSyKIqDh8=; b=Gygb56/NugqO1PuGX mMAcaVLa5yPvQCURno3XhjB76khWPAFCIxhumsh9L6RIyIOMS6t1YQpsYwutx39J K+KSfgYgTIDKP8wfXOgdIWOWNDKd9y0PUkB/IM4pQgIBoMEviIsqAbXApegRjCd6 +L1rM3GmIIVHmlIq+2/KDilrcOcKtN88QyWxqBOa7i74keAYabeZHh+GLU1a8zu+ t7wN57Iw10uyd9BSs2PgNeieaKYsMDDpbxUL4Z5ux0XTZdHY/g27kRXweOL01wjp QLRYx/bNejWKpXpGlaR1sZnOHbABaBs1rPDX2vInhXRuPfHw/1Op+R4Mcp3G7VeI tRUOg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id DCEAC1116607; Tue, 15 Apr 2025 15:22:17 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 01/13] vt: minor cleanup to vc_translate_unicode() Date: Tue, 15 Apr 2025 15:17:50 -0400 Message-ID: <20250415192212.33949-2-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Make it clearer when a sequence is bad. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/vt.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index f5642b3038..b5f3c8a818 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2817,7 +2817,7 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) if ((c & 0xc0) == 0x80) { /* Unexpected continuation byte? */ if (!vc->vc_utf_count) - return 0xfffd; + goto bad_sequence; vc->vc_utf_char = (vc->vc_utf_char << 6) | (c & 0x3f); vc->vc_npar++; @@ -2829,17 +2829,17 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) /* Reject overlong sequences */ if (c <= utf8_length_changes[vc->vc_npar - 1] || c > utf8_length_changes[vc->vc_npar]) - return 0xfffd; + goto bad_sequence; return vc_sanitize_unicode(c); } /* Single ASCII byte or first byte of a sequence received */ if (vc->vc_utf_count) { - /* Continuation byte expected */ + /* A continuation byte was expected */ *rescan = true; vc->vc_utf_count = 0; - return 0xfffd; + goto bad_sequence; } /* Nothing to do if an ASCII byte was received */ @@ -2858,11 +2858,14 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) vc->vc_utf_count = 3; vc->vc_utf_char = (c & 0x07); } else { - return 0xfffd; + goto bad_sequence; } need_more_bytes: return -1; + +bad_sequence: + return 0xfffd; } static int vc_translate(struct vc_data *vc, int *c, bool *rescan) From patchwork Tue Apr 15 19:17:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881623 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0DBF2459C9; Tue, 15 Apr 2025 19:22:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; cv=none; b=PmKYlEMtu+GKHmIY0MHO3GUYxMtMhnn9unpFvqJtjaBA/+ZiEdu3SnE+NGEpsCz9OeXMPKZNignurrLLLQXTOfaYBoJuHsCxM/q4mSnGbtincBF/WFdOlGgdmA4o9oGRqPTet+qYLIPVCtBuxlucsTG8fZ4wRZztM1P3FDDP8Qw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; c=relaxed/simple; bh=NyX9+XHAPpgGkQEpbtB7kEzTbTnlO8qkKoMp0MY6b08=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LxvIH8JmihPECj4IxcrQQ0D4FnNi2SyKYO5U06yu9zUEXJ8ll+U19Fp8KK5bmZXuQHNnyvd/EH52TfvhRslJF19gQP4G8gxRcJGQXyTxlQt/DglSJS7OcIxrRq3qW2znaR8NlMPNfvyqo5S3t7CxmSLAn7FI2F2w4hQfgxD1ONs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=bO7+uT8E; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=HxAUNMTf; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="bO7+uT8E"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="HxAUNMTf" Received: from phl-compute-04.internal (phl-compute-04.phl.internal [10.202.2.44]) by mailfout.phl.internal (Postfix) with ESMTP id 0D9C013801DE; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-04.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=FpoEp//AvN/ZsJcRm4eIP7BGc+NywZPXBlR30kcYKCM=; b=b O7+uT8ElnC8vSrRRbw6IVwyAfg7rLSwmmmYgebXAGCcb6V784TH8ToG6exDdCz5V zIgRQeXxB1c0utGovMlPJzrcjPJWjeHOJoY8IX0DtLE6QQ7PHIbVt8L98UmcCzc+ Dx9hN5Zk2tCmzct4ZQZdqdxpKFLLIWoFMGNKC5pDQAhsZT8wqJo0VY6ZISN8YZIl 6hmbd3XRudfxkmYVebYNIiig3AEMpfAy0C9ZrtkmxIKdh0TBqEKOSKOVyDZo00HI 8S3nu6KLhik5uATx5QJnCcKXLR0CpbohxyGzVuPuB5VTD3VW5EnDrXRE14nkRmbu w7rFYpfOfMvVqWTEBcegQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=F poEp//AvN/ZsJcRm4eIP7BGc+NywZPXBlR30kcYKCM=; b=HxAUNMTf0owBaaRKm aPiqPHyn8Vi2jnvUBqri2bRtjZUb7ySng/jF3FAdE3EkUeMGJlbbMGiRDzxuuL7a hujRRSvqiBBjPlyRb2wxz//5RzDU0QJQzWlAYd3+42ABVzs2Do0ZTVTtsK2PMGjM x+bfDFmB2G5+GMYMehvhcRI/q4jjiET0yNG0UqeyzDsKjuwXU7lWC7JtW79qBX0l SIBpFcRSSbs6b0R02f8FMVWx8jkuazbhTIuwkomYt3NUEtBqUPmlzATapHwUc/Xo PqRubrsihnMVGe/2PqaEl5six42tQoOQAS3u9tColVMaLDTaNrhXMf7eTcLWdSEz T11Eg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnhepleektdffjeevfefhiedtudevudetjeejvdej uedtvdevveefuedujeejffetieeknecuffhomhgrihhnpegtrghmrdgrtgdruhhknecuve hluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepnhhitghosehf lhhugihnihgtrdhnvghtpdhnsggprhgtphhtthhopeehpdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehnphhithhrvgessggrhihlihgsrhgvrdgtohhmpdhrtghpthhtohep jhhirhhishhlrggshieskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepghhrvghgkhhhse hlihhnuhigfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoheplhhinhhugidqkhgv rhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidqsh gvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 02B321116609; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 02/13] vt: move unicode processing to a separate file Date: Tue, 15 Apr 2025 15:17:51 -0400 Message-ID: <20250415192212.33949-3-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This will make it easier to maintain. Also make it depend on CONFIG_CONSOLE_TRANSLATIONS. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/Makefile | 3 ++- drivers/tty/vt/ucs.c | 54 ++++++++++++++++++++++++++++++++++++++ drivers/tty/vt/vt.c | 40 +--------------------------- include/linux/consolemap.h | 6 +++++ 4 files changed, 63 insertions(+), 40 deletions(-) create mode 100644 drivers/tty/vt/ucs.c diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile index 2c8ce8b592..e24c8546ac 100644 --- a/drivers/tty/vt/Makefile +++ b/drivers/tty/vt/Makefile @@ -7,7 +7,8 @@ FONTMAPFILE = cp437.uni obj-$(CONFIG_VT) += vt_ioctl.o vc_screen.o \ selection.o keyboard.o \ vt.o defkeymap.o -obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o +obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o \ + ucs.o # Files generated that shall be removed upon make clean clean-files := consolemap_deftbl.c defkeymap.c diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c new file mode 100644 index 0000000000..0f6c087158 --- /dev/null +++ b/drivers/tty/vt/ucs.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include + +/* ucs_is_double_width() is based on the wcwidth() implementation by + * Markus Kuhn -- 2007-05-26 (Unicode 5.0) + * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c + */ + +struct ucs_interval { + u32 first; + u32 last; +}; + +static const struct ucs_interval ucs_double_width_ranges[] = { + { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, + { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, + { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, + { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } +}; + +static int interval_cmp(const void *key, const void *element) +{ + u32 cp = *(u32 *)key; + const struct ucs_interval *entry = element; + + if (cp < entry->first) + return -1; + if (cp > entry->last) + return 1; + return 0; +} + +/** + * Determine if a Unicode code point is double-width. + * + * @param cp: Unicode code point (UCS-4) + * Return: true if the character is double-width, false otherwise + */ +bool ucs_is_double_width(u32 cp) +{ + size_t size = ARRAY_SIZE(ucs_double_width_ranges); + + if (!in_range(cp, ucs_double_width_ranges[0].first, + ucs_double_width_ranges[size - 1].last)) + return false; + + return __inline_bsearch(&cp, ucs_double_width_ranges, size, + sizeof(*ucs_double_width_ranges), + interval_cmp) != NULL; +} diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index b5f3c8a818..bcb508bc15 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -104,7 +104,6 @@ #include #include #include -#include #include #define MAX_NR_CON_DRIVER 16 @@ -2712,43 +2711,6 @@ static void do_con_trol(struct tty_struct *tty, struct vc_data *vc, u8 c) } } -/* is_double_width() is based on the wcwidth() implementation by - * Markus Kuhn -- 2007-05-26 (Unicode 5.0) - * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c - */ -struct interval { - uint32_t first; - uint32_t last; -}; - -static int ucs_cmp(const void *key, const void *elt) -{ - uint32_t ucs = *(uint32_t *)key; - struct interval e = *(struct interval *) elt; - - if (ucs > e.last) - return 1; - else if (ucs < e.first) - return -1; - return 0; -} - -static int is_double_width(uint32_t ucs) -{ - static const struct interval double_width[] = { - { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, - { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, - { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, - { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } - }; - if (ucs < double_width[0].first || - ucs > double_width[ARRAY_SIZE(double_width) - 1].last) - return 0; - - return bsearch(&ucs, double_width, ARRAY_SIZE(double_width), - sizeof(struct interval), ucs_cmp) != NULL; -} - struct vc_draw_region { unsigned long from, to; int x; @@ -2953,7 +2915,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, bool inverse = false; if (vc->vc_utf && !vc->vc_disp_ctrl) { - if (is_double_width(c)) + if (ucs_is_double_width(c)) width = 2; } diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index c35db4896c..caf079bcb8 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -28,6 +28,7 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs); u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); +bool ucs_is_double_width(uint32_t cp); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -57,6 +58,11 @@ static inline int conv_uni_to_8bit(u32 uni) } static inline void console_map_init(void) { } + +static inline bool ucs_is_double_width(uint32_t cp) +{ + return false; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Tue Apr 15 19:17:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881621 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF6972459E3; Tue, 15 Apr 2025 19:22:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; cv=none; b=AIUrt9hCpjXs1EqW6/E9OumsfAs8ck735y4obAlN+4Mc/pnVXQChd3I4DG/7faeTGQDHXmbuWpKPlFWBESlGy1Cvm7FJiH38dz9OYQCnjbE3lgIlRWVGhgcB2JenZDje+FeTRPshAzt2fah3/I1x/0vneT9ADqJjZNYEiqxOEIw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; c=relaxed/simple; bh=BFavNU87Pi/qvKEXP74cUYtLTr7vP9t8MY72DZobvHQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ecowT/A3JbFLKGg+NKgE6kbYo0gbp4FPRz7/FCbCSZbvtadoybChTneSiP9LnBxRBlvDVc2oDxbhjXRdMZlonmpYyo3oBZ7J7ZxeCLACWNbJTBm1p/V56IIy9Z5TX+CuT2IfyDZ/dPcoBSY8Wws57Be39OYm95IhSzSXukjQnUQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=lHPORPlQ; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=sruOCi+g; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="lHPORPlQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="sruOCi+g" Received: from phl-compute-06.internal (phl-compute-06.phl.internal [10.202.2.46]) by mailfhigh.phl.internal (Postfix) with ESMTP id 1E11A11401C8; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-06.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=u9E209kSt5qFpINKL5/QIsJ9CN000ksi85W38AnQZP0=; b=l HPORPlQ8kAr+4TZWPbblrYKEdK2JAbHP0cn+dv/eM9Egb+e1tHQKEmA8HLTX9rQ8 1GR4pio53k9bcG/YlQz1C57aidEaHcRZs3hp5Lc9Trro1fiwIPwYaU0FRG9ajMtK LZnXNtOLvY7HH21vJBpLN/3EQCxK/0PpJFTwb6Rj28GbsHOX0+jrMjZDcC7hCQxG ZeEN15znWOxzjWVMYxxGgUDM9vpgQHIh5aCeUdFbvHjebphmVPIVvYxs6RP7jtag X49M+tW8HwV7+9qxigPP/LownY+6R4sBfgpzWoQhQBUXATpyUdS8n3d6+wXnby25 kNG+bvxdJBtFxhJzudf0Q== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=u 9E209kSt5qFpINKL5/QIsJ9CN000ksi85W38AnQZP0=; b=sruOCi+g2URSOnD9W DvcLkeN4BRkwOFCuhwi4GDXtoQ2xoP9Qw6AflUcjAqt2/ladLnH6BpnjeI4AmNJh 1DolCP4KRnAGTGAXxU4gNfWfZLAjacJcsAyTQsA4lqgpQA/LdASLkCwiVTqqSo3T O34/JpMXV45EMPeQDuPTePgd7TfoNlplSd7FOVMQBvDgYu9NopyWcT+e07YeZjdj R89mDZT1chfILXzgMaCsV4vVXJM9wYjEDQGgQvxWGbAnEoycsq6yKqQ3LwV89kFa YlxOnkM7wQEzNoXDiNuDbgLd2msWaWjhi/uXaO3AP6+/AzxbIpxxFJpNsU1fa9RF CaX4A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 20135111660A; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 03/13] vt: properly support zero-width Unicode code points Date: Tue, 15 Apr 2025 15:17:52 -0400 Message-ID: <20250415192212.33949-4-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Zero-width Unicode code points are causing misalignment in vertically aligned content, disrupting the visual layout. Let's handle zero-width code points more intelligently. Double-width code points are stored in the screen grid followed by a white space code point to create the expected screen layout. When a double-width code point is followed by a zero-width code point in the console incoming bytestream (e.g., an emoji with a presentation selector) then we may replace the white space padding by that zero-width code point instead of dropping it. This maximize screen content information while preserving proper layout. If a zero-width code point is preceded by a single-width code point then the above trick is not possible and such zero-width code point must be dropped. VS16 (Variation Selector 16, U+FE0F) is special as it doubles the width of the preceding single-width code point. We handle that case by giving VS16 a width of 1 when that happens. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/vt.c | 70 ++++++++++++++++++++++++++++++++++++-- include/linux/consolemap.h | 10 ++++++ 2 files changed, 78 insertions(+), 2 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index bcb508bc15..a989feffad 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -443,6 +443,15 @@ static void vc_uniscr_scroll(struct vc_data *vc, unsigned int top, } } +static u32 vc_uniscr_getc(struct vc_data *vc, int relative_pos) +{ + int pos = vc->state.x + vc->vc_need_wrap + relative_pos; + + if (vc->vc_uni_lines && in_range(pos, 0, vc->vc_cols)) + return vc->vc_uni_lines[vc->state.y][pos]; + return 0; +} + static void vc_uniscr_copy_area(u32 **dst_lines, unsigned int dst_cols, unsigned int dst_rows, @@ -2905,6 +2914,60 @@ static bool vc_is_control(struct vc_data *vc, int tc, int c) return false; } +static void vc_con_rewind(struct vc_data *vc) +{ + if (vc->state.x && !vc->vc_need_wrap) { + vc->vc_pos -= 2; + vc->state.x--; + } + vc->vc_need_wrap = 0; +} + +#define UCS_VS16 0xfe0f /* Variation Selector 16 */ + +static int vc_process_ucs(struct vc_data *vc, int c, int *tc) +{ + u32 prev_c, curr_c = c; + + if (ucs_is_double_width(curr_c)) + return 2; + + if (!ucs_is_zero_width(curr_c)) + return 1; + + /* From here curr_c is known to be zero-width. */ + + if (ucs_is_double_width(vc_uniscr_getc(vc, -2))) { + /* + * Let's merge this zero-width code point with the preceding + * double-width code point by replacing the existing + * whitespace padding. To do so we rewind one column and + * pretend this has a width of 1. + * We give the legacy display the same initial space padding. + */ + vc_con_rewind(vc); + *tc = ' '; + return 1; + } + + /* From here the preceding character, if any, must be single-width. */ + prev_c = vc_uniscr_getc(vc, -1); + + if (curr_c == UCS_VS16 && prev_c != 0) { + /* + * VS16 (U+FE0F) is special. It typically turns the preceding + * single-width character into a double-width one. Let it + * have a width of 1 effectively making the combination with + * the preceding character double-width. + */ + *tc = ' '; + return 1; + } + + /* Otherwise zero-width code points are ignored. */ + return 0; +} + static int vc_con_write_normal(struct vc_data *vc, int tc, int c, struct vc_draw_region *draw) { @@ -2915,8 +2978,9 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, bool inverse = false; if (vc->vc_utf && !vc->vc_disp_ctrl) { - if (ucs_is_double_width(c)) - width = 2; + width = vc_process_ucs(vc, c, &tc); + if (!width) + goto out; } /* Now try to find out how to display it */ @@ -2995,6 +3059,8 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, tc = ' '; next_c = ' '; } + +out: notify_write(vc, c); if (inverse) diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index caf079bcb8..7d778752dc 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -29,6 +29,11 @@ u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); +static inline bool ucs_is_zero_width(uint32_t cp) +{ + /* coming soon */ + return false; +} #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -63,6 +68,11 @@ static inline bool ucs_is_double_width(uint32_t cp) { return false; } + +static inline bool ucs_is_zero_width(uint32_t cp) +{ + return false; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Tue Apr 15 19:17:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882217 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ADAFF2459DB; Tue, 15 Apr 2025 19:22:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744944; cv=none; b=ON/Ozo8/ipxZDog7y8Zkfq9x0ZsCwCtZIunwNnlyxo8eWpGCde8PvUdcR5uAfhrZBK0yBWHuxgR5TzJc09/PX9YKSiFnjRLfq898E7PCNuxbZfx20q+eqhjYbS5uWPg8A9tAfzID5QmlwYXVUipmAba96VI6CArI5d2Qh4MxPvw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744944; c=relaxed/simple; bh=5zyTJY20PNZZfXt0Iyj0gmngEbYnPqxuuzrKWaxHZDA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GF77eqCUoaUgsrLzcbWUmPZOW3iEtlUJhglIdUevlhM5DVllURHuA2a/XA4vDZltxB+WFRCPlvLvMmtnaRVaKMVfaa/vluUrv0R9LQwQnwtOoM5DUl3hTGzghAmPvls9OEqpeB4Xf3+18Txi4EgI9cFjkTcjWh8xSwl3pSy26Sk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=KW445fu6; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=YikXAjv1; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="KW445fu6"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="YikXAjv1" Received: from phl-compute-03.internal (phl-compute-03.phl.internal [10.202.2.43]) by mailfhigh.phl.internal (Postfix) with ESMTP id 2211F11401F4; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-03.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=PtkP7nsces5fASeuyalKBh5gJxPb0dmR19dtFfMxFjo=; b=K W445fu60pnCQCY6NXk7bv0j1ToR+qR8sv0yo2yp36GAvydRqGjUAoxHwoMpclG+O EzAhngXw5R+G0neAoXN3stLVUCDC5VuJw264DaqnR0J4Kxkcy3hvbqhkOcME56UP nIYTJNyeP5PEaBQRg+Blyi57MoRumH8vCiwQbxtwuNkrTUWaajsm4/5h1jN5O4Cv cUmVe6b0gkkZcrreDeXt8pSL1wutkXFHJXKfBGD/LKCiO3K2NOTU/UCti4sdV56t ZQMZ/OqriDkKZR1eZaekyS2IvQyfAFyP9dowFz9QwZgIxtQq5LSNxiI5PWER+rAN HqU6O0Q0W3pP+QYEs0+hg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=P tkP7nsces5fASeuyalKBh5gJxPb0dmR19dtFfMxFjo=; b=YikXAjv1d3U8bNxI6 QYi3Crk58Gvw8+Fbtg+YScBQIgVY6X0HSaq8fzFYeLjfWAOtNqxkVV8koMXamkix Sw/ai8NxI/wo8TJJ95G3XMdDmggChHDcuNh+7AcIqMq9ekRLdQM3B6ryer5Maid9 wUHm6c6/PuOMdSHkHWHKCzyPBE7RpbnGV/Gedg3EP3nSDwO4AwfQPLlzCfAp0xxH 33shFuVNq2ct97Qv8k7fiT9VQ2Ci+Mi276VZvZUtdqRN5tStdGp2Bs+BzCyCpWB3 w9cenq0S7+6EvaLMlQtxs5OVInHUjubxTfKmBwRHUfiz5RqHjrtk1kGc5khTd2ls ruzpQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnhepfedutdfhfffgleeugfeileevkeeukeejtdff leeklefhgfdttdekgfelheevhfdunecuffhomhgrihhnpehunhhitghouggvrdhorhhgpd gtshhsfihgrdhorhhgpdiffedrohhrghenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpehnihgtohesfhhluhignhhitgdrnhgvthdpnhgspghrtg hpthhtohephedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepnhhpihhtrhgvsegs rgihlhhisghrvgdrtghomhdprhgtphhtthhopehjihhrihhslhgrsgihsehkvghrnhgvlh drohhrghdprhgtphhtthhopehgrhgvghhkhheslhhinhhugihfohhunhgurghtihhonhdr ohhrghdprhgtphhtthhopehlihhnuhigqdhkvghrnhgvlhesvhhgvghrrdhkvghrnhgvlh drohhrghdprhgtphhtthhopehlihhnuhigqdhsvghrihgrlhesvhhgvghrrdhkvghrnhgv lhdrohhrgh X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 3DCC8111660C; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 04/13] vt: introduce gen_ucs_width_table.py to create ucs_width_table.h Date: Tue, 15 Apr 2025 15:17:53 -0400 Message-ID: <20250415192212.33949-5-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre The table in ucs.c is terribly out of date and incomplete. We also need a second table to store zero-width code points. Properly maintaining those tables manually is impossible. So here's a script to generate them. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_width_table.py | 256 ++++++++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100755 drivers/tty/vt/gen_ucs_width_table.py diff --git a/drivers/tty/vt/gen_ucs_width_table.py b/drivers/tty/vt/gen_ucs_width_table.py new file mode 100755 index 0000000000..00510444a7 --- /dev/null +++ b/drivers/tty/vt/gen_ucs_width_table.py @@ -0,0 +1,256 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Leverage Python's unicodedata module to generate ucs_width_table.h + +import unicodedata +import sys + +# This script's file name +from pathlib import Path +this_file = Path(__file__).name + +# Output file name +out_file = "ucs_width_table.h" + +# --- Global Constants for Width Assignments --- + +# Known zero-width characters +KNOWN_ZERO_WIDTH = ( + 0x200B, # ZERO WIDTH SPACE + 0x200C, # ZERO WIDTH NON-JOINER + 0x200D, # ZERO WIDTH JOINER + 0x2060, # WORD JOINER + 0xFEFF # ZERO WIDTH NO-BREAK SPACE (BOM) +) + +# Zero-width emoji modifiers and components +# NOTE: Some of these characters would normally be single-width according to +# East Asian Width properties, but we deliberately override them to be +# zero-width because they function as modifiers in emoji sequences. +EMOJI_ZERO_WIDTH = [ + # Skin tone modifiers + (0x1F3FB, 0x1F3FF), # Emoji modifiers (skin tones) + + # Variation selectors (note: VS16 is treated specially in vt.c) + (0xFE00, 0xFE0F), # Variation Selectors 1-16 + + # Gender and hair style modifiers + # These would be single-width by Unicode properties, but are zero-width + # when part of emoji + (0x2640, 0x2640), # Female sign + (0x2642, 0x2642), # Male sign + (0x26A7, 0x26A7), # Transgender symbol + (0x1F9B0, 0x1F9B3), # Hair components (red, curly, white, bald) + + # Tag characters + (0xE0020, 0xE007E), # Tags +] + +# Regional indicators (flag components) +REGIONAL_INDICATORS = (0x1F1E6, 0x1F1FF) # Regional indicator symbols A-Z + +# Double-width emoji ranges +# +# Many emoji characters are classified as single-width according to Unicode +# Standard Annex #11 East Asian Width property (N or Neutral), but we +# deliberately override them to be double-width. References: +# 1. Unicode Technical Standard #51: Unicode Emoji +# (https://www.unicode.org/reports/tr51/) +# 2. Principle of "emoji presentation" in WHATWG CSS Text specification +# (https://drafts.csswg.org/css-text-3/#character-properties) +# 3. Terminal emulator implementations (iTerm2, Windows Terminal, etc.) which +# universally render emoji as double-width characters regardless of their +# Unicode EAW property +# 4. W3C Work Item: Requirements for Japanese Text Layout - Section 3.8.1 +# Emoji width (https://www.w3.org/TR/jlreq/) +EMOJI_RANGES = [ + (0x1F000, 0x1F02F), # Mahjong Tiles (EAW: N, but displayed as double-width) + (0x1F0A0, 0x1F0FF), # Playing Cards (EAW: N, but displayed as double-width) + (0x1F300, 0x1F5FF), # Miscellaneous Symbols and Pictographs + (0x1F600, 0x1F64F), # Emoticons + (0x1F680, 0x1F6FF), # Transport and Map Symbols + (0x1F700, 0x1F77F), # Alchemical Symbols + (0x1F780, 0x1F7FF), # Geometric Shapes Extended + (0x1F800, 0x1F8FF), # Supplemental Arrows-C + (0x1F900, 0x1F9FF), # Supplemental Symbols and Pictographs + (0x1FA00, 0x1FA6F), # Chess Symbols + (0x1FA70, 0x1FAFF), # Symbols and Pictographs Extended-A +] + +def create_width_tables(): + """ + Creates Unicode character width tables and returns the data structures. + + Returns: + tuple: (zero_width_ranges, double_width_ranges) + """ + + # Width data mapping + width_map = {} # Maps code points to width (0, 1, 2) + + # Mark emoji modifiers as zero-width + for start, end in EMOJI_ZERO_WIDTH: + for cp in range(start, end + 1): + width_map[cp] = 0 + + # Mark all regional indicators as single-width as they are usually paired + # providing a combined width of 2 when displayed together. + start, end = REGIONAL_INDICATORS + for cp in range(start, end + 1): + width_map[cp] = 1 + + # Process all assigned Unicode code points (Basic Multilingual Plane + + # Supplementary Planes) Range 0x0 to 0x10FFFF (the full Unicode range) + for block_start in range(0, 0x110000, 0x1000): + block_end = block_start + 0x1000 + for cp in range(block_start, block_end): + try: + char = chr(cp) + + # Skip if already processed + if cp in width_map: + continue + + # Check for combining marks and a format characters + category = unicodedata.category(char) + + # Combining marks + if category.startswith('M'): + width_map[cp] = 0 + continue + + # Format characters + # Since we have no support for bidirectional text, all format + # characters (category Cf) can be treated with width 0 (zero) + # for simplicity, as they don't need to occupy visual space + # in a non-bidirectional text environment. + if category == 'Cf': + width_map[cp] = 0 + continue + + # Known zero-width characters + if cp in KNOWN_ZERO_WIDTH: + width_map[cp] = 0 + continue + + # Use East Asian Width property + eaw = unicodedata.east_asian_width(char) + if eaw in ('F', 'W'): # Fullwidth or Wide + width_map[cp] = 2 + elif eaw in ('Na', 'H', 'N', 'A'): # Narrow, Halfwidth, Neutral, Ambiguous + width_map[cp] = 1 + else: + # Default to single-width for unknown + width_map[cp] = 1 + + except (ValueError, OverflowError): + # Skip invalid code points + continue + + # Process Emoji - generally double-width + for start, end in EMOJI_RANGES: + for cp in range(start, end + 1): + if cp not in width_map or width_map[cp] != 0: # Don't override zero-width + try: + char = chr(cp) + width_map[cp] = 2 + except (ValueError, OverflowError): + continue + + # Optimize to create range tables + def ranges_optimize(width_data, target_width): + points = sorted([cp for cp, width in width_data.items() if width == target_width]) + if not points: + return [] + + # Group consecutive code points into ranges + ranges = [] + start = points[0] + prev = start + + for cp in points[1:]: + if cp > prev + 1: + ranges.append((start, prev)) + start = cp + prev = cp + + # Add the last range + ranges.append((start, prev)) + return ranges + + # Extract ranges for each width + zero_width_ranges = ranges_optimize(width_map, 0) + double_width_ranges = ranges_optimize(width_map, 2) + + return zero_width_ranges, double_width_ranges + +def write_tables(zero_width_ranges, double_width_ranges): + """ + Write the generated tables to C header file. + + Args: + zero_width_ranges: List of (start, end) ranges for zero-width characters + double_width_ranges: List of (start, end) ranges for double-width characters + """ + + # Function to generate code point description comments + def get_code_point_comment(start, end): + try: + start_char_desc = unicodedata.name(chr(start)) + if start == end: + return f"/* {start_char_desc} */" + else: + end_char_desc = unicodedata.name(chr(end)) + return f"/* {start_char_desc} - {end_char_desc} */" + except: + if start == end: + return f"/* U+{start:04X} */" + else: + return f"/* U+{start:04X} - U+{end:04X} */" + + # Generate C tables + with open(out_file, 'w') as f: + f.write(f"""\ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * {out_file} - Unicode character width + * + * Auto-generated by {this_file} + * + * Unicode Version: {unicodedata.unidata_version} + */ + +/* Zero-width character ranges */ +static const struct ucs_interval ucs_zero_width_ranges[] = {{ +""") + + for start, end in zero_width_ranges: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") + + f.write("""\ +}; + +/* Double-width character ranges */ +static const struct ucs_interval ucs_double_width_ranges[] = { +""") + + for start, end in double_width_ranges: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") + + f.write("};\n") + +if __name__ == "__main__": + # Write tables to header file + zero_width_ranges, double_width_ranges = create_width_tables() + write_tables(zero_width_ranges, double_width_ranges) + + # Print summary + zero_width_count = sum(end - start + 1 for start, end in zero_width_ranges) + double_width_count = sum(end - start + 1 for start, end in double_width_ranges) + print(f"Generated {out_file} with:") + print(f"- {len(zero_width_ranges)} zero-width ranges covering ~{zero_width_count} code points") + print(f"- {len(double_width_ranges)} double-width ranges covering ~{double_width_count} code points") + print(f"- Unicode Version: {unicodedata.unidata_version}") From patchwork Tue Apr 15 19:17:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881618 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E1882472BF; Tue, 15 Apr 2025 19:22:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744946; cv=none; b=d4ZLALg6Tp1kwknFytJ00/b++JTIC6OEDAw2bz2q4EW2Kkm8L2ycSK+v69e/UWTEA1FfcARf55HTkZ03ORYYy0cC7UUkTACfljo9LgIY0P5gaVb2sRPSMqMMkeaOt5ktxiSHDr/Eq9Fhyabw95bwZkciZptC2QC9nJcnVgnOL+E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744946; c=relaxed/simple; bh=660TsqQTVIOW/sfhjyBHpTi4iXWDO6mcstCc/JmzYE0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oT9bLxs/8JsYWbexmclrg6pTq9XEyrfeOKuVUxFoO1zh1oy/+Fr49Z75MRdM03CMXiYJngigGRk+8iwDjhGcYi3tC2eHVYpoDk7Flvflq1ijfVsKMzkYIoMlp+nKhn0j+OwZgQ0bWXL626qW0Io2qAhCpUCAM3HkswZQcNmm8bo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=CxwrnR+f; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=DiL2rtH4; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="CxwrnR+f"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="DiL2rtH4" Received: from phl-compute-08.internal (phl-compute-08.phl.internal [10.202.2.48]) by mailfhigh.phl.internal (Postfix) with ESMTP id DA0CA1140139; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-08.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=knAzhWbDFHF/hzKQp+oJi1Fwh6zzADlLrc+A1Hp480Y=; b=C xwrnR+f5iO7O+wPNnChiHsKSLhL5eG2Ki+FV9AxCai+pwcBixJSDMC9YtwTaGeTZ r2000Xra/rOEnDdEmLzd+MizUyVBK7Om6iB56PfmAWJqBXTZ39FblNbDQ4iDGxGA nxLiY1cZFwrz16vWosMUQb74Svir301hMU3waIn+OP5KOCtXJl38RKAOwrpixQxN o0bUb5AvAKnhzp/FwXwAuEVO65USUYzOt59t4TOG4ESAcYXn+k4WOMSPTS1dmsjo ivXVgacKytiqtyOyOA80NszFOa8d2F7T2b5q/MbnrYyYc0zaDnj/m3oDclI2rPkE 5FuRqNEN4N2wHUs69EvqA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=k nAzhWbDFHF/hzKQp+oJi1Fwh6zzADlLrc+A1Hp480Y=; b=DiL2rtH4YNqc0qLJb zs/pr0ngzz9gredmr6ywUxTQSZKj891U4NB0BJobC3eEB2Ho7xotWzymrffnlETY hT0DbsKjgQmdaVHWBBJpiJvfhClXnIN2Za+E+1FkEm+aIYccHQ2lBjMYguFczHof L5ZLFwpdIxy5ex92BrULN7wpd+aRm9jWRPawTfa8u5LwbHyNqsENsLPC5UCuJx1c HLFqEQydqCXvzBfcUlczd+Gnno8PVg67lY4lgtJZlh9k4fbZPm6uyq34p13yM/Vb x4+MUoziAqP9N+QoP8J45lEMcJ2PKUuhka6fnXW+i8cvN0Fz9DqhvIKU4LqzeVHM gHTcw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 5CEEC111660D; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 05/13] vt: create ucs_width_table.h with gen_ucs_width_table.py Date: Tue, 15 Apr 2025 15:17:54 -0400 Message-ID: <20250415192212.33949-6-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Provide comprehensive ranges for double-width and zero-width Unicode code points. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/ucs_width_table.h | 445 +++++++++++++++++++++++++++++++ 1 file changed, 445 insertions(+) create mode 100644 drivers/tty/vt/ucs_width_table.h diff --git a/drivers/tty/vt/ucs_width_table.h b/drivers/tty/vt/ucs_width_table.h new file mode 100644 index 0000000000..9cc86b5cdf --- /dev/null +++ b/drivers/tty/vt/ucs_width_table.h @@ -0,0 +1,445 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * ucs_width_table.h - Unicode character width + * + * Auto-generated by gen_ucs_width_table.py + * + * Unicode Version: 16.0.0 + */ + +/* Zero-width character ranges */ +static const struct ucs_interval ucs_zero_width_ranges[] = { + { 0x000AD, 0x000AD }, /* SOFT HYPHEN */ + { 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ + { 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */ + { 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x0061C, 0x0061C }, /* ARABIC LETTER MARK */ + { 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x00670, 0x00670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x006D6, 0x006DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */ + { 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x0070F, 0x0070F }, /* SYRIAC ABBREVIATION MARK */ + { 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x007FD, 0x007FD }, /* NKO DANTAYALAN */ + { 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x00829, 0x0082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ + { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x009C7, 0x009C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ + { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ + { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ + { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ + { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ + { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ + { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ + { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x00CCA, 0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ + { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ + { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ + { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ + { 0x0180B, 0x0180F }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ + { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ + { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ + { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x0200B, 0x0200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ + { 0x0202A, 0x0202E }, /* LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE */ + { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x02066, 0x0206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ + { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x02640, 0x02640 }, /* FEMALE SIGN */ + { 0x02642, 0x02642 }, /* MALE SIGN */ + { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ + { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ + { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ + { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ + { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ + { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ + { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ + { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ + { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ + { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ + { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ + { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ + { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ + { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ + { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ + { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ + { 0x0A947, 0x0A953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ + { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ + { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ + { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ + { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ + { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ + { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ + { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ + { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ + { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ + { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ + { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ + { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ + { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ + { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ + { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ + { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ + { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ + { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ + { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ + { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ + { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ + { 0x101FD, 0x101FD }, /* PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE */ + { 0x102E0, 0x102E0 }, /* COPTIC EPACT THOUSANDS MARK */ + { 0x10376, 0x1037A }, /* COMBINING OLD PERMIC LETTER AN - COMBINING OLD PERMIC LETTER SII */ + { 0x10A01, 0x10A03 }, /* KHAROSHTHI VOWEL SIGN I - KHAROSHTHI VOWEL SIGN VOCALIC R */ + { 0x10A05, 0x10A06 }, /* KHAROSHTHI VOWEL SIGN E - KHAROSHTHI VOWEL SIGN O */ + { 0x10A0C, 0x10A0F }, /* KHAROSHTHI VOWEL LENGTH MARK - KHAROSHTHI SIGN VISARGA */ + { 0x10A38, 0x10A3A }, /* KHAROSHTHI SIGN BAR ABOVE - KHAROSHTHI SIGN DOT BELOW */ + { 0x10A3F, 0x10A3F }, /* KHAROSHTHI VIRAMA */ + { 0x10AE5, 0x10AE6 }, /* MANICHAEAN ABBREVIATION MARK ABOVE - MANICHAEAN ABBREVIATION MARK BELOW */ + { 0x10D24, 0x10D27 }, /* HANIFI ROHINGYA SIGN HARBAHAY - HANIFI ROHINGYA SIGN TASSI */ + { 0x10D69, 0x10D6D }, /* GARAY VOWEL SIGN E - GARAY CONSONANT NASALIZATION MARK */ + { 0x10EAB, 0x10EAC }, /* YEZIDI COMBINING HAMZA MARK - YEZIDI COMBINING MADDA MARK */ + { 0x10EFC, 0x10EFF }, /* ARABIC COMBINING ALEF OVERLAY - ARABIC SMALL LOW WORD MADDA */ + { 0x10F46, 0x10F50 }, /* SOGDIAN COMBINING DOT BELOW - SOGDIAN COMBINING STROKE BELOW */ + { 0x10F82, 0x10F85 }, /* OLD UYGHUR COMBINING DOT ABOVE - OLD UYGHUR COMBINING TWO DOTS BELOW */ + { 0x11000, 0x11002 }, /* BRAHMI SIGN CANDRABINDU - BRAHMI SIGN VISARGA */ + { 0x11038, 0x11046 }, /* BRAHMI VOWEL SIGN AA - BRAHMI VIRAMA */ + { 0x11070, 0x11070 }, /* BRAHMI SIGN OLD TAMIL VIRAMA */ + { 0x11073, 0x11074 }, /* BRAHMI VOWEL SIGN OLD TAMIL SHORT E - BRAHMI VOWEL SIGN OLD TAMIL SHORT O */ + { 0x1107F, 0x11082 }, /* BRAHMI NUMBER JOINER - KAITHI SIGN VISARGA */ + { 0x110B0, 0x110BA }, /* KAITHI VOWEL SIGN AA - KAITHI SIGN NUKTA */ + { 0x110BD, 0x110BD }, /* KAITHI NUMBER SIGN */ + { 0x110C2, 0x110C2 }, /* KAITHI VOWEL SIGN VOCALIC R */ + { 0x110CD, 0x110CD }, /* KAITHI NUMBER SIGN ABOVE */ + { 0x11100, 0x11102 }, /* CHAKMA SIGN CANDRABINDU - CHAKMA SIGN VISARGA */ + { 0x11127, 0x11134 }, /* CHAKMA VOWEL SIGN A - CHAKMA MAAYYAA */ + { 0x11145, 0x11146 }, /* CHAKMA VOWEL SIGN AA - CHAKMA VOWEL SIGN EI */ + { 0x11173, 0x11173 }, /* MAHAJANI SIGN NUKTA */ + { 0x11180, 0x11182 }, /* SHARADA SIGN CANDRABINDU - SHARADA SIGN VISARGA */ + { 0x111B3, 0x111C0 }, /* SHARADA VOWEL SIGN AA - SHARADA SIGN VIRAMA */ + { 0x111C9, 0x111CC }, /* SHARADA SANDHI MARK - SHARADA EXTRA SHORT VOWEL MARK */ + { 0x111CE, 0x111CF }, /* SHARADA VOWEL SIGN PRISHTHAMATRA E - SHARADA SIGN INVERTED CANDRABINDU */ + { 0x1122C, 0x11237 }, /* KHOJKI VOWEL SIGN AA - KHOJKI SIGN SHADDA */ + { 0x1123E, 0x1123E }, /* KHOJKI SIGN SUKUN */ + { 0x11241, 0x11241 }, /* KHOJKI VOWEL SIGN VOCALIC R */ + { 0x112DF, 0x112EA }, /* KHUDAWADI SIGN ANUSVARA - KHUDAWADI SIGN VIRAMA */ + { 0x11300, 0x11303 }, /* GRANTHA SIGN COMBINING ANUSVARA ABOVE - GRANTHA SIGN VISARGA */ + { 0x1133B, 0x1133C }, /* COMBINING BINDU BELOW - GRANTHA SIGN NUKTA */ + { 0x1133E, 0x11344 }, /* GRANTHA VOWEL SIGN AA - GRANTHA VOWEL SIGN VOCALIC RR */ + { 0x11347, 0x11348 }, /* GRANTHA VOWEL SIGN EE - GRANTHA VOWEL SIGN AI */ + { 0x1134B, 0x1134D }, /* GRANTHA VOWEL SIGN OO - GRANTHA SIGN VIRAMA */ + { 0x11357, 0x11357 }, /* GRANTHA AU LENGTH MARK */ + { 0x11362, 0x11363 }, /* GRANTHA VOWEL SIGN VOCALIC L - GRANTHA VOWEL SIGN VOCALIC LL */ + { 0x11366, 0x1136C }, /* COMBINING GRANTHA DIGIT ZERO - COMBINING GRANTHA DIGIT SIX */ + { 0x11370, 0x11374 }, /* COMBINING GRANTHA LETTER A - COMBINING GRANTHA LETTER PA */ + { 0x113B8, 0x113C0 }, /* TULU-TIGALARI VOWEL SIGN AA - TULU-TIGALARI VOWEL SIGN VOCALIC LL */ + { 0x113C2, 0x113C2 }, /* TULU-TIGALARI VOWEL SIGN EE */ + { 0x113C5, 0x113C5 }, /* TULU-TIGALARI VOWEL SIGN AI */ + { 0x113C7, 0x113CA }, /* TULU-TIGALARI VOWEL SIGN OO - TULU-TIGALARI SIGN CANDRA ANUNASIKA */ + { 0x113CC, 0x113D0 }, /* TULU-TIGALARI SIGN ANUSVARA - TULU-TIGALARI CONJOINER */ + { 0x113D2, 0x113D2 }, /* TULU-TIGALARI GEMINATION MARK */ + { 0x113E1, 0x113E2 }, /* TULU-TIGALARI VEDIC TONE SVARITA - TULU-TIGALARI VEDIC TONE ANUDATTA */ + { 0x11435, 0x11446 }, /* NEWA VOWEL SIGN AA - NEWA SIGN NUKTA */ + { 0x1145E, 0x1145E }, /* NEWA SANDHI MARK */ + { 0x114B0, 0x114C3 }, /* TIRHUTA VOWEL SIGN AA - TIRHUTA SIGN NUKTA */ + { 0x115AF, 0x115B5 }, /* SIDDHAM VOWEL SIGN AA - SIDDHAM VOWEL SIGN VOCALIC RR */ + { 0x115B8, 0x115C0 }, /* SIDDHAM VOWEL SIGN E - SIDDHAM SIGN NUKTA */ + { 0x115DC, 0x115DD }, /* SIDDHAM VOWEL SIGN ALTERNATE U - SIDDHAM VOWEL SIGN ALTERNATE UU */ + { 0x11630, 0x11640 }, /* MODI VOWEL SIGN AA - MODI SIGN ARDHACANDRA */ + { 0x116AB, 0x116B7 }, /* TAKRI SIGN ANUSVARA - TAKRI SIGN NUKTA */ + { 0x1171D, 0x1172B }, /* AHOM CONSONANT SIGN MEDIAL LA - AHOM SIGN KILLER */ + { 0x1182C, 0x1183A }, /* DOGRA VOWEL SIGN AA - DOGRA SIGN NUKTA */ + { 0x11930, 0x11935 }, /* DIVES AKURU VOWEL SIGN AA - DIVES AKURU VOWEL SIGN E */ + { 0x11937, 0x11938 }, /* DIVES AKURU VOWEL SIGN AI - DIVES AKURU VOWEL SIGN O */ + { 0x1193B, 0x1193E }, /* DIVES AKURU SIGN ANUSVARA - DIVES AKURU VIRAMA */ + { 0x11940, 0x11940 }, /* DIVES AKURU MEDIAL YA */ + { 0x11942, 0x11943 }, /* DIVES AKURU MEDIAL RA - DIVES AKURU SIGN NUKTA */ + { 0x119D1, 0x119D7 }, /* NANDINAGARI VOWEL SIGN AA - NANDINAGARI VOWEL SIGN VOCALIC RR */ + { 0x119DA, 0x119E0 }, /* NANDINAGARI VOWEL SIGN E - NANDINAGARI SIGN VIRAMA */ + { 0x119E4, 0x119E4 }, /* NANDINAGARI VOWEL SIGN PRISHTHAMATRA E */ + { 0x11A01, 0x11A0A }, /* ZANABAZAR SQUARE VOWEL SIGN I - ZANABAZAR SQUARE VOWEL LENGTH MARK */ + { 0x11A33, 0x11A39 }, /* ZANABAZAR SQUARE FINAL CONSONANT MARK - ZANABAZAR SQUARE SIGN VISARGA */ + { 0x11A3B, 0x11A3E }, /* ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA - ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA */ + { 0x11A47, 0x11A47 }, /* ZANABAZAR SQUARE SUBJOINER */ + { 0x11A51, 0x11A5B }, /* SOYOMBO VOWEL SIGN I - SOYOMBO VOWEL LENGTH MARK */ + { 0x11A8A, 0x11A99 }, /* SOYOMBO FINAL CONSONANT SIGN G - SOYOMBO SUBJOINER */ + { 0x11C2F, 0x11C36 }, /* BHAIKSUKI VOWEL SIGN AA - BHAIKSUKI VOWEL SIGN VOCALIC L */ + { 0x11C38, 0x11C3F }, /* BHAIKSUKI VOWEL SIGN E - BHAIKSUKI SIGN VIRAMA */ + { 0x11C92, 0x11CA7 }, /* MARCHEN SUBJOINED LETTER KA - MARCHEN SUBJOINED LETTER ZA */ + { 0x11CA9, 0x11CB6 }, /* MARCHEN SUBJOINED LETTER YA - MARCHEN SIGN CANDRABINDU */ + { 0x11D31, 0x11D36 }, /* MASARAM GONDI VOWEL SIGN AA - MASARAM GONDI VOWEL SIGN VOCALIC R */ + { 0x11D3A, 0x11D3A }, /* MASARAM GONDI VOWEL SIGN E */ + { 0x11D3C, 0x11D3D }, /* MASARAM GONDI VOWEL SIGN AI - MASARAM GONDI VOWEL SIGN O */ + { 0x11D3F, 0x11D45 }, /* MASARAM GONDI VOWEL SIGN AU - MASARAM GONDI VIRAMA */ + { 0x11D47, 0x11D47 }, /* MASARAM GONDI RA-KARA */ + { 0x11D8A, 0x11D8E }, /* GUNJALA GONDI VOWEL SIGN AA - GUNJALA GONDI VOWEL SIGN UU */ + { 0x11D90, 0x11D91 }, /* GUNJALA GONDI VOWEL SIGN EE - GUNJALA GONDI VOWEL SIGN AI */ + { 0x11D93, 0x11D97 }, /* GUNJALA GONDI VOWEL SIGN OO - GUNJALA GONDI VIRAMA */ + { 0x11EF3, 0x11EF6 }, /* MAKASAR VOWEL SIGN I - MAKASAR VOWEL SIGN O */ + { 0x11F00, 0x11F01 }, /* KAWI SIGN CANDRABINDU - KAWI SIGN ANUSVARA */ + { 0x11F03, 0x11F03 }, /* KAWI SIGN VISARGA */ + { 0x11F34, 0x11F3A }, /* KAWI VOWEL SIGN AA - KAWI VOWEL SIGN VOCALIC R */ + { 0x11F3E, 0x11F42 }, /* KAWI VOWEL SIGN E - KAWI CONJOINER */ + { 0x11F5A, 0x11F5A }, /* KAWI SIGN NUKTA */ + { 0x13430, 0x13440 }, /* EGYPTIAN HIEROGLYPH VERTICAL JOINER - EGYPTIAN HIEROGLYPH MIRROR HORIZONTALLY */ + { 0x13447, 0x13455 }, /* EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP START - EGYPTIAN HIEROGLYPH MODIFIER DAMAGED */ + { 0x1611E, 0x1612F }, /* GURUNG KHEMA VOWEL SIGN AA - GURUNG KHEMA SIGN THOLHOMA */ + { 0x16AF0, 0x16AF4 }, /* BASSA VAH COMBINING HIGH TONE - BASSA VAH COMBINING HIGH-LOW TONE */ + { 0x16B30, 0x16B36 }, /* PAHAWH HMONG MARK CIM TUB - PAHAWH HMONG MARK CIM TAUM */ + { 0x16F4F, 0x16F4F }, /* MIAO SIGN CONSONANT MODIFIER BAR */ + { 0x16F51, 0x16F87 }, /* MIAO SIGN ASPIRATION - MIAO VOWEL SIGN UI */ + { 0x16F8F, 0x16F92 }, /* MIAO TONE RIGHT - MIAO TONE BELOW */ + { 0x16FE4, 0x16FE4 }, /* KHITAN SMALL SCRIPT FILLER */ + { 0x16FF0, 0x16FF1 }, /* VIETNAMESE ALTERNATE READING MARK CA - VIETNAMESE ALTERNATE READING MARK NHAY */ + { 0x1BC9D, 0x1BC9E }, /* DUPLOYAN THICK LETTER SELECTOR - DUPLOYAN DOUBLE MARK */ + { 0x1BCA0, 0x1BCA3 }, /* SHORTHAND FORMAT LETTER OVERLAP - SHORTHAND FORMAT UP STEP */ + { 0x1CF00, 0x1CF2D }, /* ZNAMENNY COMBINING MARK GORAZDO NIZKO S KRYZHEM ON LEFT - ZNAMENNY COMBINING MARK KRYZH ON LEFT */ + { 0x1CF30, 0x1CF46 }, /* ZNAMENNY COMBINING TONAL RANGE MARK MRACHNO - ZNAMENNY PRIZNAK MODIFIER ROG */ + { 0x1D165, 0x1D169 }, /* MUSICAL SYMBOL COMBINING STEM - MUSICAL SYMBOL COMBINING TREMOLO-3 */ + { 0x1D16D, 0x1D182 }, /* MUSICAL SYMBOL COMBINING AUGMENTATION DOT - MUSICAL SYMBOL COMBINING LOURE */ + { 0x1D185, 0x1D18B }, /* MUSICAL SYMBOL COMBINING DOIT - MUSICAL SYMBOL COMBINING TRIPLE TONGUE */ + { 0x1D1AA, 0x1D1AD }, /* MUSICAL SYMBOL COMBINING DOWN BOW - MUSICAL SYMBOL COMBINING SNAP PIZZICATO */ + { 0x1D242, 0x1D244 }, /* COMBINING GREEK MUSICAL TRISEME - COMBINING GREEK MUSICAL PENTASEME */ + { 0x1DA00, 0x1DA36 }, /* SIGNWRITING HEAD RIM - SIGNWRITING AIR SUCKING IN */ + { 0x1DA3B, 0x1DA6C }, /* SIGNWRITING MOUTH CLOSED NEUTRAL - SIGNWRITING EXCITEMENT */ + { 0x1DA75, 0x1DA75 }, /* SIGNWRITING UPPER BODY TILTING FROM HIP JOINTS */ + { 0x1DA84, 0x1DA84 }, /* SIGNWRITING LOCATION HEAD NECK */ + { 0x1DA9B, 0x1DA9F }, /* SIGNWRITING FILL MODIFIER-2 - SIGNWRITING FILL MODIFIER-6 */ + { 0x1DAA1, 0x1DAAF }, /* SIGNWRITING ROTATION MODIFIER-2 - SIGNWRITING ROTATION MODIFIER-16 */ + { 0x1E000, 0x1E006 }, /* COMBINING GLAGOLITIC LETTER AZU - COMBINING GLAGOLITIC LETTER ZHIVETE */ + { 0x1E008, 0x1E018 }, /* COMBINING GLAGOLITIC LETTER ZEMLJA - COMBINING GLAGOLITIC LETTER HERU */ + { 0x1E01B, 0x1E021 }, /* COMBINING GLAGOLITIC LETTER SHTA - COMBINING GLAGOLITIC LETTER YATI */ + { 0x1E023, 0x1E024 }, /* COMBINING GLAGOLITIC LETTER YU - COMBINING GLAGOLITIC LETTER SMALL YUS */ + { 0x1E026, 0x1E02A }, /* COMBINING GLAGOLITIC LETTER YO - COMBINING GLAGOLITIC LETTER FITA */ + { 0x1E08F, 0x1E08F }, /* COMBINING CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I */ + { 0x1E130, 0x1E136 }, /* NYIAKENG PUACHUE HMONG TONE-B - NYIAKENG PUACHUE HMONG TONE-D */ + { 0x1E2AE, 0x1E2AE }, /* TOTO SIGN RISING TONE */ + { 0x1E2EC, 0x1E2EF }, /* WANCHO TONE TUP - WANCHO TONE KOINI */ + { 0x1E4EC, 0x1E4EF }, /* NAG MUNDARI SIGN MUHOR - NAG MUNDARI SIGN SUTUH */ + { 0x1E5EE, 0x1E5EF }, /* OL ONAL SIGN MU - OL ONAL SIGN IKIR */ + { 0x1E8D0, 0x1E8D6 }, /* MENDE KIKAKUI COMBINING NUMBER TEENS - MENDE KIKAKUI COMBINING NUMBER MILLIONS */ + { 0x1E944, 0x1E94A }, /* ADLAM ALIF LENGTHENER - ADLAM NUKTA */ + { 0x1F3FB, 0x1F3FF }, /* EMOJI MODIFIER FITZPATRICK TYPE-1-2 - EMOJI MODIFIER FITZPATRICK TYPE-6 */ + { 0x1F9B0, 0x1F9B3 }, /* EMOJI COMPONENT RED HAIR - EMOJI COMPONENT WHITE HAIR */ + { 0xE0001, 0xE0001 }, /* LANGUAGE TAG */ + { 0xE0020, 0xE007F }, /* TAG SPACE - CANCEL TAG */ + { 0xE0100, 0xE01EF }, /* VARIATION SELECTOR-17 - VARIATION SELECTOR-256 */ +}; + +/* Double-width character ranges */ +static const struct ucs_interval ucs_double_width_ranges[] = { + { 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ + { 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */ + { 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ + { 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ + { 0x023F0, 0x023F0 }, /* ALARM CLOCK */ + { 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */ + { 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ + { 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ + { 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ + { 0x02648, 0x02653 }, /* ARIES - PISCES */ + { 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */ + { 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ + { 0x02693, 0x02693 }, /* ANCHOR */ + { 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */ + { 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ + { 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */ + { 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ + { 0x026CE, 0x026CE }, /* OPHIUCHUS */ + { 0x026D4, 0x026D4 }, /* NO ENTRY */ + { 0x026EA, 0x026EA }, /* CHURCH */ + { 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */ + { 0x026F5, 0x026F5 }, /* SAILBOAT */ + { 0x026FA, 0x026FA }, /* TENT */ + { 0x026FD, 0x026FD }, /* FUEL PUMP */ + { 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */ + { 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */ + { 0x02728, 0x02728 }, /* SPARKLES */ + { 0x0274C, 0x0274C }, /* CROSS MARK */ + { 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */ + { 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ + { 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ + { 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ + { 0x027B0, 0x027B0 }, /* CURLY LOOP */ + { 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */ + { 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ + { 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */ + { 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */ + { 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ + { 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ + { 0x02F00, 0x02FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ + { 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ + { 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ + { 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ + { 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ + { 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ + { 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ + { 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ + { 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ + { 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ + { 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ + { 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ + { 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ + { 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ + { 0x0F900, 0x0FAFF }, /* U+F900 - U+FAFF */ + { 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ + { 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ + { 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ + { 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ + { 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ + { 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ + { 0x16FE0, 0x16FE3 }, /* TANGUT ITERATION MARK - OLD CHINESE ITERATION MARK */ + { 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */ + { 0x18800, 0x18CD5 }, /* TANGUT COMPONENT-001 - KHITAN SMALL SCRIPT CHARACTER-18CD5 */ + { 0x18CFF, 0x18D08 }, /* U+18CFF - U+18D08 */ + { 0x1AFF0, 0x1AFF3 }, /* KATAKANA LETTER MINNAN TONE-2 - KATAKANA LETTER MINNAN TONE-5 */ + { 0x1AFF5, 0x1AFFB }, /* KATAKANA LETTER MINNAN TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-5 */ + { 0x1AFFD, 0x1AFFE }, /* KATAKANA LETTER MINNAN NASALIZED TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-8 */ + { 0x1B000, 0x1B122 }, /* KATAKANA LETTER ARCHAIC E - KATAKANA LETTER ARCHAIC WU */ + { 0x1B132, 0x1B132 }, /* HIRAGANA LETTER SMALL KO */ + { 0x1B150, 0x1B152 }, /* HIRAGANA LETTER SMALL WI - HIRAGANA LETTER SMALL WO */ + { 0x1B155, 0x1B155 }, /* KATAKANA LETTER SMALL KO */ + { 0x1B164, 0x1B167 }, /* KATAKANA LETTER SMALL WI - KATAKANA LETTER SMALL N */ + { 0x1B170, 0x1B2FB }, /* NUSHU CHARACTER-1B170 - NUSHU CHARACTER-1B2FB */ + { 0x1D300, 0x1D356 }, /* MONOGRAM FOR EARTH - TETRAGRAM FOR FOSTERING */ + { 0x1D360, 0x1D376 }, /* COUNTING ROD UNIT DIGIT ONE - IDEOGRAPHIC TALLY MARK FIVE */ + { 0x1F000, 0x1F02F }, /* U+1F000 - U+1F02F */ + { 0x1F0A0, 0x1F0FF }, /* U+1F0A0 - U+1F0FF */ + { 0x1F18E, 0x1F18E }, /* NEGATIVE SQUARED AB */ + { 0x1F191, 0x1F19A }, /* SQUARED CL - SQUARED VS */ + { 0x1F200, 0x1F202 }, /* SQUARE HIRAGANA HOKA - SQUARED KATAKANA SA */ + { 0x1F210, 0x1F23B }, /* SQUARED CJK UNIFIED IDEOGRAPH-624B - SQUARED CJK UNIFIED IDEOGRAPH-914D */ + { 0x1F240, 0x1F248 }, /* TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C - TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 */ + { 0x1F250, 0x1F251 }, /* CIRCLED IDEOGRAPH ADVANTAGE - CIRCLED IDEOGRAPH ACCEPT */ + { 0x1F260, 0x1F265 }, /* ROUNDED SYMBOL FOR FU - ROUNDED SYMBOL FOR CAI */ + { 0x1F300, 0x1F3FA }, /* CYCLONE - AMPHORA */ + { 0x1F400, 0x1F64F }, /* RAT - PERSON WITH FOLDED HANDS */ + { 0x1F680, 0x1F9AF }, /* ROCKET - PROBING CANE */ + { 0x1F9B4, 0x1FAFF }, /* U+1F9B4 - U+1FAFF */ + { 0x20000, 0x2FFFD }, /* U+20000 - U+2FFFD */ + { 0x30000, 0x3FFFD }, /* U+30000 - U+3FFFD */ +}; From patchwork Tue Apr 15 19:17:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882220 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB8222459CC; Tue, 15 Apr 2025 19:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; cv=none; b=CAyA+TxwOollvdpGnpWrvAqb9Ry3H8SPDOhQtXUQYALhPfEaCwuEqebJVf4FEuBq0k2LJsUKS8eunpntaLMURBxywrgNKMhOirX/1LFoDZaoetq+kD7CZMwKLimnwLLSkVQDM9TrJOnzosxxbmWMvnW72LiQ6RtYwBBoKv45Q9g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; c=relaxed/simple; bh=7/xbGdXgd44hVPhgMkh8EAfbgYFKxyQauPlAl58TBpY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LxRqwP/in5iEpd5hKiUq2TKqx0Ss/EApcrYD5Hn7dhH8udsDDNI7TEeyNeevDyCcRr4UpxhQB6OBER9QE6TNplVE0l32PN9Mya3zlOimEUllHCeWAfbQlewlQCVZp2dxiXD9sN5f0cIVyvYXQ1WI3O/jFwpDzytv61eY67zJFwo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=rKCmq1ey; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=Q/+82lS8; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="rKCmq1ey"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Q/+82lS8" Received: from phl-compute-03.internal (phl-compute-03.phl.internal [10.202.2.43]) by mailfhigh.phl.internal (Postfix) with ESMTP id B325511401FB; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-03.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=LPrIBk7ZKDmr77uKzmMXChvb1JdCEHvs4IMZkPmQtyE=; b=r KCmq1eyV3m3S9uF+fyPj1EvA5ZjeHQ6L40TyO6tGN9jxeevkjyelOXs3kGXBEfHm CnmwoN1wIOkfVQZqELltVVm+KvEHB6LgNFazxV2/emtpOYnH/afXs69+VsXptrYC 0fKU+1nduFZPcBprbdDpnLZN6hjJDiv/hHqTa6vYtuLvGRiNbNnfJvimqZpFeees lf8GQFxeQF+XTVFojFF7Zd7wlt82XsYMC9GbvLNo9ZZT8OEFHg7Z1ft6yGYqYZ7t cG3f2Hs3J28JSqbmDnndbYO/+uvHzf38293NVtE2XvSZ5FJKAh8KIJSgkfUWIsM2 3VCU+NKeKw232/52QSTIQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=L PrIBk7ZKDmr77uKzmMXChvb1JdCEHvs4IMZkPmQtyE=; b=Q/+82lS84hZeif9iI xxqjFPQk/Zm1Jtqw2nuQfd+36Vw35OctMynknVKafvaC8+/yrH08jDJp89lY6qTV p3J46kzc1xKfoEW9eHLwGyoe/StRoQKY8SShv8HtodY0pAEQPPhP9Qb9Ms3drL4c R8ZT9Tf/VkonOXZOz4mD0c3CY279vFYm4V5JeYsqpRCq4FITWWRYji6OK17k1Wca 5sLP8CPVx2neIW8jn1344LsOrhFBCOq70JLo/ryXQEyfq0vu7AGUb1fxZqQ5rc5M G2lu0h6nHQFKdMXQC6NiGTC8dE3zOd9KIhBAP+9tP1CPX96EWDV9/U+NpFsPku+g HeeKw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnhepleektdffjeevfefhiedtudevudetjeejvdej uedtvdevveefuedujeejffetieeknecuffhomhgrihhnpegtrghmrdgrtgdruhhknecuve hluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepnhhitghosehf lhhugihnihgtrdhnvghtpdhnsggprhgtphhtthhopeehpdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehnphhithhrvgessggrhihlihgsrhgvrdgtohhmpdhrtghpthhtohep jhhirhhishhlrggshieskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepghhrvghgkhhhse hlihhnuhigfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoheplhhinhhugidqkhgv rhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidqsh gvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 8FF59111660F; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 06/13] vt: use new tables in ucs.c Date: Tue, 15 Apr 2025 15:17:55 -0400 Message-ID: <20250415192212.33949-7-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This removes the table from ucs.c and substitutes the generated tables from ucs_width_table.h providing comprehensive ranges for double-width and zero-width Unicode code points. Also implements ucs_is_zero_width() to query the new zero-width table. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/ucs.c | 44 +++++++++++++++++++++----------------- include/linux/consolemap.h | 6 +----- 2 files changed, 25 insertions(+), 25 deletions(-) diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c index 0f6c087158..5e71aa3896 100644 --- a/drivers/tty/vt/ucs.c +++ b/drivers/tty/vt/ucs.c @@ -5,22 +5,12 @@ #include #include -/* ucs_is_double_width() is based on the wcwidth() implementation by - * Markus Kuhn -- 2007-05-26 (Unicode 5.0) - * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c - */ - struct ucs_interval { u32 first; u32 last; }; -static const struct ucs_interval ucs_double_width_ranges[] = { - { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, - { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, - { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, - { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } -}; +#include "ucs_width_table.h" static int interval_cmp(const void *key, const void *element) { @@ -34,6 +24,27 @@ static int interval_cmp(const void *key, const void *element) return 0; } +static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size) +{ + if (!in_range(cp, ranges[0].first, ranges[size - 1].last)) + return false; + + return __inline_bsearch(&cp, ranges, size, sizeof(*ranges), + interval_cmp) != NULL; +} + +/** + * Determine if a Unicode code point is zero-width. + * + * @param cp: Unicode code point (UCS-4) + * Return: true if the character is zero-width, false otherwise + */ +bool ucs_is_zero_width(u32 cp) +{ + return cp_in_range(cp, ucs_zero_width_ranges, + ARRAY_SIZE(ucs_zero_width_ranges)); +} + /** * Determine if a Unicode code point is double-width. * @@ -42,13 +53,6 @@ static int interval_cmp(const void *key, const void *element) */ bool ucs_is_double_width(u32 cp) { - size_t size = ARRAY_SIZE(ucs_double_width_ranges); - - if (!in_range(cp, ucs_double_width_ranges[0].first, - ucs_double_width_ranges[size - 1].last)) - return false; - - return __inline_bsearch(&cp, ucs_double_width_ranges, size, - sizeof(*ucs_double_width_ranges), - interval_cmp) != NULL; + return cp_in_range(cp, ucs_double_width_ranges, + ARRAY_SIZE(ucs_double_width_ranges)); } diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index 7d778752dc..b3a9118666 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -29,11 +29,7 @@ u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); -static inline bool ucs_is_zero_width(uint32_t cp) -{ - /* coming soon */ - return false; -} +bool ucs_is_zero_width(uint32_t cp); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) From patchwork Tue Apr 15 19:17:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882216 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97A052475E3; Tue, 15 Apr 2025 19:22:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744945; cv=none; b=pMPumU8gXrkN0BBNyYO3Kz/4O0AaOSfMRbSDrexPgqQXXh0ycCkNv9+a+Ijxsw8X/jSJ2bQU+Fj/jfUNoGuFhEilhJFCxFuxZ0PbbljQ8nRUUDmel1j/Q5Ox7WXaNWZIUEPwLCqrGEqUaYB8BcAODzlMgPR515VcRU+So5noppQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744945; c=relaxed/simple; bh=5LqN3hI7ixJh6B2H56mzahrf8EEt2sBfK62S9N94zCM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=mMp+ZRxA9NZZVjtlWAqWFUlR6Hm0B7bmJ0SICPdILrcCTtgycJX7nPzHnEd0UC2ABh/1H0pm5WFjDvVaXtMj4amNYHKN3Jwdogyn+e5dhXqmm4alwvj1KBiv1rpi5sf8jgfnJ4VYRwER05OdPlDleT4uth4a/VKuUUeN9scz/8s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=iGCv2Usi; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=nM4a1zla; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="iGCv2Usi"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="nM4a1zla" Received: from phl-compute-05.internal (phl-compute-05.phl.internal [10.202.2.45]) by mailfhigh.phl.internal (Postfix) with ESMTP id C379411400AD; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-05.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1744744939; x=1744831339; bh=Yb5tKtRtrOZEqKIWpRUqh9mXuGAmxCdbqrB1U8/DmPU=; b= iGCv2UsicAkN+3sxggLcrhnCRM5PNXs5FHFuGLcHCbCcLU8P7prCK0/OzYeDa5rX dCCR9wuEDr6tciPflSEqD21lPxh8nUP+ETi/zQyblh+cdsV0Pbap6gjPT8bEkUHK KyuYmCY54suRIuBlX3SaY0IKvz6BOOPllKWlYgYZMzadkJauADYSSPBAL0/1T+Lm R/ZakiAJHJZSUo+Z42XDFuxsJGUlhyCa+nBfhO/6LI3Kno4zrEHhLXdSM7xHSX7q Nimh5fQSovGyMtef1NlV2qC7YGr2mo5jpkyBDQdiIbsaOHmQ7O64uFcZuD43OIZC uGSf6H5y4Q3ixvZpQVSFSg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x= 1744831339; bh=Yb5tKtRtrOZEqKIWpRUqh9mXuGAmxCdbqrB1U8/DmPU=; b=n M4a1zlaqgHNwN4+Rc4Q8Gy14B8kIyODjZ80gaGBSpMVYG5U6Td5cmYtAVPGgp7i/ szqk6DO3H4gBgj+MQDB1Tlr1Apv11T4PaYvmE3SmG/W36yiLazFFy9V0IToEeNjD 23V5JE2uIAgEPVXKFyrncomEHoYMJJOpU3Av9MuYn6019jJjA6RKIEk2YkXc0IVz dGAk5AA9Lun5Jx1tQ7UsizIKixocjPwEFwODcZw/JKqyGCl1cNVdi97meXNuHaFx AKEurfmuqzeTJmf/Gi/GOJOvaYMHHbMpcOLwafnV+93wYxe2CP8oUGKsevUx6aH3 AtefV2byajxMCRaQAdl2w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhggtgfgsehtkeertder tdejnecuhfhrohhmpefpihgtohhlrghsucfrihhtrhgvuceonhhitghosehflhhugihnih gtrdhnvghtqeenucggtffrrghtthgvrhhnpedufedvleehffethffhffelkeekieekgeev ffelleejteehieeghfeluedvvdehieenucevlhhushhtvghrufhiiigvpedtnecurfgrrh grmhepmhgrihhlfhhrohhmpehnihgtohesfhhluhignhhitgdrnhgvthdpnhgspghrtghp thhtohephedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepnhhpihhtrhgvsegsrg ihlhhisghrvgdrtghomhdprhgtphhtthhopehjihhrihhslhgrsgihsehkvghrnhgvlhdr ohhrghdprhgtphhtthhopehgrhgvghhkhheslhhinhhugihfohhunhgurghtihhonhdroh hrghdprhgtphhtthhopehlihhnuhigqdhkvghrnhgvlhesvhhgvghrrdhkvghrnhgvlhdr ohhrghdprhgtphhtthhopehlihhnuhigqdhsvghrihgrlhesvhhgvghrrdhkvghrnhgvlh drohhrgh X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id B3BFA1116610; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 07/13] vt: introduce gen_ucs_recompose_table.py to create ucs_recompose_table.h Date: Tue, 15 Apr 2025 15:17:56 -0400 Message-ID: <20250415192212.33949-8-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre The generated table maps base character + combining mark pairs to their precomposed equivalents using Python's unicodedata module. The default script behavior is to create a table with most commonly used Latin, Greek, and Cyrillic recomposition pairs only. It is much smaller than the table with all possible recomposition pairs (71 entries vs 1000 entries). But if one needs/wants the full table then simply running the script with the --full argument will generate it. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_recompose_table.py | 255 ++++++++++++++++++++++ 1 file changed, 255 insertions(+) create mode 100755 drivers/tty/vt/gen_ucs_recompose_table.py diff --git a/drivers/tty/vt/gen_ucs_recompose_table.py b/drivers/tty/vt/gen_ucs_recompose_table.py new file mode 100755 index 0000000000..91e81fb1c9 --- /dev/null +++ b/drivers/tty/vt/gen_ucs_recompose_table.py @@ -0,0 +1,255 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Leverage Python's unicodedata module to generate ucs_recompose_table.h +# +# The generated table maps base character + combining mark pairs to their +# precomposed equivalents. +# +# Usage: +# python gen_ucs_recompose_table.py # Generate with common recomposition pairs +# python gen_ucs_recompose_table.py --full # Generate with all recomposition pairs + +import unicodedata +import sys +import argparse +import textwrap + +# This script's file name +from pathlib import Path +this_file = Path(__file__).name + +# Output file name +out_file = "ucs_recompose_table.h" + +common_recompose_description = "most commonly used Latin, Greek, and Cyrillic recomposition pairs only" +COMMON_RECOMPOSITION_PAIRS = [ + # Latin letters with accents - uppercase + (0x0041, 0x0300, 0x00C0), # A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE + (0x0041, 0x0301, 0x00C1), # A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE + (0x0041, 0x0302, 0x00C2), # A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (0x0041, 0x0303, 0x00C3), # A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE + (0x0041, 0x0308, 0x00C4), # A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS + (0x0041, 0x030A, 0x00C5), # A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE + (0x0043, 0x0327, 0x00C7), # C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA + (0x0045, 0x0300, 0x00C8), # E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE + (0x0045, 0x0301, 0x00C9), # E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE + (0x0045, 0x0302, 0x00CA), # E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (0x0045, 0x0308, 0x00CB), # E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS + (0x0049, 0x0300, 0x00CC), # I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE + (0x0049, 0x0301, 0x00CD), # I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE + (0x0049, 0x0302, 0x00CE), # I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (0x0049, 0x0308, 0x00CF), # I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS + (0x004E, 0x0303, 0x00D1), # N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE + (0x004F, 0x0300, 0x00D2), # O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE + (0x004F, 0x0301, 0x00D3), # O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE + (0x004F, 0x0302, 0x00D4), # O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (0x004F, 0x0303, 0x00D5), # O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE + (0x004F, 0x0308, 0x00D6), # O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS + (0x0055, 0x0300, 0x00D9), # U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE + (0x0055, 0x0301, 0x00DA), # U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE + (0x0055, 0x0302, 0x00DB), # U + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (0x0055, 0x0308, 0x00DC), # U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS + (0x0059, 0x0301, 0x00DD), # Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE + + # Latin letters with accents - lowercase + (0x0061, 0x0300, 0x00E0), # a + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE + (0x0061, 0x0301, 0x00E1), # a + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE + (0x0061, 0x0302, 0x00E2), # a + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX + (0x0061, 0x0303, 0x00E3), # a + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE + (0x0061, 0x0308, 0x00E4), # a + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS + (0x0061, 0x030A, 0x00E5), # a + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE + (0x0063, 0x0327, 0x00E7), # c + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA + (0x0065, 0x0300, 0x00E8), # e + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE + (0x0065, 0x0301, 0x00E9), # e + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE + (0x0065, 0x0302, 0x00EA), # e + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX + (0x0065, 0x0308, 0x00EB), # e + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS + (0x0069, 0x0300, 0x00EC), # i + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE + (0x0069, 0x0301, 0x00ED), # i + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE + (0x0069, 0x0302, 0x00EE), # i + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX + (0x0069, 0x0308, 0x00EF), # i + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS + (0x006E, 0x0303, 0x00F1), # n + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE + (0x006F, 0x0300, 0x00F2), # o + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE + (0x006F, 0x0301, 0x00F3), # o + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE + (0x006F, 0x0302, 0x00F4), # o + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX + (0x006F, 0x0303, 0x00F5), # o + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE + (0x006F, 0x0308, 0x00F6), # o + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS + (0x0075, 0x0300, 0x00F9), # u + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE + (0x0075, 0x0301, 0x00FA), # u + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE + (0x0075, 0x0302, 0x00FB), # u + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX + (0x0075, 0x0308, 0x00FC), # u + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS + (0x0079, 0x0301, 0x00FD), # y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE + (0x0079, 0x0308, 0x00FF), # y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS + + # Common Greek characters + (0x0391, 0x0301, 0x0386), # Α + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS + (0x0395, 0x0301, 0x0388), # Ε + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS + (0x0397, 0x0301, 0x0389), # Η + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS + (0x0399, 0x0301, 0x038A), # Ι + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS + (0x039F, 0x0301, 0x038C), # Ο + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS + (0x03A5, 0x0301, 0x038E), # Υ + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS + (0x03A9, 0x0301, 0x038F), # Ω + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS + (0x03B1, 0x0301, 0x03AC), # α + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS + (0x03B5, 0x0301, 0x03AD), # ε + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS + (0x03B7, 0x0301, 0x03AE), # η + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS + (0x03B9, 0x0301, 0x03AF), # ι + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS + (0x03BF, 0x0301, 0x03CC), # ο + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS + (0x03C5, 0x0301, 0x03CD), # υ + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS + (0x03C9, 0x0301, 0x03CE), # ω + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS + + # Common Cyrillic characters + (0x0418, 0x0306, 0x0419), # И + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I + (0x0438, 0x0306, 0x0439), # и + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I + (0x0423, 0x0306, 0x040E), # У + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U + (0x0443, 0x0306, 0x045E), # у + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT U +] + +full_recompose_description = "all possible recomposition pairs from the Unicode BMP" +def collect_all_recomposition_pairs(): + """Collect all possible recomposition pairs from the Unicode data.""" + # Map to store recomposition pairs: (base, combining) -> recomposed + recompose_map = {} + + # Process all assigned Unicode code points in BMP (Basic Multilingual Plane) + # We limit to BMP (0x0000-0xFFFF) to keep our table smaller with uint16_t + for cp in range(0, 0x10000): + try: + char = chr(cp) + + # Skip unassigned or control characters + if not unicodedata.name(char, ''): + continue + + # Find decomposition + decomp = unicodedata.decomposition(char) + if not decomp or '<' in decomp: # Skip compatibility decompositions + continue + + # Parse the decomposition + parts = decomp.split() + if len(parts) == 2: # Simple base + combining mark + base = int(parts[0], 16) + combining = int(parts[1], 16) + + # Only store if both are in BMP + if base < 0x10000 and combining < 0x10000: + recompose_map[(base, combining)] = cp + + except (ValueError, TypeError): + continue + + # Convert to a list of tuples and sort for binary search + recompose_list = [(base, combining, recomposed) + for (base, combining), recomposed in recompose_map.items()] + recompose_list.sort() + + return recompose_list + +def validate_common_pairs(full_list): + """Validate that all common pairs are in the full list. + + Raises: + ValueError: If any common pair is missing or has a different recomposition + value than what's in the full table. + """ + full_pairs = {(base, combining): recomposed for base, combining, recomposed in full_list} + for base, combining, recomposed in COMMON_RECOMPOSITION_PAIRS: + full_recomposed = full_pairs.get((base, combining)) + if full_recomposed is None: + error_msg = f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) not found in full data" + print(error_msg) + raise ValueError(error_msg) + elif full_recomposed != recomposed: + error_msg = (f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) has different recomposition: " + f"0x{recomposed:04X} vs 0x{full_recomposed:04X}") + print(error_msg) + raise ValueError(error_msg) + +def generate_recomposition_table(use_full_list=False): + """Generate the recomposition C table.""" + + # Collect all recomposition pairs for validation + full_recompose_list = collect_all_recomposition_pairs() + + # Decide which list to use + if use_full_list: + print("Using full recomposition list...") + recompose_list = full_recompose_list + table_description = full_recompose_description + alt_list = COMMON_RECOMPOSITION_PAIRS + alt_description = common_recompose_description + else: + print("Using common recomposition list...") + # Validate that all common pairs are in the full list + validate_common_pairs(full_recompose_list) + recompose_list = sorted(COMMON_RECOMPOSITION_PAIRS) + table_description = common_recompose_description + alt_list = full_recompose_list + alt_description = full_recompose_description + generation_mode = " --full" if use_full_list else "" + alternative_mode = " --full" if not use_full_list else "" + table_description_detail = f"{table_description} ({len(recompose_list)} entries)" + alt_description_detail = f"{alt_description} ({len(alt_list)} entries)" + + # Calculate min/max values for boundary checks + min_base = min(base for base, _, _ in recompose_list) + max_base = max(base for base, _, _ in recompose_list) + min_combining = min(combining for _, combining, _ in recompose_list) + max_combining = max(combining for _, combining, _ in recompose_list) + + # Generate implementation file + with open(out_file, 'w') as f: + f.write(f"""\ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * {out_file} - Unicode character recomposition + * + * Auto-generated by {this_file}{generation_mode} + * + * Unicode Version: {unicodedata.unidata_version} + * +{textwrap.fill( + f"This file contains a table with {table_description_detail}. " + + f"To generate a table with {alt_description_detail} instead, run:", + width=75, initial_indent=" * ", subsequent_indent=" * ")} + * + * python {this_file}{alternative_mode} + */ + +/* + * Table of {table_description} + * Sorted by base character and then combining mark for binary search + */ +static const struct ucs_recomposition ucs_recomposition_table[] = {{ +""") + + for base, combining, recomposed in recompose_list: + try: + base_name = unicodedata.name(chr(base)) + combining_name = unicodedata.name(chr(combining)) + recomposed_name = unicodedata.name(chr(recomposed)) + comment = f"/* {base_name} + {combining_name} = {recomposed_name} */" + except ValueError: + comment = f"/* U+{base:04X} + U+{combining:04X} = U+{recomposed:04X} */" + f.write(f"\t{{ 0x{base:04X}, 0x{combining:04X}, 0x{recomposed:04X} }}, {comment}\n") + + f.write(f"""\ +}}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define UCS_RECOMPOSE_MIN_BASE 0x{min_base:04X} +#define UCS_RECOMPOSE_MAX_BASE 0x{max_base:04X} +#define UCS_RECOMPOSE_MIN_MARK 0x{min_combining:04X} +#define UCS_RECOMPOSE_MAX_MARK 0x{max_combining:04X} +""") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Generate Unicode recomposition table") + parser.add_argument("--full", action="store_true", + help="Generate a full recomposition table (default: common pairs only)") + args = parser.parse_args() + + generate_recomposition_table(use_full_list=args.full) From patchwork Tue Apr 15 19:17:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881620 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CCF82459FC; Tue, 15 Apr 2025 19:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; cv=none; b=VjihzedOa9sa0oovQS1iRvBT9pqsEkGgXkpPr4DOy8orcBgc6OSOEOhv3w1PIno32e4ai7JJMLzPHTt/CjzidG2QoRZnGbfT6Kk92MEnfHH9eNYYqEHmydauVCrOX378iJdKq+NWHDZVx9rH0A6oyAghiqNbVJKoEyEGLKMwCoc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; c=relaxed/simple; bh=HGr7Q1zpR2bBpJ5tqLs7+/msiotLem1hfrWCtkm4ws4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IXecytB9PfyqRF9yMm0q2pBKwfGlUo5P9ySOXLP4vSvAqCdlq7H0LRrnZ5vDjT9/TB8ff139ovK9v7BkwdITur3bXyWvxKcQjGXv4qmqYSsfYcH+4AAcqSLy2b4y7UMljCZUp0eW7PVhQR7B0D8ifS4SEtoI25+Apf15brBRpoI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=a3WOMgBx; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=wx/oa0nm; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="a3WOMgBx"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="wx/oa0nm" Received: from phl-compute-11.internal (phl-compute-11.phl.internal [10.202.2.51]) by mailfout.phl.internal (Postfix) with ESMTP id E917C13801E1; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-11.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=qP2/oK0oCqZwCrIAxKvtMeUzeeFG8u1nZg9XzijsEII=; b=a 3WOMgBxW1t5pqrfuUKAMndadMe26c0iHxWafEoQ+YQPa2OIelAlfDbjtkzQjaW/y N13ALmu0tVN/y7wWQ2b6r3Xcvn4wOEXOlRfnN96btsEgSr3kFO4H8XH4jJl3YfkM mKMah1eavo+Oy5nTrp+wM+ZV+1e1gkpc1fex1+yNl9YW1nUYg65IFuLUa6nGRoir oriIs3a1ZKQeCz1T7vrR4t3UXVgPKongCEaO8Lucpwq17/UwPzhsZ2fD7FWQR+0K ywvoDvr8WREQFLFrkcyWiYIQRtK6cCoWdXJH9vZtFRpWCsz8Y5xtr4HJ/1sZi9Cp BkW/ypJs+uxfBNkqcYCJw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=q P2/oK0oCqZwCrIAxKvtMeUzeeFG8u1nZg9XzijsEII=; b=wx/oa0nmLBtiVzme5 0nMBUT3myqcn2FYqoLJYprt4QL4xXBcbmdXr1N8ALIgejb/3qeVFAX4yvqPRVG0b 0iGgyMuejXph9VGWqq3K6/wrKwhEpzKx+1SZ2+7pOjNdnnaiYXYdluiV0QTnkEDX P/Fo7nFU1o0+2owVphsBz3sQHmZ/Od9alUzVmmm7VRTDG8r5vBGvRZC3hVihaQNQ 5Zhzp/cpUKPLMr8P4tcis6sJV6+gro0Vt8DSk7Tn/GhWdqfFjKty+wGlQeLjASZm MGrLVlpTjJzQiDiLGAVQeHzKBXeD5MYeM5zpXnl299yvX8fPOi+sjhVcwNACqMlo DJeAg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id D402C1116611; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 08/13] vt: create ucs_recompose_table.h with gen_ucs_recompose_table.py Date: Tue, 15 Apr 2025 15:17:57 -0400 Message-ID: <20250415192212.33949-9-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Table of base character + combining mark pairs with their precomposed equivalents. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/ucs_recompose_table.h | 102 +++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 drivers/tty/vt/ucs_recompose_table.h diff --git a/drivers/tty/vt/ucs_recompose_table.h b/drivers/tty/vt/ucs_recompose_table.h new file mode 100644 index 0000000000..bd91edde5d --- /dev/null +++ b/drivers/tty/vt/ucs_recompose_table.h @@ -0,0 +1,102 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * ucs_recompose_table.h - Unicode character recomposition + * + * Auto-generated by gen_ucs_recompose_table.py + * + * Unicode Version: 16.0.0 + * + * This file contains a table with most commonly used Latin, Greek, and + * Cyrillic recomposition pairs only (71 entries). To generate a table with + * all possible recomposition pairs from the Unicode BMP (1000 entries) + * instead, run: + * + * python gen_ucs_recompose_table.py --full + */ + +/* + * Table of most commonly used Latin, Greek, and Cyrillic recomposition pairs only + * Sorted by base character and then combining mark for binary search + */ +static const struct ucs_recomposition ucs_recomposition_table[] = { + { 0x0041, 0x0300, 0x00C0 }, /* LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE */ + { 0x0041, 0x0301, 0x00C1 }, /* LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE */ + { 0x0041, 0x0302, 0x00C2 }, /* LATIN CAPITAL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX */ + { 0x0041, 0x0303, 0x00C3 }, /* LATIN CAPITAL LETTER A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE */ + { 0x0041, 0x0308, 0x00C4 }, /* LATIN CAPITAL LETTER A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS */ + { 0x0041, 0x030A, 0x00C5 }, /* LATIN CAPITAL LETTER A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE */ + { 0x0043, 0x0327, 0x00C7 }, /* LATIN CAPITAL LETTER C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA */ + { 0x0045, 0x0300, 0x00C8 }, /* LATIN CAPITAL LETTER E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE */ + { 0x0045, 0x0301, 0x00C9 }, /* LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE */ + { 0x0045, 0x0302, 0x00CA }, /* LATIN CAPITAL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX */ + { 0x0045, 0x0308, 0x00CB }, /* LATIN CAPITAL LETTER E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS */ + { 0x0049, 0x0300, 0x00CC }, /* LATIN CAPITAL LETTER I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE */ + { 0x0049, 0x0301, 0x00CD }, /* LATIN CAPITAL LETTER I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE */ + { 0x0049, 0x0302, 0x00CE }, /* LATIN CAPITAL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX */ + { 0x0049, 0x0308, 0x00CF }, /* LATIN CAPITAL LETTER I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS */ + { 0x004E, 0x0303, 0x00D1 }, /* LATIN CAPITAL LETTER N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE */ + { 0x004F, 0x0300, 0x00D2 }, /* LATIN CAPITAL LETTER O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE */ + { 0x004F, 0x0301, 0x00D3 }, /* LATIN CAPITAL LETTER O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE */ + { 0x004F, 0x0302, 0x00D4 }, /* LATIN CAPITAL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX */ + { 0x004F, 0x0303, 0x00D5 }, /* LATIN CAPITAL LETTER O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE */ + { 0x004F, 0x0308, 0x00D6 }, /* LATIN CAPITAL LETTER O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS */ + { 0x0055, 0x0300, 0x00D9 }, /* LATIN CAPITAL LETTER U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE */ + { 0x0055, 0x0301, 0x00DA }, /* LATIN CAPITAL LETTER U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE */ + { 0x0055, 0x0302, 0x00DB }, /* LATIN CAPITAL LETTER U + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX */ + { 0x0055, 0x0308, 0x00DC }, /* LATIN CAPITAL LETTER U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS */ + { 0x0059, 0x0301, 0x00DD }, /* LATIN CAPITAL LETTER Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE */ + { 0x0061, 0x0300, 0x00E0 }, /* LATIN SMALL LETTER A + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE */ + { 0x0061, 0x0301, 0x00E1 }, /* LATIN SMALL LETTER A + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE */ + { 0x0061, 0x0302, 0x00E2 }, /* LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX */ + { 0x0061, 0x0303, 0x00E3 }, /* LATIN SMALL LETTER A + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE */ + { 0x0061, 0x0308, 0x00E4 }, /* LATIN SMALL LETTER A + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS */ + { 0x0061, 0x030A, 0x00E5 }, /* LATIN SMALL LETTER A + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE */ + { 0x0063, 0x0327, 0x00E7 }, /* LATIN SMALL LETTER C + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA */ + { 0x0065, 0x0300, 0x00E8 }, /* LATIN SMALL LETTER E + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE */ + { 0x0065, 0x0301, 0x00E9 }, /* LATIN SMALL LETTER E + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE */ + { 0x0065, 0x0302, 0x00EA }, /* LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX */ + { 0x0065, 0x0308, 0x00EB }, /* LATIN SMALL LETTER E + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS */ + { 0x0069, 0x0300, 0x00EC }, /* LATIN SMALL LETTER I + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE */ + { 0x0069, 0x0301, 0x00ED }, /* LATIN SMALL LETTER I + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE */ + { 0x0069, 0x0302, 0x00EE }, /* LATIN SMALL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX */ + { 0x0069, 0x0308, 0x00EF }, /* LATIN SMALL LETTER I + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS */ + { 0x006E, 0x0303, 0x00F1 }, /* LATIN SMALL LETTER N + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE */ + { 0x006F, 0x0300, 0x00F2 }, /* LATIN SMALL LETTER O + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE */ + { 0x006F, 0x0301, 0x00F3 }, /* LATIN SMALL LETTER O + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE */ + { 0x006F, 0x0302, 0x00F4 }, /* LATIN SMALL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX */ + { 0x006F, 0x0303, 0x00F5 }, /* LATIN SMALL LETTER O + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE */ + { 0x006F, 0x0308, 0x00F6 }, /* LATIN SMALL LETTER O + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS */ + { 0x0075, 0x0300, 0x00F9 }, /* LATIN SMALL LETTER U + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE */ + { 0x0075, 0x0301, 0x00FA }, /* LATIN SMALL LETTER U + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE */ + { 0x0075, 0x0302, 0x00FB }, /* LATIN SMALL LETTER U + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX */ + { 0x0075, 0x0308, 0x00FC }, /* LATIN SMALL LETTER U + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS */ + { 0x0079, 0x0301, 0x00FD }, /* LATIN SMALL LETTER Y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE */ + { 0x0079, 0x0308, 0x00FF }, /* LATIN SMALL LETTER Y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS */ + { 0x0391, 0x0301, 0x0386 }, /* GREEK CAPITAL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS */ + { 0x0395, 0x0301, 0x0388 }, /* GREEK CAPITAL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS */ + { 0x0397, 0x0301, 0x0389 }, /* GREEK CAPITAL LETTER ETA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS */ + { 0x0399, 0x0301, 0x038A }, /* GREEK CAPITAL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS */ + { 0x039F, 0x0301, 0x038C }, /* GREEK CAPITAL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS */ + { 0x03A5, 0x0301, 0x038E }, /* GREEK CAPITAL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS */ + { 0x03A9, 0x0301, 0x038F }, /* GREEK CAPITAL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS */ + { 0x03B1, 0x0301, 0x03AC }, /* GREEK SMALL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS */ + { 0x03B5, 0x0301, 0x03AD }, /* GREEK SMALL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS */ + { 0x03B7, 0x0301, 0x03AE }, /* GREEK SMALL LETTER ETA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS */ + { 0x03B9, 0x0301, 0x03AF }, /* GREEK SMALL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS */ + { 0x03BF, 0x0301, 0x03CC }, /* GREEK SMALL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS */ + { 0x03C5, 0x0301, 0x03CD }, /* GREEK SMALL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS */ + { 0x03C9, 0x0301, 0x03CE }, /* GREEK SMALL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS */ + { 0x0418, 0x0306, 0x0419 }, /* CYRILLIC CAPITAL LETTER I + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I */ + { 0x0423, 0x0306, 0x040E }, /* CYRILLIC CAPITAL LETTER U + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U */ + { 0x0438, 0x0306, 0x0439 }, /* CYRILLIC SMALL LETTER I + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I */ + { 0x0443, 0x0306, 0x045E }, /* CYRILLIC SMALL LETTER U + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT U */ +}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define UCS_RECOMPOSE_MIN_BASE 0x0041 +#define UCS_RECOMPOSE_MAX_BASE 0x0443 +#define UCS_RECOMPOSE_MIN_MARK 0x0300 +#define UCS_RECOMPOSE_MAX_MARK 0x0327 From patchwork Tue Apr 15 19:17:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881622 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0ED2C2459FF; Tue, 15 Apr 2025 19:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; cv=none; b=rjIrroXWwsgbq1u05PbcqXdsRwByT7ntBUlQ/Lvb7RUiv6qgoAA85axKZ1V8GSNVCrlOyyAvduTbNHSsYvRttiiI9uyE1RaAmOIx2bnQsA9EGjs/YhHdswSNSPi2NIAS2Cmh+JJTf1/ygIeey/OW4FGkI3KV/6iA8oQtnqh6uuk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; c=relaxed/simple; bh=5vzet0daixK6Zv4nyIFtUBpAIrRvYYu6oy/iAxqi9Ow=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=STyZWqLWKh5Uw0n9Th03ZVbyNvY1Up2/8Iw+uD8OzOrB/rvZxcndOafmuXJpzeF+OPO4uOrpgx3V63ZOOIBnKL/ZnzfTktyovxws5HAfkntX3jty6ovE26jfHaterXiltGdTEDY5tPfPjbHhov+YsiWJoPAKqBApQ1S2ElqEJAg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=KxGSqw81; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=aI848t2r; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="KxGSqw81"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="aI848t2r" Received: from phl-compute-02.internal (phl-compute-02.phl.internal [10.202.2.42]) by mailfout.phl.internal (Postfix) with ESMTP id EC82B13801E3; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-02.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=gH4J9klRNhPiPpbUtkscVBihYK1f2oEF5ZwuY51ocAQ=; b=K xGSqw811DbSFe5HYEuMPiBLt2matpQI62rSBnqaIjtQRKkLbUDLltRR0xsMxRG0e tf1NAHMvp/xEkNeoRaCodRY9w2BF9tr9GFg7zwrBWYQIJvhcXpavjxuxo3LkSG8J NVhJjUSKDs2n7e0Yz5oGgGKfrVayeWz7rZ6E6CejwLHF+dHSD0sTRndVdeZdEWT0 QOWoSCgp9U5EfaLVBHRVbtLSE2SfSQGgIkpajsLHX4pO/sqrqfZ0hmSzihrWOv54 CFXZc6gbhI1U4V30q/A/8A+JgcDBpp6FA4hLM+59dvh2Hq7mCgJAe7IeJ8GsMfVk N3FHxDtnc+x0il1mckPxQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=g H4J9klRNhPiPpbUtkscVBihYK1f2oEF5ZwuY51ocAQ=; b=aI848t2rZKaf7JRTt Tcer7oc1sMeqteZm8jsEtYeRPTWgKyMI3LnTn2Dt2tcXPspdRKVY03VXSAcFfVU9 M+pnxHRKqGcIm5qUz9wSXiqQ+LUlWgjnjrj5hKtGefQZcKeYxjSDAYLv7HeRCb6m QJOIffh9vNbqUu9qL+zrJBnwW7+r3IafFZ1qPfzmAEjecIYJUbCgIdgKgGOkCzN/ 3rG888K5HAP5BvAbCo1oV6Xf0Oy5DgXtGvMLRFkkeoXvOq8rP7nEenc3bFpLOFiY exqLpbvlQpLytsHSOffN57IWt5Qn6IC+nSJc256C4GNg11zqpmoNkCIBXBCyxVLC g6XMg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id EB4CD1116612; Tue, 15 Apr 2025 15:22:18 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 09/13] vt: support Unicode recomposition Date: Tue, 15 Apr 2025 15:17:58 -0400 Message-ID: <20250415192212.33949-10-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Try replacing any decomposed Unicode sequence by the corresponding recomposed code point. Code point to glyph correspondance works best after recomposition, and this apply mostly to single-width code points therefore we can't preserve them in their decomposed form anyway. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/ucs.c | 62 ++++++++++++++++++++++++++++++++++++++ drivers/tty/vt/vt.c | 14 +++++++-- include/linux/consolemap.h | 6 ++++ 3 files changed, 79 insertions(+), 3 deletions(-) diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c index 5e71aa3896..07b2bd1714 100644 --- a/drivers/tty/vt/ucs.c +++ b/drivers/tty/vt/ucs.c @@ -56,3 +56,65 @@ bool ucs_is_double_width(u32 cp) return cp_in_range(cp, ucs_double_width_ranges, ARRAY_SIZE(ucs_double_width_ranges)); } + +/* + * Structure for base with combining mark pairs and resulting recompositions. + * Using u16 to save space since all values are within BMP range. + */ +struct ucs_recomposition { + u16 base; /* base character */ + u16 mark; /* combining mark */ + u16 recomposed; /* corresponding recomposed character */ +}; + +#include "ucs_recompose_table.h" + +struct compare_key { + u16 base; + u16 mark; +}; + +static int recomposition_cmp(const void *key, const void *element) +{ + const struct compare_key *search_key = key; + const struct ucs_recomposition *entry = element; + + /* Compare base character first */ + if (search_key->base < entry->base) + return -1; + if (search_key->base > entry->base) + return 1; + + /* Base characters match, now compare combining character */ + if (search_key->mark < entry->mark) + return -1; + if (search_key->mark > entry->mark) + return 1; + + /* Both match */ + return 0; +} + +/** + * Attempt to recompose two Unicode characters into a single character. + * + * @param base: Base Unicode code point (UCS-4) + * @param mark: Combining mark Unicode code point (UCS-4) + * Return: Recomposed Unicode code point, or 0 if no recomposition is possible + */ +u32 ucs_recompose(u32 base, u32 mark) +{ + /* Check if characters are within the range of our table */ + if (!in_range(base, UCS_RECOMPOSE_MIN_BASE, UCS_RECOMPOSE_MAX_BASE) || + !in_range(mark, UCS_RECOMPOSE_MIN_MARK, UCS_RECOMPOSE_MAX_MARK)) + return 0; + + struct compare_key key = { base, mark }; + struct ucs_recomposition *result = + __inline_bsearch(&key, ucs_recomposition_table, + ARRAY_SIZE(ucs_recomposition_table), + sizeof(*ucs_recomposition_table), + recomposition_cmp); + + return result ? result->recomposed : 0; +} diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index a989feffad..76554c2040 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2925,9 +2925,9 @@ static void vc_con_rewind(struct vc_data *vc) #define UCS_VS16 0xfe0f /* Variation Selector 16 */ -static int vc_process_ucs(struct vc_data *vc, int c, int *tc) +static int vc_process_ucs(struct vc_data *vc, int *c, int *tc) { - u32 prev_c, curr_c = c; + u32 prev_c, curr_c = *c; if (ucs_is_double_width(curr_c)) return 2; @@ -2964,6 +2964,14 @@ static int vc_process_ucs(struct vc_data *vc, int c, int *tc) return 1; } + /* try recomposition */ + prev_c = ucs_recompose(prev_c, curr_c); + if (prev_c != 0) { + vc_con_rewind(vc); + *tc = *c = prev_c; + return 1; + } + /* Otherwise zero-width code points are ignored. */ return 0; } @@ -2978,7 +2986,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, bool inverse = false; if (vc->vc_utf && !vc->vc_disp_ctrl) { - width = vc_process_ucs(vc, c, &tc); + width = vc_process_ucs(vc, &c, &tc); if (!width) goto out; } diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index b3a9118666..8167494229 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -30,6 +30,7 @@ int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); bool ucs_is_zero_width(uint32_t cp); +u32 ucs_recompose(u32 base, u32 mark); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -69,6 +70,11 @@ static inline bool ucs_is_zero_width(uint32_t cp) { return false; } + +static inline u32 ucs_recompose(u32 base, u32 mark) +{ + return 0; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Tue Apr 15 19:17:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882219 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D63E823D2A4; Tue, 15 Apr 2025 19:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; cv=none; b=lmndI9s9lYGMf4Cu/bYD/Z7xnAeX0eDga0vk/Co4ifKOW8vXQvhDiBlpwlxaSI7FBPqQCKVjesTSexis4oJYIXJR//FvQasbKcQE8t+JCs03YZWdXPipvI7IsDodGY/xnY8caZuyy/WhXajwuAhAY2Jb8vXkMJQRWpBcto0UmRk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744942; c=relaxed/simple; bh=nm1E1FEVMIfYg7pbvLYIfRTfxicisS5FpQYS0urX5/Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NqE0rhvTJz+vHML9OnVebKWGzVAJPfCXRTCGlop74yrhXF4n5VaNvBZG0qquz/Vl6noQXpDmbbt50n54UERd0O0DgcqWWc1AZ7g3BsofRxb2nHrUcqnu0wflNFoWZPoWS76RdQdHCcobV1it5qgY+d47f6V73H4Pdp6uT4Ew67Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=nWzMEJ6G; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=AR7syuZC; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="nWzMEJ6G"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="AR7syuZC" Received: from phl-compute-06.internal (phl-compute-06.phl.internal [10.202.2.46]) by mailfout.phl.internal (Postfix) with ESMTP id B982813801DB; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-06.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=clBP/AOC+t0fTcP1nwoYlcZ2Ogk0dwhdHt+FVl4HuhU=; b=n WzMEJ6GIq/0jujeV4cf14lB0cWoVTk8AQighnIQHd6T0Boy2U5bNtmhhXqRyU6aI zRvN/R1d3Kg4EnZZIyK/GZoBB+0WxKnOxcE7MDCjq77mNuCPsJVUtJpwASd1ol3F B586GgRPuMX7BL2FQwHjlqAA3G5T4Nm/H4VE3ni66dgskLCBdjwBzcS6bkWn5Y3t 8ch3GMv+4hMRVqFVClJeuPCsyTfOdZLibkI6HbUFW97KyR/gcsndqKGRGiOi2xJe MSJCyILCn+tA4ivLLBF7mm7/0F9Pcb9UVg2RlGte/GphXZZhriI4xlsDQf9DZD84 w8UEIeD6L72ACsXjr0dZg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=c lBP/AOC+t0fTcP1nwoYlcZ2Ogk0dwhdHt+FVl4HuhU=; b=AR7syuZCE24ugIg7r crDGxFKTsUm/D/vUY9qkvlC4GRIKipROXgfkL1/HtAOvuXjH0ZEeuArL8b6DhrwG cG4gAdCd6eik/E7GxIY5J5O7g2KGPRERS0YCI3hAMsEU0WCNtH4zZAhVhC7Gs+cA KCCNoSOO5RdbCOXfLBL+Pl9Y4WhceaA7EKHaUhhUBqw6JhjkWx63LYIzaVYMPDTT yprQPbvUM2br2ow5r/hGcfZBghTWOaTPL7s2yCDVZYcX2r+L9VWmjH5DDLgrxIxr TZlvcGYeLJdo9aqwUsBB4PTKcDn5/xgs75/6tvS3mxkg+EW/MTnxpC5oxZj8Pjip Jo9JQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 0F7801116613; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 10/13] vt: pad double-width code points with a zero-width space Date: Tue, 15 Apr 2025 15:17:59 -0400 Message-ID: <20250415192212.33949-11-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre In the Unicode screen buffer, we follow double-width code points with a space to maintain proper column alignment. This, however, creates semantic problems when e.g. using cut and paste. Let's use a better code point for the column padding's purpose i.e. a zero-width space rather than a full space. This way the combination remains with a width of 2. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/vt.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 76554c2040..1bd1878094 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2923,6 +2923,7 @@ static void vc_con_rewind(struct vc_data *vc) vc->vc_need_wrap = 0; } +#define UCS_ZWS 0x200b /* Zero Width Space */ #define UCS_VS16 0xfe0f /* Variation Selector 16 */ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc) @@ -2941,8 +2942,8 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc) /* * Let's merge this zero-width code point with the preceding * double-width code point by replacing the existing - * whitespace padding. To do so we rewind one column and - * pretend this has a width of 1. + * zero-width space padding. To do so we rewind one column + * and pretend this has a width of 1. * We give the legacy display the same initial space padding. */ vc_con_rewind(vc); @@ -3065,7 +3066,11 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, tc = conv_uni_to_pc(vc, ' '); if (tc < 0) tc = ' '; - next_c = ' '; + /* + * Store a zero-width space in the Unicode screen given that + * the previous code point is semantically double width. + */ + next_c = UCS_ZWS; } out: From patchwork Tue Apr 15 19:18:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 881619 Received: from fhigh-a5-smtp.messagingengine.com (fhigh-a5-smtp.messagingengine.com [103.168.172.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E1202459F2; Tue, 15 Apr 2025 19:22:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744945; cv=none; b=XdE+JXl6ik/dPVPn0Lg4430iTLoJRo5Cv5/DpoQKn38x9WqIHxIQXueKNPkabYsRs41rst/xpCLkTE5ftX9IlgR0tNUyPHQlIGh3Blx+5Fh+TIb8xvuEEVe8s0EklFkauiEpSxsviy1jitUDSkUGSdMs75CrsUd+FQhT3mo/pxE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744945; c=relaxed/simple; bh=+kpwGVPI0KH9D5YJh9PcK+QwDScwKiJ3yKe2ok/lfw4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hn2f/7Bc3yckeCl9s48thoqQnc1S4SwJqOjgCW/Vmh/AuNFspoDyhkaJdecc5uTsS1XAs579fR+cHy3OGYvyiCpKbkK8Ie4okxR6Dvf6m8CvOKTZEjE7d9+U6UlMvUMiYhElm7qJ+yD/Z19WZeCrqz9T9MvsVkUhv3r5/AGfbpk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=LPQIT3UP; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=da7z92YM; arc=none smtp.client-ip=103.168.172.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="LPQIT3UP"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="da7z92YM" Received: from phl-compute-10.internal (phl-compute-10.phl.internal [10.202.2.50]) by mailfhigh.phl.internal (Postfix) with ESMTP id E664D11401FE; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-10.internal (MEProxy); Tue, 15 Apr 2025 15:22:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744939; x= 1744831339; bh=Pxfdmu6a1Rq5FX2/mG+2d7QtU5Ii4ZEqu82w8ijjsho=; b=L PQIT3UPDjy0/gEyMYxtYyrgy2vQ/h4qwIRu1GOSngPWZ7t3tLK5GHORHcPyBpfJl f5vrROeJrV4j0q/qjzurBUzRwvQEJ1GUidWExJtb1M2TEOz4rTxdFNlJS+X4BHSY TEloI8xWy7p426QUef/rbdFMvomk+c+ri/x7/qne9aevJnykTrAARpVjpsYxhuCm BroA1tp0JwapLA40YCJ3QEVE5/HuHwS6MVGAHFvS2MCDjdwWCq/TjBf4oH/Ve8RA hkLZldjUCHFBkwIx3ny7kOd56nb67RmWp/aMoEQhl5XkGo3MXig237FbZJEThEdp CppLBZiNQy2O1sbGLT9aw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744939; x=1744831339; bh=P xfdmu6a1Rq5FX2/mG+2d7QtU5Ii4ZEqu82w8ijjsho=; b=da7z92YMAB4v5nGNg QCkQ+60fFNb50VU7EaQYKDPYsJStMVUoM1l7dVfwEPqXF8vPfUoaR+7o4qfJHMI4 wOH70OZ5vkdFO+pylIsXomjY9soye+mpeUhXphYJGiDv81aix6nxM+Vmt9TgFXKq St9WZ4JfDPWrBYDoEKSYILoUvWccvyGiWVQ2LaOCmRuQOUyDROKA4/NqVV9eFbnm 38rIukXhv8kqr20bbzkAcunvBZsOmg/F4AatMuGIb0jYBtoG1HYwwMwiofUF54e6 itAIDWlwFg6/RkQQ+be79Bx6A4B3SSv+3puPHIgeUuViTlmwYrMfJL7e7qwkarSh wJ8kw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 1E1361116617; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 11/13] vt: remove zero-width-space handling from conv_uni_to_pc() Date: Tue, 15 Apr 2025 15:18:00 -0400 Message-ID: <20250415192212.33949-12-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This is now taken care of by ucs_is_zero_width(). Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/consolemap.c | 2 -- drivers/tty/vt/vt.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/tty/vt/consolemap.c b/drivers/tty/vt/consolemap.c index 82d70083fe..bb4bb272eb 100644 --- a/drivers/tty/vt/consolemap.c +++ b/drivers/tty/vt/consolemap.c @@ -870,8 +870,6 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs) return -4; /* Not found */ else if (ucs < 0x20) return -1; /* Not a printable character */ - else if (ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f)) - return -2; /* Zero-width space */ /* * UNI_DIRECT_BASE indicates the start of the region in the User Zone * which always has a 1:1 mapping to the currently loaded font. The diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 1bd1878094..24c6cd2eed 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2995,7 +2995,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, /* Now try to find out how to display it */ tc = conv_uni_to_pc(vc, tc); if (tc & ~charmask) { - if (tc == -1 || tc == -2) + if (tc == -1) return -1; /* nothing to display */ /* Glyph not found */ From patchwork Tue Apr 15 19:18:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882218 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08E5A23D2B4; Tue, 15 Apr 2025 19:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; cv=none; b=srvp2zKpG1JzPxHl4GMAVEV/wTLkyCIH/26knHCaxtzV0IAzUJn2Ba5D87sIbf6CUlhJiFzIz8izFlmVwpDpedEhCXPzkzRcIzxci2ZyJm/dHMjvhQzq90l/z0QN7iSaKLwfqHqiEyM7bcr8NXBr3rG6RVjqsHLPLDlKag7cAfE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744943; c=relaxed/simple; bh=IMopP9qGZnbSLzoykCHwRFQnzkFrus3cxL5DmPnKdko=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PTgbqgoon51ee4ZOMGS1TA9KLRfGIZpw6GxHag4OeI55GMslTtLChZSQbFcIsAxO/adOZEAYqhxdVJhouCzzrpld0sBS9M4yqeZQ082gASQzT4rUrVzro+tjJ5OqrhOwKOm0MEcm5xhDv6oOlf3DZGzkLEljzzyE52aVkUbDPiI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=OS5Bxld/; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=F8Oqh2k5; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="OS5Bxld/"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="F8Oqh2k5" Received: from phl-compute-01.internal (phl-compute-01.phl.internal [10.202.2.41]) by mailfout.phl.internal (Postfix) with ESMTP id 23E4D13801DF; Tue, 15 Apr 2025 15:22:20 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-01.internal (MEProxy); Tue, 15 Apr 2025 15:22:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744940; x= 1744831340; bh=TOECGAz+1UN5jJO3zU++rVDP+tWki0SmZFEregeb9FQ=; b=O S5Bxld/8Ogp1LHGlQ8mki6TPaOAD0Br06Wx1z9tZhLvs9B41e6afRwLU2MHFnY6k Q4GfsJWPEsEnW8uq65uc7J90S8ymYRVRvBo9H9wiEwHK+GnkSePIXNwTzhBm4Fj7 h9jT5E0nmYjjxtliGChq7WsADxKEeu+I+Gp47fEvJyGistmzQVpfQDjROeXwKz/D qsfvXEBAvt5FdBx4sbJbREv2VBxU3SnIEur9oRA62DNRLr870sbnHtBYJkmS1znp aKbzmIFGY3t/4ncCXrycdE5YeTMwgJH4iN49O69jkK3R5UdtmEdrqEW+Ujv/UHhw OkpZsCwQR9n1ESY8qo2CA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744940; x=1744831340; bh=T OECGAz+1UN5jJO3zU++rVDP+tWki0SmZFEregeb9FQ=; b=F8Oqh2k5Xn7Jd6JZ9 OYB6CirdD99cpvAsRl1NqEXMDswtSOc0lOKCfBq1TtmFksIicXDWaHYmbV4Y4Eu5 QheklPHlSsHUtEnbPOyS/HfQVuPLfKMgg5Ym/7BKJ5Is5fTSFH8q4U9GPgc/SWqQ 7IepNYt9t1NSTUCy4pKo7T3rvHWW5cyeE02TAthFoNXRsDAIN8ujOf2yt/PK3+u8 4z4SNEXPLyhmMqPU84K8zC5ix28WvJZU+npnrODANrGJorGOmU5sHgj2xWaBQkEH ocXtTg56ru21sDcqAHYUrj0xW5xARRAbaSDRNwv4+6Ged2Glh8blRN5UQP6kHHRC c58+A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 345BF1116618; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 12/13] vt: update gen_ucs_width_table.py to make tables more space efficient Date: Tue, 15 Apr 2025 15:18:01 -0400 Message-ID: <20250415192212.33949-13-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Split table ranges into BMP (16-bit) and non-BMP (above 16-bit). This reduces the corresponding text size by 20-25%. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_width_table.py | 55 ++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/drivers/tty/vt/gen_ucs_width_table.py b/drivers/tty/vt/gen_ucs_width_table.py index 00510444a7..059ed9a8ba 100755 --- a/drivers/tty/vt/gen_ucs_width_table.py +++ b/drivers/tty/vt/gen_ucs_width_table.py @@ -194,6 +194,27 @@ def write_tables(zero_width_ranges, double_width_ranges): double_width_ranges: List of (start, end) ranges for double-width characters """ + # Function to split ranges into BMP (16-bit) and non-BMP (above 16-bit) + def split_ranges_by_size(ranges): + bmp_ranges = [] + non_bmp_ranges = [] + + for start, end in ranges: + if end <= 0xFFFF: + bmp_ranges.append((start, end)) + elif start > 0xFFFF: + non_bmp_ranges.append((start, end)) + else: + # Split the range at 0xFFFF + bmp_ranges.append((start, 0xFFFF)) + non_bmp_ranges.append((0x10000, end)) + + return bmp_ranges, non_bmp_ranges + + # Split ranges into BMP and non-BMP + zero_width_bmp, zero_width_non_bmp = split_ranges_by_size(zero_width_ranges) + double_width_bmp, double_width_non_bmp = split_ranges_by_size(double_width_ranges) + # Function to generate code point description comments def get_code_point_comment(start, end): try: @@ -221,22 +242,44 @@ def write_tables(zero_width_ranges, double_width_ranges): * Unicode Version: {unicodedata.unidata_version} */ -/* Zero-width character ranges */ -static const struct ucs_interval ucs_zero_width_ranges[] = {{ +/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct ucs_interval16 ucs_zero_width_bmp_ranges[] = {{ +""") + + for start, end in zero_width_bmp: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n") + + f.write("""\ +}; + +/* Zero-width character ranges (non-BMP, U+10000 and above) */ +static const struct ucs_interval32 ucs_zero_width_non_bmp_ranges[] = { """) - for start, end in zero_width_ranges: + for start, end in zero_width_non_bmp: comment = get_code_point_comment(start, end) f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") f.write("""\ }; -/* Double-width character ranges */ -static const struct ucs_interval ucs_double_width_ranges[] = { +/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct ucs_interval16 ucs_double_width_bmp_ranges[] = { +""") + + for start, end in double_width_bmp: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n") + + f.write("""\ +}; + +/* Double-width character ranges (non-BMP, U+10000 and above) */ +static const struct ucs_interval32 ucs_double_width_non_bmp_ranges[] = { """) - for start, end in double_width_ranges: + for start, end in double_width_non_bmp: comment = get_code_point_comment(start, end) f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") From patchwork Tue Apr 15 19:18:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882215 Received: from fout-a2-smtp.messagingengine.com (fout-a2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9722E2475CE; Tue, 15 Apr 2025 19:22:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744947; cv=none; b=EH+T2pSi0QHhPra8URRJ860lr3Dka8mC/SSNEr0PuuqfNopkKs5a/JCr4YtbZ4LsqNL1FclVC96v6D7xXG47gc4e2VNQCUVboM1hgoNOS4HuIiUxCjUTsKaGHtL2qnJ5sb62TmsdFZrKkYtE2kTOGF2hIYNwJr+x5o2o8i69HbM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744744947; c=relaxed/simple; bh=X31G6GXGjTVQ0L0lEIfE7e2dLNeMoYhkBNA5EU3TlcQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iw/9PlKeDWe6ogQvWRCEoa8girAWoWV6ek5zEj29XZNK9ziXvbL4yeTVsS6Z0IkP5ArrrUPNCkVyvvwn5ELeYByf6Ishj9W7BrglgBydFqFb6qrmOIe/Y/rPRO/2N7wwP/EOQyJNMNkIG0LYgDhMWQ+SMwollS09LFccn2I2oSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=NMCal55n; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=lfQMzr67; arc=none smtp.client-ip=103.168.172.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="NMCal55n"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="lfQMzr67" Received: from phl-compute-04.internal (phl-compute-04.phl.internal [10.202.2.44]) by mailfout.phl.internal (Postfix) with ESMTP id 4FF0913801E4; Tue, 15 Apr 2025 15:22:20 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-04.internal (MEProxy); Tue, 15 Apr 2025 15:22:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744744940; x= 1744831340; bh=rahU/e7Uy2LshU525SENSTKsU9/vMtdhIIMHgSSt4Ts=; b=N MCal55nXWXikBFhpCrso5HhD6uRfKoM7dZA9wcKailEVESxhXFvvpUHbiFrK4VUw 82JJvCHGTXwBeZOjrojq9J+t1hwNv0XgoEXKdKo46lWQcYuYNJ8XSCOBcF0TcaXY uE5A8YlAwn2QF9J0BT82sfcCesucMbTTbF4yWeDSyaTW2UcQ++3pkShDbpLMGyy2 mDhpBOaZ0Sy6XdKYgY/5G7uTsYNH3ruTmQjZCcuyWt/fknYNyYqnyTsp95oHS2ai rs1RybKiOFO4piutc2qSTLcQdJVS6yfwWlWI68McEtg6UUDukHh/Mi0nzyG3+o+F jLU+0INF3Oo1TjAb5Z+Kg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744744940; x=1744831340; bh=r ahU/e7Uy2LshU525SENSTKsU9/vMtdhIIMHgSSt4Ts=; b=lfQMzr67lq1aoJSns cqHFBscT6VSWYQYW9yE4uHu6AtBJhOBGNN2LMS4rOmxd3ELLU4XG2hk5SJy2baem dogFzH81GCYLEsRGEbNDwZblU0ZZ8yV80qypeonGbc6nFem5o/mTkCkfqBsi6RdJ h4g9oDn932Rz3F9gRw0NPHsZdYeNhmt1MXuLwgewP22r9quUh1xyb6+hRkj8zKLe 7kyAfdIgkhW0TjprTsjURFs+ClTarW5iiuYT9xFYAheCsXK3ldi03zbOzFoHaYgL tI11XLJLZNqg5tA6WXWIN9W8/MZCMCUPedByF83Kyv7poYCbo4iH3QvZPeTTm0+f g3cLg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvvdegfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 54AED1116619; Tue, 15 Apr 2025 15:22:19 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 13/13] vt: refresh ucs_width_table.h and adjust code in ucs.c accordingly Date: Tue, 15 Apr 2025 15:18:02 -0400 Message-ID: <20250415192212.33949-14-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250415192212.33949-1-nico@fluxnic.net> References: <20250415192212.33949-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Width tables are now split into BMP (16-bit) and non-BMP (above 16-bit). This reduces the corresponding text size by 20-25%. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/ucs.c | 54 +++- drivers/tty/vt/ucs_width_table.h | 540 ++++++++++++++++--------------- 2 files changed, 319 insertions(+), 275 deletions(-) diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c index 07b2bd1714..710e7750f3 100644 --- a/drivers/tty/vt/ucs.c +++ b/drivers/tty/vt/ucs.c @@ -5,17 +5,34 @@ #include #include -struct ucs_interval { +struct ucs_interval16 { + u16 first; + u16 last; +}; + +struct ucs_interval32 { u32 first; u32 last; }; #include "ucs_width_table.h" -static int interval_cmp(const void *key, const void *element) +static int interval16_cmp(const void *key, const void *element) +{ + u16 cp = *(u16 *)key; + const struct ucs_interval16 *entry = element; + + if (cp < entry->first) + return -1; + if (cp > entry->last) + return 1; + return 0; +} + +static int interval32_cmp(const void *key, const void *element) { u32 cp = *(u32 *)key; - const struct ucs_interval *entry = element; + const struct ucs_interval32 *entry = element; if (cp < entry->first) return -1; @@ -24,15 +41,26 @@ static int interval_cmp(const void *key, const void *element) return 0; } -static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size) +static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t size) { if (!in_range(cp, ranges[0].first, ranges[size - 1].last)) return false; return __inline_bsearch(&cp, ranges, size, sizeof(*ranges), - interval_cmp) != NULL; + interval16_cmp) != NULL; } +static bool cp_in_range32(u32 cp, const struct ucs_interval32 *ranges, size_t size) +{ + if (!in_range(cp, ranges[0].first, ranges[size - 1].last)) + return false; + + return __inline_bsearch(&cp, ranges, size, sizeof(*ranges), + interval32_cmp) != NULL; +} + +#define UCS_IS_BMP(cp) ((cp) <= 0xffff) + /** * Determine if a Unicode code point is zero-width. * @@ -41,8 +69,12 @@ static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size) */ bool ucs_is_zero_width(u32 cp) { - return cp_in_range(cp, ucs_zero_width_ranges, - ARRAY_SIZE(ucs_zero_width_ranges)); + if (UCS_IS_BMP(cp)) + return cp_in_range16(cp, ucs_zero_width_bmp_ranges, + ARRAY_SIZE(ucs_zero_width_bmp_ranges)); + else + return cp_in_range32(cp, ucs_zero_width_non_bmp_ranges, + ARRAY_SIZE(ucs_zero_width_non_bmp_ranges)); } /** @@ -53,8 +85,12 @@ bool ucs_is_zero_width(u32 cp) */ bool ucs_is_double_width(u32 cp) { - return cp_in_range(cp, ucs_double_width_ranges, - ARRAY_SIZE(ucs_double_width_ranges)); + if (UCS_IS_BMP(cp)) + return cp_in_range16(cp, ucs_double_width_bmp_ranges, + ARRAY_SIZE(ucs_double_width_bmp_ranges)); + else + return cp_in_range32(cp, ucs_double_width_non_bmp_ranges, + ARRAY_SIZE(ucs_double_width_non_bmp_ranges)); } /* diff --git a/drivers/tty/vt/ucs_width_table.h b/drivers/tty/vt/ucs_width_table.h index 9cc86b5cdf..6fcb8f1d57 100644 --- a/drivers/tty/vt/ucs_width_table.h +++ b/drivers/tty/vt/ucs_width_table.h @@ -7,210 +7,214 @@ * Unicode Version: 16.0.0 */ -/* Zero-width character ranges */ -static const struct ucs_interval ucs_zero_width_ranges[] = { - { 0x000AD, 0x000AD }, /* SOFT HYPHEN */ - { 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ - { 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ - { 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ - { 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */ - { 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ - { 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ - { 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */ - { 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ - { 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ - { 0x0061C, 0x0061C }, /* ARABIC LETTER MARK */ - { 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ - { 0x00670, 0x00670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ - { 0x006D6, 0x006DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */ - { 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ - { 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ - { 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ - { 0x0070F, 0x0070F }, /* SYRIAC ABBREVIATION MARK */ - { 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ - { 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ - { 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */ - { 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ - { 0x007FD, 0x007FD }, /* NKO DANTAYALAN */ - { 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ - { 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ - { 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ - { 0x00829, 0x0082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ - { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ - { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ - { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ - { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ - { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ - { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ - { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ - { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ - { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ - { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ - { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ - { 0x009C7, 0x009C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ - { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ - { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ - { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ - { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ - { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ - { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ - { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ - { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ - { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ - { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ - { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ - { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ - { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ - { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ - { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ - { 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ - { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ - { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ - { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ - { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ - { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ - { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ - { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ - { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ - { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ - { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ - { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ - { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ - { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ - { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ - { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ - { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ - { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ - { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ - { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ - { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ - { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ - { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ - { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ - { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ - { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ - { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ - { 0x00CCA, 0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ - { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ - { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ - { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ - { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ - { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ - { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ - { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ - { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ - { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ - { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ - { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ - { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ - { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ - { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ - { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ - { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ - { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ - { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ - { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ - { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ - { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ - { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ - { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ - { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ - { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ - { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ - { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ - { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ - { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ - { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ - { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ - { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ - { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ - { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ - { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ - { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ - { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ - { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ - { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ - { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ - { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ - { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ - { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ - { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ - { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ - { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ - { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ - { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ - { 0x0180B, 0x0180F }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ - { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ - { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ - { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ - { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ - { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ - { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ - { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ - { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ - { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ - { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ - { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ - { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ - { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ - { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ - { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ - { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ - { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ - { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ - { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ - { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ - { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ - { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ - { 0x0200B, 0x0200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ - { 0x0202A, 0x0202E }, /* LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE */ - { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ - { 0x02066, 0x0206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ - { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ - { 0x02640, 0x02640 }, /* FEMALE SIGN */ - { 0x02642, 0x02642 }, /* MALE SIGN */ - { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ - { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ - { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ - { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ - { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ - { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ - { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ - { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ - { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ - { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ - { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ - { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ - { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ - { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ - { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ - { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ - { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ - { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ - { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ - { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ - { 0x0A947, 0x0A953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ - { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ - { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ - { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ - { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ - { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ - { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ - { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ - { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ - { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ - { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ - { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ - { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ - { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ - { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ - { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ - { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ - { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ - { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ - { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ - { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ - { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ +/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct ucs_interval16 ucs_zero_width_bmp_ranges[] = { + { 0x00AD, 0x00AD }, /* SOFT HYPHEN */ + { 0x0300, 0x036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x0483, 0x0489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x0591, 0x05BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ + { 0x05BF, 0x05BF }, /* HEBREW POINT RAFE */ + { 0x05C1, 0x05C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x05C4, 0x05C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x05C7, 0x05C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x0600, 0x0605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x0610, 0x061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x061C, 0x061C }, /* ARABIC LETTER MARK */ + { 0x064B, 0x065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x0670, 0x0670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x06D6, 0x06DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */ + { 0x06DF, 0x06E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x06E7, 0x06E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x06EA, 0x06ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x070F, 0x070F }, /* SYRIAC ABBREVIATION MARK */ + { 0x0711, 0x0711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x0730, 0x074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x07A6, 0x07B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x07EB, 0x07F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x07FD, 0x07FD }, /* NKO DANTAYALAN */ + { 0x0816, 0x0819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x081B, 0x0823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x0825, 0x0827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x0829, 0x082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x0859, 0x085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x0890, 0x0891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x0897, 0x089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x08CA, 0x0903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x093A, 0x093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x093E, 0x094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x0951, 0x0957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x0962, 0x0963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x0981, 0x0983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x09BC, 0x09BC }, /* BENGALI SIGN NUKTA */ + { 0x09BE, 0x09C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x09C7, 0x09C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x09CB, 0x09CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x09D7, 0x09D7 }, /* BENGALI AU LENGTH MARK */ + { 0x09E2, 0x09E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x09FE, 0x09FE }, /* BENGALI SANDHI MARK */ + { 0x0A01, 0x0A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x0A3C, 0x0A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x0A3E, 0x0A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x0A47, 0x0A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x0A4B, 0x0A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x0A51, 0x0A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x0A70, 0x0A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x0A75, 0x0A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x0A81, 0x0A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x0ABC, 0x0ABC }, /* GUJARATI SIGN NUKTA */ + { 0x0ABE, 0x0AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x0AC7, 0x0AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x0ACB, 0x0ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x0AE2, 0x0AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x0AFA, 0x0AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x0B01, 0x0B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x0B3C, 0x0B3C }, /* ORIYA SIGN NUKTA */ + { 0x0B3E, 0x0B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x0B47, 0x0B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x0B4B, 0x0B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x0B55, 0x0B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x0B62, 0x0B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x0B82, 0x0B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x0BBE, 0x0BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ + { 0x0BC6, 0x0BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x0BCA, 0x0BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x0BD7, 0x0BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x0C00, 0x0C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x0C3C, 0x0C3C }, /* TELUGU SIGN NUKTA */ + { 0x0C3E, 0x0C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x0C46, 0x0C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x0C4A, 0x0C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x0C55, 0x0C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x0C62, 0x0C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x0C81, 0x0C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x0CBC, 0x0CBC }, /* KANNADA SIGN NUKTA */ + { 0x0CBE, 0x0CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x0CC6, 0x0CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x0CCA, 0x0CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x0CD5, 0x0CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x0CE2, 0x0CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x0CF3, 0x0CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x0D00, 0x0D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x0D3B, 0x0D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 0x0D3E, 0x0D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x0D46, 0x0D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x0D4A, 0x0D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x0D57, 0x0D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x0D62, 0x0D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x0D81, 0x0D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ + { 0x0DCA, 0x0DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x0DCF, 0x0DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x0DD6, 0x0DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x0DD8, 0x0DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x0DF2, 0x0DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x0E31, 0x0E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x0E34, 0x0E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x0E47, 0x0E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x0EB1, 0x0EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x0EB4, 0x0EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x0EC8, 0x0ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x0F18, 0x0F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x0F35, 0x0F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x0F37, 0x0F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x0F39, 0x0F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x0F3E, 0x0F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x0F71, 0x0F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x0F86, 0x0F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x0F8D, 0x0F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x0F99, 0x0FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x0FC6, 0x0FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ + { 0x102B, 0x103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x1056, 0x1059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x105E, 0x1060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x1062, 0x1064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ + { 0x1067, 0x106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x1071, 0x1074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x1082, 0x108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x108F, 0x108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x109A, 0x109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x135D, 0x135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x1712, 0x1715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x1732, 0x1734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x1752, 0x1753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x1772, 0x1773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x17B4, 0x17D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x17DD, 0x17DD }, /* KHMER SIGN ATTHACAN */ + { 0x180B, 0x180F }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x1885, 0x1886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x18A9, 0x18A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x1920, 0x192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x1930, 0x193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x1A17, 0x1A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x1A55, 0x1A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x1A60, 0x1A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ + { 0x1A7F, 0x1A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x1AB0, 0x1ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x1B00, 0x1B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x1B34, 0x1B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ + { 0x1B6B, 0x1B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x1B80, 0x1B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x1BA1, 0x1BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x1BE6, 0x1BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x1C24, 0x1C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x1CD0, 0x1CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x1CD4, 0x1CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x1CED, 0x1CED }, /* VEDIC SIGN TIRYAK */ + { 0x1CF4, 0x1CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x1CF7, 0x1CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x1DC0, 0x1DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x200B, 0x200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ + { 0x202A, 0x202E }, /* LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE */ + { 0x2060, 0x2064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x2066, 0x206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ + { 0x20D0, 0x20F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x2640, 0x2640 }, /* FEMALE SIGN */ + { 0x2642, 0x2642 }, /* MALE SIGN */ + { 0x26A7, 0x26A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x2CEF, 0x2CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x2D7F, 0x2D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x2DE0, 0x2DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x302A, 0x302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ + { 0x3099, 0x309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ + { 0xA66F, 0xA672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ + { 0xA674, 0xA67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ + { 0xA69E, 0xA69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ + { 0xA6F0, 0xA6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ + { 0xA802, 0xA802 }, /* SYLOTI NAGRI SIGN DVISVARA */ + { 0xA806, 0xA806 }, /* SYLOTI NAGRI SIGN HASANTA */ + { 0xA80B, 0xA80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ + { 0xA823, 0xA827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ + { 0xA82C, 0xA82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ + { 0xA880, 0xA881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ + { 0xA8B4, 0xA8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ + { 0xA8E0, 0xA8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ + { 0xA8FF, 0xA8FF }, /* DEVANAGARI VOWEL SIGN AY */ + { 0xA926, 0xA92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ + { 0xA947, 0xA953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ + { 0xA980, 0xA983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ + { 0xA9B3, 0xA9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ + { 0xA9E5, 0xA9E5 }, /* MYANMAR SIGN SHAN SAW */ + { 0xAA29, 0xAA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ + { 0xAA43, 0xAA43 }, /* CHAM CONSONANT SIGN FINAL NG */ + { 0xAA4C, 0xAA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ + { 0xAA7B, 0xAA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ + { 0xAAB0, 0xAAB0 }, /* TAI VIET MAI KANG */ + { 0xAAB2, 0xAAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ + { 0xAAB7, 0xAAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ + { 0xAABE, 0xAABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ + { 0xAAC1, 0xAAC1 }, /* TAI VIET TONE MAI THO */ + { 0xAAEB, 0xAAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ + { 0xAAF5, 0xAAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ + { 0xABE3, 0xABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ + { 0xABEC, 0xABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ + { 0xFB1E, 0xFB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ + { 0xFE00, 0xFE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ + { 0xFE20, 0xFE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ + { 0xFEFF, 0xFEFF }, /* ZERO WIDTH NO-BREAK SPACE */ + { 0xFFF9, 0xFFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ +}; + +/* Zero-width character ranges (non-BMP, U+10000 and above) */ +static const struct ucs_interval32 ucs_zero_width_non_bmp_ranges[] = { { 0x101FD, 0x101FD }, /* PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE */ { 0x102E0, 0x102E0 }, /* COPTIC EPACT THOUSANDS MARK */ { 0x10376, 0x1037A }, /* COMBINING OLD PERMIC LETTER AN - COMBINING OLD PERMIC LETTER SII */ @@ -350,68 +354,72 @@ static const struct ucs_interval ucs_zero_width_ranges[] = { { 0xE0100, 0xE01EF }, /* VARIATION SELECTOR-17 - VARIATION SELECTOR-256 */ }; -/* Double-width character ranges */ -static const struct ucs_interval ucs_double_width_ranges[] = { - { 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ - { 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */ - { 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ - { 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ - { 0x023F0, 0x023F0 }, /* ALARM CLOCK */ - { 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */ - { 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ - { 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ - { 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ - { 0x02648, 0x02653 }, /* ARIES - PISCES */ - { 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */ - { 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ - { 0x02693, 0x02693 }, /* ANCHOR */ - { 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */ - { 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ - { 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */ - { 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ - { 0x026CE, 0x026CE }, /* OPHIUCHUS */ - { 0x026D4, 0x026D4 }, /* NO ENTRY */ - { 0x026EA, 0x026EA }, /* CHURCH */ - { 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */ - { 0x026F5, 0x026F5 }, /* SAILBOAT */ - { 0x026FA, 0x026FA }, /* TENT */ - { 0x026FD, 0x026FD }, /* FUEL PUMP */ - { 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */ - { 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */ - { 0x02728, 0x02728 }, /* SPARKLES */ - { 0x0274C, 0x0274C }, /* CROSS MARK */ - { 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */ - { 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ - { 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ - { 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ - { 0x027B0, 0x027B0 }, /* CURLY LOOP */ - { 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */ - { 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ - { 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */ - { 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */ - { 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ - { 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ - { 0x02F00, 0x02FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ - { 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ - { 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ - { 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ - { 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ - { 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ - { 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ - { 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ - { 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ - { 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ - { 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ - { 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ - { 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ - { 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ - { 0x0F900, 0x0FAFF }, /* U+F900 - U+FAFF */ - { 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ - { 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ - { 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ - { 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ - { 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ - { 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ +/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct ucs_interval16 ucs_double_width_bmp_ranges[] = { + { 0x1100, 0x115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ + { 0x231A, 0x231B }, /* WATCH - HOURGLASS */ + { 0x2329, 0x232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ + { 0x23E9, 0x23EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ + { 0x23F0, 0x23F0 }, /* ALARM CLOCK */ + { 0x23F3, 0x23F3 }, /* HOURGLASS WITH FLOWING SAND */ + { 0x25FD, 0x25FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ + { 0x2614, 0x2615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ + { 0x2630, 0x2637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ + { 0x2648, 0x2653 }, /* ARIES - PISCES */ + { 0x267F, 0x267F }, /* WHEELCHAIR SYMBOL */ + { 0x268A, 0x268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ + { 0x2693, 0x2693 }, /* ANCHOR */ + { 0x26A1, 0x26A1 }, /* HIGH VOLTAGE SIGN */ + { 0x26AA, 0x26AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ + { 0x26BD, 0x26BE }, /* SOCCER BALL - BASEBALL */ + { 0x26C4, 0x26C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ + { 0x26CE, 0x26CE }, /* OPHIUCHUS */ + { 0x26D4, 0x26D4 }, /* NO ENTRY */ + { 0x26EA, 0x26EA }, /* CHURCH */ + { 0x26F2, 0x26F3 }, /* FOUNTAIN - FLAG IN HOLE */ + { 0x26F5, 0x26F5 }, /* SAILBOAT */ + { 0x26FA, 0x26FA }, /* TENT */ + { 0x26FD, 0x26FD }, /* FUEL PUMP */ + { 0x2705, 0x2705 }, /* WHITE HEAVY CHECK MARK */ + { 0x270A, 0x270B }, /* RAISED FIST - RAISED HAND */ + { 0x2728, 0x2728 }, /* SPARKLES */ + { 0x274C, 0x274C }, /* CROSS MARK */ + { 0x274E, 0x274E }, /* NEGATIVE SQUARED CROSS MARK */ + { 0x2753, 0x2755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ + { 0x2757, 0x2757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ + { 0x2795, 0x2797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ + { 0x27B0, 0x27B0 }, /* CURLY LOOP */ + { 0x27BF, 0x27BF }, /* DOUBLE CURLY LOOP */ + { 0x2B1B, 0x2B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ + { 0x2B50, 0x2B50 }, /* WHITE MEDIUM STAR */ + { 0x2B55, 0x2B55 }, /* HEAVY LARGE CIRCLE */ + { 0x2E80, 0x2E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ + { 0x2E9B, 0x2EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ + { 0x2F00, 0x2FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ + { 0x2FF0, 0x3029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ + { 0x3030, 0x303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ + { 0x3041, 0x3096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ + { 0x309B, 0x30FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ + { 0x3105, 0x312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ + { 0x3131, 0x318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ + { 0x3190, 0x31E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ + { 0x31EF, 0x321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ + { 0x3220, 0x3247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ + { 0x3250, 0xA48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ + { 0xA490, 0xA4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ + { 0xA960, 0xA97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ + { 0xAC00, 0xD7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ + { 0xF900, 0xFAFF }, /* U+F900 - U+FAFF */ + { 0xFE10, 0xFE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ + { 0xFE30, 0xFE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ + { 0xFE54, 0xFE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ + { 0xFE68, 0xFE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ + { 0xFF01, 0xFF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ + { 0xFFE0, 0xFFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ +}; + +/* Double-width character ranges (non-BMP, U+10000 and above) */ +static const struct ucs_interval32 ucs_double_width_non_bmp_ranges[] = { { 0x16FE0, 0x16FE3 }, /* TANGUT ITERATION MARK - OLD CHINESE ITERATION MARK */ { 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */ { 0x18800, 0x18CD5 }, /* TANGUT COMPONENT-001 - KHITAN SMALL SCRIPT CHARACTER-18CD5 */