From patchwork Thu Apr 17 18:45:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882212 Received: from fout-b5-smtp.messagingengine.com (fout-b5-smtp.messagingengine.com [202.12.124.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A6C12561D9; Thu, 17 Apr 2025 18:49:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.148 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915750; cv=none; b=qn7TjgGNPb1BuQL24cB6Zcl7dbLCDEdpOqlL72eseZ6K3n+FUaZDri6+ppeC8yA5+Mt8FtDG8SBxO8+jj1wbQdMsGAR2acJ+EB1Qnwwjv5xkJt7x9ephcNaYB7L6nwkfr/cCJJQUcVniWF/Vo+ZBSkYoyoSuyws8vCW/vEmhDjE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915750; c=relaxed/simple; bh=erlMHrM9Cs6b3GgS9nPHlqBtf3Ox19dftn3lWGx+vhM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CX2VNilku4LG6VEEXTjQ5MZmAgr92tJG5IH04NwcUe5Gyd3fRwGOwSrb8R7Dkzz88xrRC+Mlz/MSq1NgqH1Jc0tJwZY7mMX2xycJ1q69iWy/18SPeGRugI/2Vbv6RDRJjyT1YYPIg7/CR6mQgmSAZ6pfM0NquLtJGjrF8uNvhXI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=AbfmN6Ah; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=OLfs08lJ; arc=none smtp.client-ip=202.12.124.148 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="AbfmN6Ah"; dkim=pass (2048-bit key) 
header.d=messagingengine.com header.i=@messagingengine.com header.b="OLfs08lJ" Received: from phl-compute-04.internal (phl-compute-04.phl.internal [10.202.2.44]) by mailfout.stl.internal (Postfix) with ESMTP id 081F811400D2; Thu, 17 Apr 2025 14:49:08 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-04.internal (MEProxy); Thu, 17 Apr 2025 14:49:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744915747; x= 1745002147; bh=ib4GxQsCcbLjx08EB7K5kdtLw96IO5K7n2r4f7nSpeA=; b=A bfmN6AhWjc6flNYhOheDHOrfIsI9z61iJ+TRixXKD5yQ3YKoBsrsMG8M5mF51sEv L3CBZs9CIjBev98CKsdcLbi2nQrDYvHgeF31qe3VXyrhv+F9o/tcJ53BPUP696Ez Sxvtcc/ddaUPgMUEzKPErCphgoPEilauggzFAPbJfCcwc42XPF3M9vQkLZ1MHhfW vHG7F2lIGtcl+1iekHjZCNAEBUt/Oz5BkIDvnICuyncPYbArB99AEoNSOkPckWrh 7N02t/eTv+jpVfDtalDMUK4ufTsgH9UGsIFPrFQ08025dofZi55epXvHA+GMaa5I kInbVKkHJdvVDQ11TjpYw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744915747; x=1745002147; bh=i b4GxQsCcbLjx08EB7K5kdtLw96IO5K7n2r4f7nSpeA=; b=OLfs08lJy1cjWX/jf f82TkLPZXXDoqMmm3C7+BSlRr5SdTJJAwDr/2voStPilPZ7/pDfKD6Yloi8sNM3h udjrcVmC3vzmGlH4g/um7PYDDBgMRafch4Z9ShoL883XC71SVaYvretfikDInzqd zT3upCkgL+ztbanBlrNIO2uFSQ0IOpAmCkQATDav6enXfcKd9sg06ccOO2/mrhfK f52XrIfAG3iyoT1WaN8Zk1m0X3AwYEoWrl3GnzoovpGjW/i2WJMsSyvP2mx7iNcz /j3Gf9vCFK/C/cFvpaRGaKQGRQ48zBa+1Z27bv44ARu3M8k3pUNAgzHYqvoE/D5f 6DuOQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvfedttddvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv 
pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 17 Apr 2025 14:49:07 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id E445B11443C9; Thu, 17 Apr 2025 14:49:06 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 01/14] vt: minor cleanup to vc_translate_unicode() Date: Thu, 17 Apr 2025 14:45:03 -0400 Message-ID: <20250417184849.475581-2-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net> References: <20250417184849.475581-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Make it clearer when a sequence is bad. 
Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/vt.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index f5642b3038..b5f3c8a818 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2817,7 +2817,7 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan)
 	if ((c & 0xc0) == 0x80) {
 		/* Unexpected continuation byte? */
 		if (!vc->vc_utf_count)
-			return 0xfffd;
+			goto bad_sequence;
 
 		vc->vc_utf_char = (vc->vc_utf_char << 6) | (c & 0x3f);
 		vc->vc_npar++;
@@ -2829,17 +2829,17 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan)
 		/* Reject overlong sequences */
 		if (c <= utf8_length_changes[vc->vc_npar - 1] ||
 				c > utf8_length_changes[vc->vc_npar])
-			return 0xfffd;
+			goto bad_sequence;
 
 		return vc_sanitize_unicode(c);
 	}
 
 	/* Single ASCII byte or first byte of a sequence received */
 	if (vc->vc_utf_count) {
-		/* Continuation byte expected */
+		/* A continuation byte was expected */
 		*rescan = true;
 		vc->vc_utf_count = 0;
-		return 0xfffd;
+		goto bad_sequence;
 	}
 
 	/* Nothing to do if an ASCII byte was received */
@@ -2858,11 +2858,14 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan)
 		vc->vc_utf_count = 3;
 		vc->vc_utf_char = (c & 0x07);
 	} else {
-		return 0xfffd;
+		goto bad_sequence;
 	}
 
 need_more_bytes:
 	return -1;
+
+bad_sequence:
+	return 0xfffd;
 }
 
 static int vc_translate(struct vc_data *vc, int *c, bool *rescan)

From patchwork Thu Apr 17 18:45:04 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882557
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 02/14] vt: move unicode processing to a separate file
Date: Thu, 17 Apr 2025 14:45:04 -0400
Message-ID: <20250417184849.475581-3-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org
MIME-Version: 1.0

From: Nicolas Pitre

This will make it easier to maintain. Also make it depend on
CONFIG_CONSOLE_TRANSLATIONS.

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/Makefile    |  3 +-
 drivers/tty/vt/ucs.c       | 57 ++++++++++++++++++++++++++++++++++++++
 drivers/tty/vt/vt.c        | 40 +-------------------------
 include/linux/consolemap.h |  6 ++++
 4 files changed, 66 insertions(+), 40 deletions(-)
 create mode 100644 drivers/tty/vt/ucs.c

diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile
index 2c8ce8b592..e24c8546ac 100644
--- a/drivers/tty/vt/Makefile
+++ b/drivers/tty/vt/Makefile
@@ -7,7 +7,8 @@ FONTMAPFILE = cp437.uni
 obj-$(CONFIG_VT)			+= vt_ioctl.o vc_screen.o \
 					   selection.o keyboard.o \
 					   vt.o defkeymap.o
-obj-$(CONFIG_CONSOLE_TRANSLATIONS)	+= consolemap.o consolemap_deftbl.o
+obj-$(CONFIG_CONSOLE_TRANSLATIONS)	+= consolemap.o consolemap_deftbl.o \
+					   ucs.o
 
 # Files generated that shall be removed upon make clean
 clean-files := consolemap_deftbl.c defkeymap.c
diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
new file mode 100644
index 0000000000..dc4a6e7945
--- /dev/null
+++ b/drivers/tty/vt/ucs.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ucs.c - Universal Character Set processing
+ */
+
+#include
+#include
+#include
+#include
+
+/* ucs_is_double_width() is based on the wcwidth() implementation by
+ * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
+ * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
+ */
+
+struct ucs_interval {
+	u32 first;
+	u32 last;
+};
+
+static const struct ucs_interval ucs_double_width_ranges[] = {
+	{ 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E },
+	{ 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF },
+	{ 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 },
+	{ 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD }
+};
+
+static int interval_cmp(const void *key, const void *element)
+{
+	u32 cp = *(u32 *)key;
+	const struct ucs_interval *entry = element;
+
+	if (cp < entry->first)
+		return -1;
+	if (cp > entry->last)
+		return 1;
+	return 0;
+}
+
+/**
+ * ucs_is_double_width() - Determine if a Unicode code point is double-width.
+ * @cp: Unicode code point (UCS-4)
+ *
+ * Return: true if the character is double-width, false otherwise
+ */
+bool ucs_is_double_width(u32 cp)
+{
+	size_t size = ARRAY_SIZE(ucs_double_width_ranges);
+
+	if (!in_range(cp, ucs_double_width_ranges[0].first,
+		      ucs_double_width_ranges[size - 1].last))
+		return false;
+
+	return __inline_bsearch(&cp, ucs_double_width_ranges, size,
+				sizeof(*ucs_double_width_ranges),
+				interval_cmp) != NULL;
+}
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index b5f3c8a818..bcb508bc15 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -104,7 +104,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #define MAX_NR_CON_DRIVER 16
@@ -2712,43 +2711,6 @@ static void do_con_trol(struct tty_struct *tty, struct vc_data *vc, u8 c)
 	}
 }
 
-/* is_double_width() is based on the wcwidth() implementation by
- * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
- * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- */
-struct interval {
-	uint32_t first;
-	uint32_t last;
-};
-
-static int ucs_cmp(const void *key, const void *elt)
-{
-	uint32_t ucs = *(uint32_t *)key;
-	struct interval e = *(struct interval *) elt;
-
-	if (ucs > e.last)
-		return 1;
-	else if (ucs < e.first)
-		return -1;
-	return 0;
-}
-
-static int is_double_width(uint32_t ucs)
-{
-	static const struct interval double_width[] = {
-		{ 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E },
-		{ 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF },
-		{ 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 },
-		{ 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD }
-	};
-	if (ucs < double_width[0].first ||
-	    ucs > double_width[ARRAY_SIZE(double_width) - 1].last)
-		return 0;
-
-	return bsearch(&ucs, double_width, ARRAY_SIZE(double_width),
-			sizeof(struct interval), ucs_cmp) != NULL;
-}
-
 struct vc_draw_region {
 	unsigned long from, to;
 	int x;
@@ -2953,7 +2915,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 	bool inverse = false;
 
 	if (vc->vc_utf && !vc->vc_disp_ctrl) {
-		if (is_double_width(c))
+		if (ucs_is_double_width(c))
 			width = 2;
 	}
 
diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h
index c35db4896c..caf079bcb8 100644
--- a/include/linux/consolemap.h
+++ b/include/linux/consolemap.h
@@ -28,6 +28,7 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs);
 u32 conv_8bit_to_uni(unsigned char c);
 int conv_uni_to_8bit(u32 uni);
 void console_map_init(void);
+bool ucs_is_double_width(uint32_t cp);
 #else
 static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph,
 				    bool use_unicode)
@@ -57,6 +58,11 @@ static inline int conv_uni_to_8bit(u32 uni)
 }
 
 static inline void console_map_init(void) { }
+
+static inline bool ucs_is_double_width(uint32_t cp)
+{
+	return false;
+}
 #endif /* CONFIG_CONSOLE_TRANSLATIONS */
 
 #endif /* __LINUX_CONSOLEMAP_H__ */

From patchwork Thu Apr 17 18:45:05 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882558
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 03/14] vt: properly support zero-width Unicode code points
Date: Thu, 17 Apr 2025 14:45:05 -0400
Message-ID: <20250417184849.475581-4-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org
MIME-Version: 1.0

From: Nicolas Pitre

Zero-width Unicode code points are causing misalignment in vertically
aligned content, disrupting the visual layout. Let's handle zero-width
code points more intelligently.

Double-width code points are stored in the screen grid followed by a white
space code point to create the expected screen layout. When a double-width
code point is followed by a zero-width code point in the console's incoming
byte stream (e.g., an emoji with a presentation selector) then we may
replace the white space padding with that zero-width code point instead of
dropping it. This maximizes screen content information while preserving
proper layout.

If a zero-width code point is preceded by a single-width code point then
the above trick is not possible and such a zero-width code point must be
dropped.

VS16 (Variation Selector 16, U+FE0F) is special as it typically doubles
the width of the preceding single-width code point. We handle that case by
giving VS16 a width of 1 instead of 0 when that happens.

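As a reading aid only (not part of the patch), the width rules described above can be sketched in plain userspace C. Everything prefixed with demo_ is hypothetical: the two predicates are tiny stand-ins for ucs_is_double_width() and ucs_is_zero_width(), and prev2/prev1 stand for the code points stored two and one columns back in the screen grid, with 0 meaning "nothing there".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UCS_VS16 0xfe0f /* Variation Selector 16 */

/* Hypothetical stand-in: treats one slice of emoji as double-width. */
static bool demo_is_double_width(uint32_t cp)
{
	return cp >= 0x1f300 && cp <= 0x1f64f;
}

/* Hypothetical stand-in: ZWJ and the variation selectors VS1..VS16. */
static bool demo_is_zero_width(uint32_t cp)
{
	return cp == 0x200d || (cp >= 0xfe00 && cp <= 0xfe0f);
}

/*
 * Return how many columns curr occupies: 2, 1, or 0 (dropped).
 * *rewind is set when curr should overwrite the padding space that
 * follows a double-width code point.
 */
static int demo_width(uint32_t prev2, uint32_t prev1, uint32_t curr,
		      bool *rewind)
{
	*rewind = false;

	if (demo_is_double_width(curr))
		return 2;
	if (!demo_is_zero_width(curr))
		return 1;

	/* curr is zero-width from here on. */
	if (demo_is_double_width(prev2)) {
		*rewind = true;	/* take over the padding cell */
		return 1;
	}

	/* VS16 after a single-width code point widens the pair to 2. */
	if (curr == UCS_VS16 && prev1 != 0)
		return 1;

	return 0;
}
```

With these stand-ins, U+1F600 followed by VS16 reuses the padding cell (width 1 with rewind set), U+2764 followed by VS16 gets a width of 1 without rewinding, and a ZWJ after a plain ASCII letter is dropped.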
Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/vt.c        | 70 ++++++++++++++++++++++++++++++++++++--
 include/linux/consolemap.h | 10 ++++++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index bcb508bc15..a989feffad 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -443,6 +443,15 @@ static void vc_uniscr_scroll(struct vc_data *vc, unsigned int top,
 	}
 }
 
+static u32 vc_uniscr_getc(struct vc_data *vc, int relative_pos)
+{
+	int pos = vc->state.x + vc->vc_need_wrap + relative_pos;
+
+	if (vc->vc_uni_lines && in_range(pos, 0, vc->vc_cols))
+		return vc->vc_uni_lines[vc->state.y][pos];
+	return 0;
+}
+
 static void vc_uniscr_copy_area(u32 **dst_lines,
 				unsigned int dst_cols,
 				unsigned int dst_rows,
@@ -2905,6 +2914,60 @@ static bool vc_is_control(struct vc_data *vc, int tc, int c)
 	return false;
 }
 
+static void vc_con_rewind(struct vc_data *vc)
+{
+	if (vc->state.x && !vc->vc_need_wrap) {
+		vc->vc_pos -= 2;
+		vc->state.x--;
+	}
+	vc->vc_need_wrap = 0;
+}
+
+#define UCS_VS16	0xfe0f	/* Variation Selector 16 */
+
+static int vc_process_ucs(struct vc_data *vc, int c, int *tc)
+{
+	u32 prev_c, curr_c = c;
+
+	if (ucs_is_double_width(curr_c))
+		return 2;
+
+	if (!ucs_is_zero_width(curr_c))
+		return 1;
+
+	/* From here curr_c is known to be zero-width. */
+
+	if (ucs_is_double_width(vc_uniscr_getc(vc, -2))) {
+		/*
+		 * Let's merge this zero-width code point with the preceding
+		 * double-width code point by replacing the existing
+		 * whitespace padding. To do so we rewind one column and
+		 * pretend this has a width of 1.
+		 * We give the legacy display the same initial space padding.
+		 */
+		vc_con_rewind(vc);
+		*tc = ' ';
+		return 1;
+	}
+
+	/* From here the preceding character, if any, must be single-width. */
+	prev_c = vc_uniscr_getc(vc, -1);
+
+	if (curr_c == UCS_VS16 && prev_c != 0) {
+		/*
+		 * VS16 (U+FE0F) is special. It typically turns the preceding
+		 * single-width character into a double-width one. Let it
+		 * have a width of 1 effectively making the combination with
+		 * the preceding character double-width.
+		 */
+		*tc = ' ';
+		return 1;
+	}
+
+	/* Otherwise zero-width code points are ignored. */
+	return 0;
+}
+
 static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 		struct vc_draw_region *draw)
 {
@@ -2915,8 +2978,9 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 	bool inverse = false;
 
 	if (vc->vc_utf && !vc->vc_disp_ctrl) {
-		if (ucs_is_double_width(c))
-			width = 2;
+		width = vc_process_ucs(vc, c, &tc);
+		if (!width)
+			goto out;
 	}
 
 	/* Now try to find out how to display it */
@@ -2995,6 +3059,8 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 			tc = ' ';
 		next_c = ' ';
 	}
+
+out:
 	notify_write(vc, c);
 
 	if (inverse)
diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h
index caf079bcb8..7d778752dc 100644
--- a/include/linux/consolemap.h
+++ b/include/linux/consolemap.h
@@ -29,6 +29,11 @@ u32 conv_8bit_to_uni(unsigned char c);
 int conv_uni_to_8bit(u32 uni);
 void console_map_init(void);
 bool ucs_is_double_width(uint32_t cp);
+static inline bool ucs_is_zero_width(uint32_t cp)
+{
+	/* coming soon */
+	return false;
+}
 #else
 static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph,
 				    bool use_unicode)
@@ -63,6 +68,11 @@ static inline bool ucs_is_double_width(uint32_t cp)
 {
 	return false;
 }
+
+static inline bool ucs_is_zero_width(uint32_t cp)
+{
+	return false;
+}
 #endif /* CONFIG_CONSOLE_TRANSLATIONS */
 
 #endif /* __LINUX_CONSOLEMAP_H__ */

From patchwork Thu Apr 17 18:45:06 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882555
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 04/14] vt: introduce gen_ucs_width_table.py to create ucs_width_table.h
Date: Thu, 17 Apr 2025 14:45:06 -0400
Message-ID: <20250417184849.475581-5-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org
MIME-Version: 1.0

From: Nicolas Pitre

The table in ucs.c is terribly out of date and incomplete. We also need a
second table to store zero-width code points. Properly maintaining those
tables manually is impossible. So here's a script to generate them.

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/gen_ucs_width_table.py | 256 ++++++++++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100755 drivers/tty/vt/gen_ucs_width_table.py

diff --git a/drivers/tty/vt/gen_ucs_width_table.py b/drivers/tty/vt/gen_ucs_width_table.py
new file mode 100755
index 0000000000..00510444a7
--- /dev/null
+++ b/drivers/tty/vt/gen_ucs_width_table.py
@@ -0,0 +1,256 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Leverage Python's unicodedata module to generate ucs_width_table.h
+
+import unicodedata
+import sys
+
+# This script's file name
+from pathlib import Path
+this_file = Path(__file__).name
+
+# Output file name
+out_file = "ucs_width_table.h"
+
+# --- Global Constants for Width Assignments ---
+
+# Known zero-width characters
+KNOWN_ZERO_WIDTH = (
+    0x200B,  # ZERO WIDTH SPACE
+    0x200C,  # ZERO WIDTH NON-JOINER
+    0x200D,  # ZERO WIDTH JOINER
+    0x2060,  # WORD JOINER
+    0xFEFF   # ZERO WIDTH NO-BREAK SPACE (BOM)
+)
+
+# Zero-width emoji modifiers and components
+# NOTE: Some of these characters would normally be single-width according to
+# East Asian Width properties, but we deliberately override them to be
+# zero-width because they function as modifiers in emoji sequences.
+EMOJI_ZERO_WIDTH = [
+    # Skin tone modifiers
+    (0x1F3FB, 0x1F3FF),  # Emoji modifiers (skin tones)
+
+    # Variation selectors (note: VS16 is treated specially in vt.c)
+    (0xFE00, 0xFE0F),    # Variation Selectors 1-16
+
+    # Gender and hair style modifiers
+    # These would be single-width by Unicode properties, but are zero-width
+    # when part of emoji
+    (0x2640, 0x2640),    # Female sign
+    (0x2642, 0x2642),    # Male sign
+    (0x26A7, 0x26A7),    # Transgender symbol
+    (0x1F9B0, 0x1F9B3),  # Hair components (red, curly, white, bald)
+
+    # Tag characters
+    (0xE0020, 0xE007E),  # Tags
+]
+
+# Regional indicators (flag components)
+REGIONAL_INDICATORS = (0x1F1E6, 0x1F1FF)  # Regional indicator symbols A-Z
+
+# Double-width emoji ranges
+#
+# Many emoji characters are classified as single-width according to Unicode
+# Standard Annex #11 East Asian Width property (N or Neutral), but we
+# deliberately override them to be double-width. References:
+# 1. Unicode Technical Standard #51: Unicode Emoji
+#    (https://www.unicode.org/reports/tr51/)
+# 2. Principle of "emoji presentation" in WHATWG CSS Text specification
+#    (https://drafts.csswg.org/css-text-3/#character-properties)
+# 3. Terminal emulator implementations (iTerm2, Windows Terminal, etc.) which
+#    universally render emoji as double-width characters regardless of their
+#    Unicode EAW property
+# 4. W3C Work Item: Requirements for Japanese Text Layout - Section 3.8.1
+#    Emoji width (https://www.w3.org/TR/jlreq/)
+EMOJI_RANGES = [
+    (0x1F000, 0x1F02F),  # Mahjong Tiles (EAW: N, but displayed as double-width)
+    (0x1F0A0, 0x1F0FF),  # Playing Cards (EAW: N, but displayed as double-width)
+    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
+    (0x1F600, 0x1F64F),  # Emoticons
+    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
+    (0x1F700, 0x1F77F),  # Alchemical Symbols
+    (0x1F780, 0x1F7FF),  # Geometric Shapes Extended
+    (0x1F800, 0x1F8FF),  # Supplemental Arrows-C
+    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
+    (0x1FA00, 0x1FA6F),  # Chess Symbols
+    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
+]
+
+def create_width_tables():
+    """
+    Creates Unicode character width tables and returns the data structures.
+
+    Returns:
+        tuple: (zero_width_ranges, double_width_ranges)
+    """
+
+    # Width data mapping
+    width_map = {}  # Maps code points to width (0, 1, 2)
+
+    # Mark emoji modifiers as zero-width
+    for start, end in EMOJI_ZERO_WIDTH:
+        for cp in range(start, end + 1):
+            width_map[cp] = 0
+
+    # Mark all regional indicators as single-width as they are usually paired
+    # providing a combined width of 2 when displayed together.
+    start, end = REGIONAL_INDICATORS
+    for cp in range(start, end + 1):
+        width_map[cp] = 1
+
+    # Process all assigned Unicode code points (Basic Multilingual Plane +
+    # Supplementary Planes) Range 0x0 to 0x10FFFF (the full Unicode range)
+    for block_start in range(0, 0x110000, 0x1000):
+        block_end = block_start + 0x1000
+        for cp in range(block_start, block_end):
+            try:
+                char = chr(cp)
+
+                # Skip if already processed
+                if cp in width_map:
+                    continue
+
+                # Check for combining marks and format characters
+                category = unicodedata.category(char)
+
+                # Combining marks
+                if category.startswith('M'):
+                    width_map[cp] = 0
+                    continue
+
+                # Format characters
+                # Since we have no support for bidirectional text, all format
+                # characters (category Cf) can be treated with width 0 (zero)
+                # for simplicity, as they don't need to occupy visual space
+                # in a non-bidirectional text environment.
+                if category == 'Cf':
+                    width_map[cp] = 0
+                    continue
+
+                # Known zero-width characters
+                if cp in KNOWN_ZERO_WIDTH:
+                    width_map[cp] = 0
+                    continue
+
+                # Use East Asian Width property
+                eaw = unicodedata.east_asian_width(char)
+                if eaw in ('F', 'W'):  # Fullwidth or Wide
+                    width_map[cp] = 2
+                elif eaw in ('Na', 'H', 'N', 'A'):  # Narrow, Halfwidth, Neutral, Ambiguous
+                    width_map[cp] = 1
+                else:
+                    # Default to single-width for unknown
+                    width_map[cp] = 1
+
+            except (ValueError, OverflowError):
+                # Skip invalid code points
+                continue
+
+    # Process Emoji - generally double-width
+    for start, end in EMOJI_RANGES:
+        for cp in range(start, end + 1):
+            if cp not in width_map or width_map[cp] != 0:  # Don't override zero-width
+                try:
+                    char = chr(cp)
+                    width_map[cp] = 2
+                except (ValueError, OverflowError):
+                    continue
+
+    # Optimize to create range tables
+    def ranges_optimize(width_data, target_width):
+        points = sorted([cp for cp, width in width_data.items() if width == target_width])
+        if not points:
+            return []
+
+        # Group consecutive code points into ranges
+        ranges = []
+        start = points[0]
+        prev = start
+
+        for cp in points[1:]:
+            if cp > prev + 1:
+                ranges.append((start, prev))
+                start = cp
+            prev = cp
+
+        # Add the last range
+        ranges.append((start, prev))
+        return ranges
+
+    # Extract ranges for each width
+    zero_width_ranges = ranges_optimize(width_map, 0)
+    double_width_ranges = ranges_optimize(width_map, 2)
+
+    return zero_width_ranges, double_width_ranges
+
+def write_tables(zero_width_ranges, double_width_ranges):
+    """
+    Write the generated tables to C header file.
+
+    Args:
+        zero_width_ranges: List of (start, end) ranges for zero-width characters
+        double_width_ranges: List of (start, end) ranges for double-width characters
+    """
+
+    # Function to generate code point description comments
+    def get_code_point_comment(start, end):
+        try:
+            start_char_desc = unicodedata.name(chr(start))
+            if start == end:
+                return f"/* {start_char_desc} */"
+            else:
+                end_char_desc = unicodedata.name(chr(end))
+                return f"/* {start_char_desc} - {end_char_desc} */"
+        except:
+            if start == end:
+                return f"/* U+{start:04X} */"
+            else:
+                return f"/* U+{start:04X} - U+{end:04X} */"
+
+    # Generate C tables
+    with open(out_file, 'w') as f:
+        f.write(f"""\
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * {out_file} - Unicode character width
+ *
+ * Auto-generated by {this_file}
+ *
+ * Unicode Version: {unicodedata.unidata_version}
+ */
+
+/* Zero-width character ranges */
+static const struct ucs_interval ucs_zero_width_ranges[] = {{
+""")
+
+        for start, end in zero_width_ranges:
+            comment = get_code_point_comment(start, end)
+            f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n")
+
+        f.write("""\
+};
+
+/* Double-width character ranges */
+static const struct ucs_interval ucs_double_width_ranges[] = {
+""")
+
+        for start, end in double_width_ranges:
+            comment = get_code_point_comment(start, end)
+            f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n")
+
+        f.write("};\n")
+
+if __name__ == "__main__":
+    # Write tables to header file
+    zero_width_ranges, double_width_ranges = create_width_tables()
+    write_tables(zero_width_ranges, double_width_ranges)
+
+    # Print summary
+    zero_width_count = sum(end - start + 1 for start, end in zero_width_ranges)
+    double_width_count = sum(end - start + 1 for start, end in double_width_ranges)
+    print(f"Generated {out_file} with:")
+    print(f"- {len(zero_width_ranges)} zero-width ranges covering ~{zero_width_count} code points")
+    print(f"- {len(double_width_ranges)} double-width ranges covering ~{double_width_count} code points")
+    print(f"- Unicode Version: {unicodedata.unidata_version}")

From patchwork Thu Apr 17 18:45:07 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882205
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 05/14] vt: create ucs_width_table.h with gen_ucs_width_table.py
Date: Thu, 17 Apr 2025 14:45:07 -0400
Message-ID: <20250417184849.475581-6-nico@fluxnic.net>
X-Mailer:
git-send-email 2.49.0 In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net> References: <20250417184849.475581-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Provide comprehensive ranges for double-width and zero-width Unicode code points. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/ucs_width_table.h | 445 +++++++++++++++++++++++++++++++ 1 file changed, 445 insertions(+) create mode 100644 drivers/tty/vt/ucs_width_table.h diff --git a/drivers/tty/vt/ucs_width_table.h b/drivers/tty/vt/ucs_width_table.h new file mode 100644 index 0000000000..9cc86b5cdf --- /dev/null +++ b/drivers/tty/vt/ucs_width_table.h @@ -0,0 +1,445 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * ucs_width_table.h - Unicode character width + * + * Auto-generated by gen_ucs_width_table.py + * + * Unicode Version: 16.0.0 + */ + +/* Zero-width character ranges */ +static const struct ucs_interval ucs_zero_width_ranges[] = { + { 0x000AD, 0x000AD }, /* SOFT HYPHEN */ + { 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ + { 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */ + { 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x0061C, 0x0061C }, /* ARABIC LETTER MARK */ + { 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x00670, 0x00670 }, /* 
ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x006D6, 0x006DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */ + { 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x0070F, 0x0070F }, /* SYRIAC ABBREVIATION MARK */ + { 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x007FD, 0x007FD }, /* NKO DANTAYALAN */ + { 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x00829, 0x0082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ + { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x009C7, 0x009C8 }, 
/* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ + { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ + { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ + { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ + { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - 
TAMIL VOWEL SIGN UU */ + { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ + { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ + { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x00CCA, 0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - 
SINHALA SIGN VISARGAYA */ + { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ + { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO 
*/ + { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ + { 0x0180B, 0x0180F }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ + { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG 
ADEG */ + { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ + { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x0200B, 0x0200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ + { 0x0202A, 0x0202E }, /* LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE */ + { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x02066, 0x0206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ + { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x02640, 0x02640 }, /* FEMALE SIGN */ + { 0x02642, 0x02642 }, /* MALE SIGN */ + { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ + { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ + { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND 
MILLIONS SIGN */ + { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ + { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ + { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ + { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ + { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ + { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ + { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ + { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ + { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ + { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ + { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ + { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ + { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ + { 0x0A947, 0x0A953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ + { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ + { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ + { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ + { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ + { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ + { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ + { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ + { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ + { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ + { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ + { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ + { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ + { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU 
*/ + { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ + { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ + { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ + { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ + { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ + { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ + { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ + { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ + { 0x101FD, 0x101FD }, /* PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE */ + { 0x102E0, 0x102E0 }, /* COPTIC EPACT THOUSANDS MARK */ + { 0x10376, 0x1037A }, /* COMBINING OLD PERMIC LETTER AN - COMBINING OLD PERMIC LETTER SII */ + { 0x10A01, 0x10A03 }, /* KHAROSHTHI VOWEL SIGN I - KHAROSHTHI VOWEL SIGN VOCALIC R */ + { 0x10A05, 0x10A06 }, /* KHAROSHTHI VOWEL SIGN E - KHAROSHTHI VOWEL SIGN O */ + { 0x10A0C, 0x10A0F }, /* KHAROSHTHI VOWEL LENGTH MARK - KHAROSHTHI SIGN VISARGA */ + { 0x10A38, 0x10A3A }, /* KHAROSHTHI SIGN BAR ABOVE - KHAROSHTHI SIGN DOT BELOW */ + { 0x10A3F, 0x10A3F }, /* KHAROSHTHI VIRAMA */ + { 0x10AE5, 0x10AE6 }, /* MANICHAEAN ABBREVIATION MARK ABOVE - MANICHAEAN ABBREVIATION MARK BELOW */ + { 0x10D24, 0x10D27 }, /* HANIFI ROHINGYA SIGN HARBAHAY - HANIFI ROHINGYA SIGN TASSI */ + { 0x10D69, 0x10D6D }, /* GARAY VOWEL SIGN E - GARAY CONSONANT NASALIZATION MARK */ + { 0x10EAB, 0x10EAC }, /* YEZIDI COMBINING HAMZA MARK - YEZIDI COMBINING MADDA MARK */ + { 0x10EFC, 0x10EFF }, /* ARABIC COMBINING ALEF OVERLAY - ARABIC SMALL LOW WORD MADDA */ + { 0x10F46, 0x10F50 }, /* SOGDIAN COMBINING DOT BELOW - SOGDIAN COMBINING STROKE BELOW */ + { 0x10F82, 0x10F85 }, /* OLD UYGHUR COMBINING DOT ABOVE - OLD UYGHUR COMBINING TWO DOTS BELOW */ + { 0x11000, 0x11002 }, /* BRAHMI SIGN CANDRABINDU - BRAHMI SIGN VISARGA */ + { 0x11038, 0x11046 }, 
/* BRAHMI VOWEL SIGN AA - BRAHMI VIRAMA */ + { 0x11070, 0x11070 }, /* BRAHMI SIGN OLD TAMIL VIRAMA */ + { 0x11073, 0x11074 }, /* BRAHMI VOWEL SIGN OLD TAMIL SHORT E - BRAHMI VOWEL SIGN OLD TAMIL SHORT O */ + { 0x1107F, 0x11082 }, /* BRAHMI NUMBER JOINER - KAITHI SIGN VISARGA */ + { 0x110B0, 0x110BA }, /* KAITHI VOWEL SIGN AA - KAITHI SIGN NUKTA */ + { 0x110BD, 0x110BD }, /* KAITHI NUMBER SIGN */ + { 0x110C2, 0x110C2 }, /* KAITHI VOWEL SIGN VOCALIC R */ + { 0x110CD, 0x110CD }, /* KAITHI NUMBER SIGN ABOVE */ + { 0x11100, 0x11102 }, /* CHAKMA SIGN CANDRABINDU - CHAKMA SIGN VISARGA */ + { 0x11127, 0x11134 }, /* CHAKMA VOWEL SIGN A - CHAKMA MAAYYAA */ + { 0x11145, 0x11146 }, /* CHAKMA VOWEL SIGN AA - CHAKMA VOWEL SIGN EI */ + { 0x11173, 0x11173 }, /* MAHAJANI SIGN NUKTA */ + { 0x11180, 0x11182 }, /* SHARADA SIGN CANDRABINDU - SHARADA SIGN VISARGA */ + { 0x111B3, 0x111C0 }, /* SHARADA VOWEL SIGN AA - SHARADA SIGN VIRAMA */ + { 0x111C9, 0x111CC }, /* SHARADA SANDHI MARK - SHARADA EXTRA SHORT VOWEL MARK */ + { 0x111CE, 0x111CF }, /* SHARADA VOWEL SIGN PRISHTHAMATRA E - SHARADA SIGN INVERTED CANDRABINDU */ + { 0x1122C, 0x11237 }, /* KHOJKI VOWEL SIGN AA - KHOJKI SIGN SHADDA */ + { 0x1123E, 0x1123E }, /* KHOJKI SIGN SUKUN */ + { 0x11241, 0x11241 }, /* KHOJKI VOWEL SIGN VOCALIC R */ + { 0x112DF, 0x112EA }, /* KHUDAWADI SIGN ANUSVARA - KHUDAWADI SIGN VIRAMA */ + { 0x11300, 0x11303 }, /* GRANTHA SIGN COMBINING ANUSVARA ABOVE - GRANTHA SIGN VISARGA */ + { 0x1133B, 0x1133C }, /* COMBINING BINDU BELOW - GRANTHA SIGN NUKTA */ + { 0x1133E, 0x11344 }, /* GRANTHA VOWEL SIGN AA - GRANTHA VOWEL SIGN VOCALIC RR */ + { 0x11347, 0x11348 }, /* GRANTHA VOWEL SIGN EE - GRANTHA VOWEL SIGN AI */ + { 0x1134B, 0x1134D }, /* GRANTHA VOWEL SIGN OO - GRANTHA SIGN VIRAMA */ + { 0x11357, 0x11357 }, /* GRANTHA AU LENGTH MARK */ + { 0x11362, 0x11363 }, /* GRANTHA VOWEL SIGN VOCALIC L - GRANTHA VOWEL SIGN VOCALIC LL */ + { 0x11366, 0x1136C }, /* COMBINING GRANTHA DIGIT ZERO - COMBINING GRANTHA DIGIT SIX 
*/ + { 0x11370, 0x11374 }, /* COMBINING GRANTHA LETTER A - COMBINING GRANTHA LETTER PA */ + { 0x113B8, 0x113C0 }, /* TULU-TIGALARI VOWEL SIGN AA - TULU-TIGALARI VOWEL SIGN VOCALIC LL */ + { 0x113C2, 0x113C2 }, /* TULU-TIGALARI VOWEL SIGN EE */ + { 0x113C5, 0x113C5 }, /* TULU-TIGALARI VOWEL SIGN AI */ + { 0x113C7, 0x113CA }, /* TULU-TIGALARI VOWEL SIGN OO - TULU-TIGALARI SIGN CANDRA ANUNASIKA */ + { 0x113CC, 0x113D0 }, /* TULU-TIGALARI SIGN ANUSVARA - TULU-TIGALARI CONJOINER */ + { 0x113D2, 0x113D2 }, /* TULU-TIGALARI GEMINATION MARK */ + { 0x113E1, 0x113E2 }, /* TULU-TIGALARI VEDIC TONE SVARITA - TULU-TIGALARI VEDIC TONE ANUDATTA */ + { 0x11435, 0x11446 }, /* NEWA VOWEL SIGN AA - NEWA SIGN NUKTA */ + { 0x1145E, 0x1145E }, /* NEWA SANDHI MARK */ + { 0x114B0, 0x114C3 }, /* TIRHUTA VOWEL SIGN AA - TIRHUTA SIGN NUKTA */ + { 0x115AF, 0x115B5 }, /* SIDDHAM VOWEL SIGN AA - SIDDHAM VOWEL SIGN VOCALIC RR */ + { 0x115B8, 0x115C0 }, /* SIDDHAM VOWEL SIGN E - SIDDHAM SIGN NUKTA */ + { 0x115DC, 0x115DD }, /* SIDDHAM VOWEL SIGN ALTERNATE U - SIDDHAM VOWEL SIGN ALTERNATE UU */ + { 0x11630, 0x11640 }, /* MODI VOWEL SIGN AA - MODI SIGN ARDHACANDRA */ + { 0x116AB, 0x116B7 }, /* TAKRI SIGN ANUSVARA - TAKRI SIGN NUKTA */ + { 0x1171D, 0x1172B }, /* AHOM CONSONANT SIGN MEDIAL LA - AHOM SIGN KILLER */ + { 0x1182C, 0x1183A }, /* DOGRA VOWEL SIGN AA - DOGRA SIGN NUKTA */ + { 0x11930, 0x11935 }, /* DIVES AKURU VOWEL SIGN AA - DIVES AKURU VOWEL SIGN E */ + { 0x11937, 0x11938 }, /* DIVES AKURU VOWEL SIGN AI - DIVES AKURU VOWEL SIGN O */ + { 0x1193B, 0x1193E }, /* DIVES AKURU SIGN ANUSVARA - DIVES AKURU VIRAMA */ + { 0x11940, 0x11940 }, /* DIVES AKURU MEDIAL YA */ + { 0x11942, 0x11943 }, /* DIVES AKURU MEDIAL RA - DIVES AKURU SIGN NUKTA */ + { 0x119D1, 0x119D7 }, /* NANDINAGARI VOWEL SIGN AA - NANDINAGARI VOWEL SIGN VOCALIC RR */ + { 0x119DA, 0x119E0 }, /* NANDINAGARI VOWEL SIGN E - NANDINAGARI SIGN VIRAMA */ + { 0x119E4, 0x119E4 }, /* NANDINAGARI VOWEL SIGN PRISHTHAMATRA E */ + { 0x11A01, 
0x11A0A }, /* ZANABAZAR SQUARE VOWEL SIGN I - ZANABAZAR SQUARE VOWEL LENGTH MARK */ + { 0x11A33, 0x11A39 }, /* ZANABAZAR SQUARE FINAL CONSONANT MARK - ZANABAZAR SQUARE SIGN VISARGA */ + { 0x11A3B, 0x11A3E }, /* ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA - ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA */ + { 0x11A47, 0x11A47 }, /* ZANABAZAR SQUARE SUBJOINER */ + { 0x11A51, 0x11A5B }, /* SOYOMBO VOWEL SIGN I - SOYOMBO VOWEL LENGTH MARK */ + { 0x11A8A, 0x11A99 }, /* SOYOMBO FINAL CONSONANT SIGN G - SOYOMBO SUBJOINER */ + { 0x11C2F, 0x11C36 }, /* BHAIKSUKI VOWEL SIGN AA - BHAIKSUKI VOWEL SIGN VOCALIC L */ + { 0x11C38, 0x11C3F }, /* BHAIKSUKI VOWEL SIGN E - BHAIKSUKI SIGN VIRAMA */ + { 0x11C92, 0x11CA7 }, /* MARCHEN SUBJOINED LETTER KA - MARCHEN SUBJOINED LETTER ZA */ + { 0x11CA9, 0x11CB6 }, /* MARCHEN SUBJOINED LETTER YA - MARCHEN SIGN CANDRABINDU */ + { 0x11D31, 0x11D36 }, /* MASARAM GONDI VOWEL SIGN AA - MASARAM GONDI VOWEL SIGN VOCALIC R */ + { 0x11D3A, 0x11D3A }, /* MASARAM GONDI VOWEL SIGN E */ + { 0x11D3C, 0x11D3D }, /* MASARAM GONDI VOWEL SIGN AI - MASARAM GONDI VOWEL SIGN O */ + { 0x11D3F, 0x11D45 }, /* MASARAM GONDI VOWEL SIGN AU - MASARAM GONDI VIRAMA */ + { 0x11D47, 0x11D47 }, /* MASARAM GONDI RA-KARA */ + { 0x11D8A, 0x11D8E }, /* GUNJALA GONDI VOWEL SIGN AA - GUNJALA GONDI VOWEL SIGN UU */ + { 0x11D90, 0x11D91 }, /* GUNJALA GONDI VOWEL SIGN EE - GUNJALA GONDI VOWEL SIGN AI */ + { 0x11D93, 0x11D97 }, /* GUNJALA GONDI VOWEL SIGN OO - GUNJALA GONDI VIRAMA */ + { 0x11EF3, 0x11EF6 }, /* MAKASAR VOWEL SIGN I - MAKASAR VOWEL SIGN O */ + { 0x11F00, 0x11F01 }, /* KAWI SIGN CANDRABINDU - KAWI SIGN ANUSVARA */ + { 0x11F03, 0x11F03 }, /* KAWI SIGN VISARGA */ + { 0x11F34, 0x11F3A }, /* KAWI VOWEL SIGN AA - KAWI VOWEL SIGN VOCALIC R */ + { 0x11F3E, 0x11F42 }, /* KAWI VOWEL SIGN E - KAWI CONJOINER */ + { 0x11F5A, 0x11F5A }, /* KAWI SIGN NUKTA */ + { 0x13430, 0x13440 }, /* EGYPTIAN HIEROGLYPH VERTICAL JOINER - EGYPTIAN HIEROGLYPH MIRROR HORIZONTALLY */ + { 0x13447, 0x13455 }, /* 
EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP START - EGYPTIAN HIEROGLYPH MODIFIER DAMAGED */ + { 0x1611E, 0x1612F }, /* GURUNG KHEMA VOWEL SIGN AA - GURUNG KHEMA SIGN THOLHOMA */ + { 0x16AF0, 0x16AF4 }, /* BASSA VAH COMBINING HIGH TONE - BASSA VAH COMBINING HIGH-LOW TONE */ + { 0x16B30, 0x16B36 }, /* PAHAWH HMONG MARK CIM TUB - PAHAWH HMONG MARK CIM TAUM */ + { 0x16F4F, 0x16F4F }, /* MIAO SIGN CONSONANT MODIFIER BAR */ + { 0x16F51, 0x16F87 }, /* MIAO SIGN ASPIRATION - MIAO VOWEL SIGN UI */ + { 0x16F8F, 0x16F92 }, /* MIAO TONE RIGHT - MIAO TONE BELOW */ + { 0x16FE4, 0x16FE4 }, /* KHITAN SMALL SCRIPT FILLER */ + { 0x16FF0, 0x16FF1 }, /* VIETNAMESE ALTERNATE READING MARK CA - VIETNAMESE ALTERNATE READING MARK NHAY */ + { 0x1BC9D, 0x1BC9E }, /* DUPLOYAN THICK LETTER SELECTOR - DUPLOYAN DOUBLE MARK */ + { 0x1BCA0, 0x1BCA3 }, /* SHORTHAND FORMAT LETTER OVERLAP - SHORTHAND FORMAT UP STEP */ + { 0x1CF00, 0x1CF2D }, /* ZNAMENNY COMBINING MARK GORAZDO NIZKO S KRYZHEM ON LEFT - ZNAMENNY COMBINING MARK KRYZH ON LEFT */ + { 0x1CF30, 0x1CF46 }, /* ZNAMENNY COMBINING TONAL RANGE MARK MRACHNO - ZNAMENNY PRIZNAK MODIFIER ROG */ + { 0x1D165, 0x1D169 }, /* MUSICAL SYMBOL COMBINING STEM - MUSICAL SYMBOL COMBINING TREMOLO-3 */ + { 0x1D16D, 0x1D182 }, /* MUSICAL SYMBOL COMBINING AUGMENTATION DOT - MUSICAL SYMBOL COMBINING LOURE */ + { 0x1D185, 0x1D18B }, /* MUSICAL SYMBOL COMBINING DOIT - MUSICAL SYMBOL COMBINING TRIPLE TONGUE */ + { 0x1D1AA, 0x1D1AD }, /* MUSICAL SYMBOL COMBINING DOWN BOW - MUSICAL SYMBOL COMBINING SNAP PIZZICATO */ + { 0x1D242, 0x1D244 }, /* COMBINING GREEK MUSICAL TRISEME - COMBINING GREEK MUSICAL PENTASEME */ + { 0x1DA00, 0x1DA36 }, /* SIGNWRITING HEAD RIM - SIGNWRITING AIR SUCKING IN */ + { 0x1DA3B, 0x1DA6C }, /* SIGNWRITING MOUTH CLOSED NEUTRAL - SIGNWRITING EXCITEMENT */ + { 0x1DA75, 0x1DA75 }, /* SIGNWRITING UPPER BODY TILTING FROM HIP JOINTS */ + { 0x1DA84, 0x1DA84 }, /* SIGNWRITING LOCATION HEAD NECK */ + { 0x1DA9B, 0x1DA9F }, /* SIGNWRITING FILL MODIFIER-2 - 
SIGNWRITING FILL MODIFIER-6 */ + { 0x1DAA1, 0x1DAAF }, /* SIGNWRITING ROTATION MODIFIER-2 - SIGNWRITING ROTATION MODIFIER-16 */ + { 0x1E000, 0x1E006 }, /* COMBINING GLAGOLITIC LETTER AZU - COMBINING GLAGOLITIC LETTER ZHIVETE */ + { 0x1E008, 0x1E018 }, /* COMBINING GLAGOLITIC LETTER ZEMLJA - COMBINING GLAGOLITIC LETTER HERU */ + { 0x1E01B, 0x1E021 }, /* COMBINING GLAGOLITIC LETTER SHTA - COMBINING GLAGOLITIC LETTER YATI */ + { 0x1E023, 0x1E024 }, /* COMBINING GLAGOLITIC LETTER YU - COMBINING GLAGOLITIC LETTER SMALL YUS */ + { 0x1E026, 0x1E02A }, /* COMBINING GLAGOLITIC LETTER YO - COMBINING GLAGOLITIC LETTER FITA */ + { 0x1E08F, 0x1E08F }, /* COMBINING CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I */ + { 0x1E130, 0x1E136 }, /* NYIAKENG PUACHUE HMONG TONE-B - NYIAKENG PUACHUE HMONG TONE-D */ + { 0x1E2AE, 0x1E2AE }, /* TOTO SIGN RISING TONE */ + { 0x1E2EC, 0x1E2EF }, /* WANCHO TONE TUP - WANCHO TONE KOINI */ + { 0x1E4EC, 0x1E4EF }, /* NAG MUNDARI SIGN MUHOR - NAG MUNDARI SIGN SUTUH */ + { 0x1E5EE, 0x1E5EF }, /* OL ONAL SIGN MU - OL ONAL SIGN IKIR */ + { 0x1E8D0, 0x1E8D6 }, /* MENDE KIKAKUI COMBINING NUMBER TEENS - MENDE KIKAKUI COMBINING NUMBER MILLIONS */ + { 0x1E944, 0x1E94A }, /* ADLAM ALIF LENGTHENER - ADLAM NUKTA */ + { 0x1F3FB, 0x1F3FF }, /* EMOJI MODIFIER FITZPATRICK TYPE-1-2 - EMOJI MODIFIER FITZPATRICK TYPE-6 */ + { 0x1F9B0, 0x1F9B3 }, /* EMOJI COMPONENT RED HAIR - EMOJI COMPONENT WHITE HAIR */ + { 0xE0001, 0xE0001 }, /* LANGUAGE TAG */ + { 0xE0020, 0xE007F }, /* TAG SPACE - CANCEL TAG */ + { 0xE0100, 0xE01EF }, /* VARIATION SELECTOR-17 - VARIATION SELECTOR-256 */ +}; + +/* Double-width character ranges */ +static const struct ucs_interval ucs_double_width_ranges[] = { + { 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ + { 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */ + { 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ + { 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK 
DOWN-POINTING DOUBLE TRIANGLE */ + { 0x023F0, 0x023F0 }, /* ALARM CLOCK */ + { 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */ + { 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ + { 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ + { 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ + { 0x02648, 0x02653 }, /* ARIES - PISCES */ + { 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */ + { 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ + { 0x02693, 0x02693 }, /* ANCHOR */ + { 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */ + { 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ + { 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */ + { 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ + { 0x026CE, 0x026CE }, /* OPHIUCHUS */ + { 0x026D4, 0x026D4 }, /* NO ENTRY */ + { 0x026EA, 0x026EA }, /* CHURCH */ + { 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */ + { 0x026F5, 0x026F5 }, /* SAILBOAT */ + { 0x026FA, 0x026FA }, /* TENT */ + { 0x026FD, 0x026FD }, /* FUEL PUMP */ + { 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */ + { 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */ + { 0x02728, 0x02728 }, /* SPARKLES */ + { 0x0274C, 0x0274C }, /* CROSS MARK */ + { 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */ + { 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ + { 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ + { 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ + { 0x027B0, 0x027B0 }, /* CURLY LOOP */ + { 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */ + { 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ + { 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */ + { 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */ + { 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ + { 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ + { 0x02F00, 0x02FD5 }, /* KANGXI 
RADICAL ONE - KANGXI RADICAL FLUTE */ + { 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ + { 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ + { 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ + { 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ + { 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ + { 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ + { 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ + { 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ + { 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ + { 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ + { 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ + { 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ + { 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ + { 0x0F900, 0x0FAFF }, /* U+F900 - U+FAFF */ + { 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ + { 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ + { 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ + { 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ + { 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ + { 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ + { 0x16FE0, 0x16FE3 }, /* TANGUT ITERATION MARK - OLD CHINESE ITERATION MARK */ + { 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */ + { 0x18800, 0x18CD5 }, /* TANGUT COMPONENT-001 - KHITAN SMALL SCRIPT CHARACTER-18CD5 */ + { 0x18CFF, 0x18D08 }, /* U+18CFF - U+18D08 */ + { 0x1AFF0, 0x1AFF3 }, /* KATAKANA LETTER MINNAN TONE-2 - KATAKANA 
LETTER MINNAN TONE-5 */ + { 0x1AFF5, 0x1AFFB }, /* KATAKANA LETTER MINNAN TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-5 */ + { 0x1AFFD, 0x1AFFE }, /* KATAKANA LETTER MINNAN NASALIZED TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-8 */ + { 0x1B000, 0x1B122 }, /* KATAKANA LETTER ARCHAIC E - KATAKANA LETTER ARCHAIC WU */ + { 0x1B132, 0x1B132 }, /* HIRAGANA LETTER SMALL KO */ + { 0x1B150, 0x1B152 }, /* HIRAGANA LETTER SMALL WI - HIRAGANA LETTER SMALL WO */ + { 0x1B155, 0x1B155 }, /* KATAKANA LETTER SMALL KO */ + { 0x1B164, 0x1B167 }, /* KATAKANA LETTER SMALL WI - KATAKANA LETTER SMALL N */ + { 0x1B170, 0x1B2FB }, /* NUSHU CHARACTER-1B170 - NUSHU CHARACTER-1B2FB */ + { 0x1D300, 0x1D356 }, /* MONOGRAM FOR EARTH - TETRAGRAM FOR FOSTERING */ + { 0x1D360, 0x1D376 }, /* COUNTING ROD UNIT DIGIT ONE - IDEOGRAPHIC TALLY MARK FIVE */ + { 0x1F000, 0x1F02F }, /* U+1F000 - U+1F02F */ + { 0x1F0A0, 0x1F0FF }, /* U+1F0A0 - U+1F0FF */ + { 0x1F18E, 0x1F18E }, /* NEGATIVE SQUARED AB */ + { 0x1F191, 0x1F19A }, /* SQUARED CL - SQUARED VS */ + { 0x1F200, 0x1F202 }, /* SQUARE HIRAGANA HOKA - SQUARED KATAKANA SA */ + { 0x1F210, 0x1F23B }, /* SQUARED CJK UNIFIED IDEOGRAPH-624B - SQUARED CJK UNIFIED IDEOGRAPH-914D */ + { 0x1F240, 0x1F248 }, /* TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C - TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 */ + { 0x1F250, 0x1F251 }, /* CIRCLED IDEOGRAPH ADVANTAGE - CIRCLED IDEOGRAPH ACCEPT */ + { 0x1F260, 0x1F265 }, /* ROUNDED SYMBOL FOR FU - ROUNDED SYMBOL FOR CAI */ + { 0x1F300, 0x1F3FA }, /* CYCLONE - AMPHORA */ + { 0x1F400, 0x1F64F }, /* RAT - PERSON WITH FOLDED HANDS */ + { 0x1F680, 0x1F9AF }, /* ROCKET - PROBING CANE */ + { 0x1F9B4, 0x1FAFF }, /* U+1F9B4 - U+1FAFF */ + { 0x20000, 0x2FFFD }, /* U+20000 - U+2FFFD */ + { 0x30000, 0x3FFFD }, /* U+30000 - U+3FFFD */ +}; From patchwork Thu Apr 17 18:45:08 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre 
X-Patchwork-Id: 882556
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 06/14] vt: use new tables in ucs.c
Date: Thu, 17 Apr 2025 14:45:08 -0400
Message-ID: <20250417184849.475581-7-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

This removes the table from ucs.c and substitutes the generated tables from
ucs_width_table.h providing comprehensive ranges for double-width and
zero-width Unicode code points.

Also implements ucs_is_zero_width() to query the new zero-width table.
Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/ucs.c       | 44 +++++++++++++++++++++-----------------
 include/linux/consolemap.h |  6 +-----
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index dc4a6e7945..5f9f25bd20 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -8,22 +8,12 @@
 #include
 #include
 
-/* ucs_is_double_width() is based on the wcwidth() implementation by
- * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
- * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- */
-
 struct ucs_interval {
 	u32 first;
 	u32 last;
 };
 
-static const struct ucs_interval ucs_double_width_ranges[] = {
-	{ 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E },
-	{ 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF },
-	{ 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 },
-	{ 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD }
-};
+#include "ucs_width_table.h"
 
 static int interval_cmp(const void *key, const void *element)
 {
@@ -37,6 +27,27 @@ static int interval_cmp(const void *key, const void *element)
 	return 0;
 }
 
+static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size)
+{
+	if (!in_range(cp, ranges[0].first, ranges[size - 1].last))
+		return false;
+
+	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
+				interval_cmp) != NULL;
+}
+
+/**
+ * ucs_is_zero_width() - Determine if a Unicode code point is zero-width.
+ * @cp: Unicode code point (UCS-4)
+ *
+ * Return: true if the character is zero-width, false otherwise
+ */
+bool ucs_is_zero_width(u32 cp)
+{
+	return cp_in_range(cp, ucs_zero_width_ranges,
+			   ARRAY_SIZE(ucs_zero_width_ranges));
+}
+
 /**
  * ucs_is_double_width() - Determine if a Unicode code point is double-width.
  * @cp: Unicode code point (UCS-4)
@@ -45,13 +56,6 @@ static int interval_cmp(const void *key, const void *element)
  */
 bool ucs_is_double_width(u32 cp)
 {
-	size_t size = ARRAY_SIZE(ucs_double_width_ranges);
-
-	if (!in_range(cp, ucs_double_width_ranges[0].first,
-			  ucs_double_width_ranges[size - 1].last))
-		return false;
-
-	return __inline_bsearch(&cp, ucs_double_width_ranges, size,
-				sizeof(*ucs_double_width_ranges),
-				interval_cmp) != NULL;
+	return cp_in_range(cp, ucs_double_width_ranges,
+			   ARRAY_SIZE(ucs_double_width_ranges));
 }
diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h
index 7d778752dc..b3a9118666 100644
--- a/include/linux/consolemap.h
+++ b/include/linux/consolemap.h
@@ -29,11 +29,7 @@ u32 conv_8bit_to_uni(unsigned char c);
 int conv_uni_to_8bit(u32 uni);
 void console_map_init(void);
 bool ucs_is_double_width(uint32_t cp);
-static inline bool ucs_is_zero_width(uint32_t cp)
-{
-	/* coming soon */
-	return false;
-}
+bool ucs_is_zero_width(uint32_t cp);
 #else
 static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph,
 				    bool use_unicode)

From patchwork Thu Apr 17 18:45:09 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882554
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 07/14] vt: introduce gen_ucs_recompose_table.py to create ucs_recompose_table.h
Date: Thu, 17 Apr 2025 14:45:09 -0400
Message-ID: <20250417184849.475581-8-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

The generated table maps base character + combining mark pairs to their
precomposed equivalents using Python's unicodedata module.

The default script behavior is to create a table with most commonly used
Latin, Greek, and Cyrillic recomposition pairs only. It is much smaller
than the table with all possible recomposition pairs (71 entries vs 1000
entries). But if one needs/wants the full table then simply running the
script with the --full argument will generate it.
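
As a quick illustration of what one table entry means, the snippet below
shows one way to derive a (base, mark, composed) triple with unicodedata.
It is a sketch under assumptions, not the script's actual code: the
helper name recompose() is hypothetical, and it relies on NFC
normalization to find the precomposed form.

```python
import unicodedata


def recompose(base, mark):
    """Return the precomposed code point for base + mark, or None.

    Hypothetical helper: NFC normalization collapses the two-character
    sequence into a single character when a canonical composition exists.
    """
    composed = unicodedata.normalize("NFC", chr(base) + chr(mark))
    return ord(composed) if len(composed) == 1 else None


# A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE
print(hex(recompose(0x0041, 0x0300)))  # prints 0xc0
```

Pairs with no precomposed form, such as q + COMBINING ACUTE ACCENT, come
back as None and would simply be absent from the table.
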
Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/gen_ucs_recompose_table.py | 255 ++++++++++++++++++++++
 1 file changed, 255 insertions(+)
 create mode 100755 drivers/tty/vt/gen_ucs_recompose_table.py

diff --git a/drivers/tty/vt/gen_ucs_recompose_table.py b/drivers/tty/vt/gen_ucs_recompose_table.py
new file mode 100755
index 0000000000..d30f8f5242
--- /dev/null
+++ b/drivers/tty/vt/gen_ucs_recompose_table.py
@@ -0,0 +1,255 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Leverage Python's unicodedata module to generate ucs_recompose_table.h
+#
+# The generated table maps base character + combining mark pairs to their
+# precomposed equivalents.
+#
+# Usage:
+#   python3 gen_ucs_recompose_table.py         # Generate with common recomposition pairs
+#   python3 gen_ucs_recompose_table.py --full  # Generate with all recomposition pairs
+
+import unicodedata
+import sys
+import argparse
+import textwrap
+
+# This script's file name
+from pathlib import Path
+this_file = Path(__file__).name
+
+# Output file name
+out_file = "ucs_recompose_table.h"
+
+common_recompose_description = "most commonly used Latin, Greek, and Cyrillic recomposition pairs only"
+COMMON_RECOMPOSITION_PAIRS = [
+    # Latin letters with accents - uppercase
+    (0x0041, 0x0300, 0x00C0),  # A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE
+    (0x0041, 0x0301, 0x00C1),  # A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE
+    (0x0041, 0x0302, 0x00C2),  # A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+    (0x0041, 0x0303, 0x00C3),  # A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE
+    (0x0041, 0x0308, 0x00C4),  # A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS
+    (0x0041, 0x030A, 0x00C5),  # A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE
+    (0x0043, 0x0327, 0x00C7),  # C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA
+    (0x0045, 0x0300, 0x00C8),  # E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE
+    (0x0045, 0x0301, 0x00C9),  # E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE
+    (0x0045, 0x0302, 0x00CA),  # E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+    (0x0045, 0x0308, 0x00CB),  # E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS
+    (0x0049, 0x0300, 0x00CC),  # I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE
+    (0x0049, 0x0301, 0x00CD),  # I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE
+    (0x0049, 0x0302, 0x00CE),  # I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+    (0x0049, 0x0308, 0x00CF),  # I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS
+    (0x004E, 0x0303, 0x00D1),  # N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE
+    (0x004F, 0x0300, 0x00D2),  # O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE
+    (0x004F, 0x0301, 0x00D3),  # O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE
+    (0x004F, 0x0302, 0x00D4),  # O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+    (0x004F, 0x0303, 0x00D5),  # O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE
+    (0x004F, 0x0308, 0x00D6),  # O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS
+    (0x0055, 0x0300, 0x00D9),  # U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE
+    (0x0055, 0x0301, 0x00DA),  # U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE
+    (0x0055, 0x0302, 0x00DB),  # U + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+    (0x0055, 0x0308, 0x00DC),  # U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS
+    (0x0059, 0x0301, 0x00DD),  # Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE
+
+    # Latin letters with accents - lowercase
+    (0x0061, 0x0300, 0x00E0),  # a + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE
+    (0x0061, 0x0301, 0x00E1),  # a + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE
+    (0x0061, 0x0302, 0x00E2),  # a + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX
+    (0x0061, 0x0303, 0x00E3),  # a + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE
+    (0x0061, 0x0308, 0x00E4),  # a + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS
+    (0x0061, 0x030A, 0x00E5),  # a + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE
+    (0x0063, 0x0327, 0x00E7),  # c + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA
+    (0x0065, 0x0300, 0x00E8),  # e + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE
+    (0x0065, 0x0301, 0x00E9),  # e + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE
+    (0x0065, 0x0302, 0x00EA),  # e + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX
+    (0x0065, 0x0308, 0x00EB),  # e + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS
+    (0x0069, 0x0300, 0x00EC),  # i + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE
+    (0x0069, 0x0301, 0x00ED),  # i + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE
+    (0x0069, 0x0302, 0x00EE),  # i + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX
+    (0x0069, 0x0308, 0x00EF),  # i + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS
+    (0x006E, 0x0303, 0x00F1),  # n + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE
+    (0x006F, 0x0300, 0x00F2),  # o + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE
+    (0x006F, 0x0301, 0x00F3),  # o + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE
+    (0x006F, 0x0302, 0x00F4),  # o + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX
+    (0x006F, 0x0303, 0x00F5),  # o + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE
+    (0x006F, 0x0308, 0x00F6),  # o + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS
+    (0x0075, 0x0300, 0x00F9),  # u + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE
+    (0x0075, 0x0301, 0x00FA),  # u + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE
+    (0x0075, 0x0302, 0x00FB),  # u + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX
+
(0x0075, 0x0308, 0x00FC), # u + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS + (0x0079, 0x0301, 0x00FD), # y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE + (0x0079, 0x0308, 0x00FF), # y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS + + # Common Greek characters + (0x0391, 0x0301, 0x0386), # Α + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS + (0x0395, 0x0301, 0x0388), # Ε + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS + (0x0397, 0x0301, 0x0389), # Η + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS + (0x0399, 0x0301, 0x038A), # Ι + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS + (0x039F, 0x0301, 0x038C), # Ο + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS + (0x03A5, 0x0301, 0x038E), # Υ + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS + (0x03A9, 0x0301, 0x038F), # Ω + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS + (0x03B1, 0x0301, 0x03AC), # α + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS + (0x03B5, 0x0301, 0x03AD), # ε + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS + (0x03B7, 0x0301, 0x03AE), # η + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS + (0x03B9, 0x0301, 0x03AF), # ι + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS + (0x03BF, 0x0301, 0x03CC), # ο + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS + (0x03C5, 0x0301, 0x03CD), # υ + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS + (0x03C9, 0x0301, 0x03CE), # ω + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS + + # Common Cyrillic characters + (0x0418, 0x0306, 0x0419), # И + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I + (0x0438, 0x0306, 0x0439), # и + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I + (0x0423, 0x0306, 0x040E), # У + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U + (0x0443, 0x0306, 0x045E), # у + COMBINING BREVE = CYRILLIC 
SMALL LETTER SHORT U +] + +full_recompose_description = "all possible recomposition pairs from the Unicode BMP" +def collect_all_recomposition_pairs(): + """Collect all possible recomposition pairs from the Unicode data.""" + # Map to store recomposition pairs: (base, combining) -> recomposed + recompose_map = {} + + # Process all assigned Unicode code points in BMP (Basic Multilingual Plane) + # We limit to BMP (0x0000-0xFFFF) to keep our table smaller with uint16_t + for cp in range(0, 0x10000): + try: + char = chr(cp) + + # Skip unassigned or control characters + if not unicodedata.name(char, ''): + continue + + # Find decomposition + decomp = unicodedata.decomposition(char) + if not decomp or '<' in decomp: # Skip compatibility decompositions + continue + + # Parse the decomposition + parts = decomp.split() + if len(parts) == 2: # Simple base + combining mark + base = int(parts[0], 16) + combining = int(parts[1], 16) + + # Only store if both are in BMP + if base < 0x10000 and combining < 0x10000: + recompose_map[(base, combining)] = cp + + except (ValueError, TypeError): + continue + + # Convert to a list of tuples and sort for binary search + recompose_list = [(base, combining, recomposed) + for (base, combining), recomposed in recompose_map.items()] + recompose_list.sort() + + return recompose_list + +def validate_common_pairs(full_list): + """Validate that all common pairs are in the full list. + + Raises: + ValueError: If any common pair is missing or has a different recomposition + value than what's in the full table. 
+ """ + full_pairs = {(base, combining): recomposed for base, combining, recomposed in full_list} + for base, combining, recomposed in COMMON_RECOMPOSITION_PAIRS: + full_recomposed = full_pairs.get((base, combining)) + if full_recomposed is None: + error_msg = f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) not found in full data" + print(error_msg) + raise ValueError(error_msg) + elif full_recomposed != recomposed: + error_msg = (f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) has different recomposition: " + f"0x{recomposed:04X} vs 0x{full_recomposed:04X}") + print(error_msg) + raise ValueError(error_msg) + +def generate_recomposition_table(use_full_list=False): + """Generate the recomposition C table.""" + + # Collect all recomposition pairs for validation + full_recompose_list = collect_all_recomposition_pairs() + + # Decide which list to use + if use_full_list: + print("Using full recomposition list...") + recompose_list = full_recompose_list + table_description = full_recompose_description + alt_list = COMMON_RECOMPOSITION_PAIRS + alt_description = common_recompose_description + else: + print("Using common recomposition list...") + # Validate that all common pairs are in the full list + validate_common_pairs(full_recompose_list) + recompose_list = sorted(COMMON_RECOMPOSITION_PAIRS) + table_description = common_recompose_description + alt_list = full_recompose_list + alt_description = full_recompose_description + generation_mode = " --full" if use_full_list else "" + alternative_mode = " --full" if not use_full_list else "" + table_description_detail = f"{table_description} ({len(recompose_list)} entries)" + alt_description_detail = f"{alt_description} ({len(alt_list)} entries)" + + # Calculate min/max values for boundary checks + min_base = min(base for base, _, _ in recompose_list) + max_base = max(base for base, _, _ in recompose_list) + min_combining = min(combining for _, combining, _ in recompose_list) + max_combining = max(combining for _, 
combining, _ in recompose_list) + + # Generate implementation file + with open(out_file, 'w') as f: + f.write(f"""\ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * {out_file} - Unicode character recomposition + * + * Auto-generated by {this_file}{generation_mode} + * + * Unicode Version: {unicodedata.unidata_version} + * +{textwrap.fill( + f"This file contains a table with {table_description_detail}. " + + f"To generate a table with {alt_description_detail} instead, run:", + width=75, initial_indent=" * ", subsequent_indent=" * ")} + * + * python3 {this_file}{alternative_mode} + */ + +/* + * Table of {table_description} + * Sorted by base character and then combining mark for binary search + */ +static const struct ucs_recomposition ucs_recomposition_table[] = {{ +""") + + for base, combining, recomposed in recompose_list: + try: + base_name = unicodedata.name(chr(base)) + combining_name = unicodedata.name(chr(combining)) + recomposed_name = unicodedata.name(chr(recomposed)) + comment = f"/* {base_name} + {combining_name} = {recomposed_name} */" + except ValueError: + comment = f"/* U+{base:04X} + U+{combining:04X} = U+{recomposed:04X} */" + f.write(f"\t{{ 0x{base:04X}, 0x{combining:04X}, 0x{recomposed:04X} }}, {comment}\n") + + f.write(f"""\ +}}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define UCS_RECOMPOSE_MIN_BASE 0x{min_base:04X} +#define UCS_RECOMPOSE_MAX_BASE 0x{max_base:04X} +#define UCS_RECOMPOSE_MIN_MARK 0x{min_combining:04X} +#define UCS_RECOMPOSE_MAX_MARK 0x{max_combining:04X} +""") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Generate Unicode recomposition table") + parser.add_argument("--full", action="store_true", + help="Generate a full recomposition table (default: common pairs only)") + args = parser.parse_args() + + generate_recomposition_table(use_full_list=args.full) From patchwork Thu Apr 17 18:45:10 2025 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882209 Received: from fout-b5-smtp.messagingengine.com (fout-b5-smtp.messagingengine.com [202.12.124.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 147E219ABC6; Thu, 17 Apr 2025 18:49:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.148 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915752; cv=none; b=tU0gNGytQ9dWFJVu6e+zuQVzvfa9Fc6RTDr40IeDyGvRTQN6tNi0BE+Z+ofsCOfuddhh17quzPZwac9kXh4jgoWuGmU1JQTqHgl/LS9Clv6jOuw+j7AAZySVypvVC+5v2X3F9pVw7g2vP5pf/4Y2DU4oJ7gXqlPQvaoMTgfkZCE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915752; c=relaxed/simple; bh=rUStBhaJZo62KGaIdMZrh1mrFObHiOrl6QxR9GZBl5M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iSRdsqp+J9KOVUX2l86Nwi7oUHmUsE0Ab3HZiTrmRxUarl9ldFaP/kznzzGPEXscZPs8SB3TVwh4NEGQmgqxnETNITANL+dk2Jtwt62SugmdCPp8zdjiOg3YZ1lQ8b6T0mIKeikP0HPoUu7gGxS/YPV9plwOoplW2QeOOA3oNFI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=iTohIEQv; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=PphzqQZH; arc=none smtp.client-ip=202.12.124.148 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="iTohIEQv"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com 
header.b="PphzqQZH" Received: from phl-compute-04.internal (phl-compute-04.phl.internal [10.202.2.44]) by mailfout.stl.internal (Postfix) with ESMTP id 1F97211400DC; Thu, 17 Apr 2025 14:49:09 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-04.internal (MEProxy); Thu, 17 Apr 2025 14:49:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744915748; x= 1745002148; bh=F0YeLrJtygHFU6nH568VzC5840TAdlTewergicJA8qM=; b=i TohIEQvVhzUyG3Ej5JPnYx5kGskfIdWWQXaWkNKrRJLjRLNx/TjA1rUL7pcLFurT CxgtJUvYKJ6jpSSrnal+1GG+kdhh/nMqZphMqIqhSIPOQ9aXp98707LwUHojrCeI C6thXStDgsIm8sEdiChUI8/Grzfy9UGQtbrx/7nukNuMR2po/VlcpAYmlT8aXrn0 eiynOVhWb8fm12hzbFDjRkZhhkRi7tvWn4ni11ep21wBazxfvMZfOBLU2SEsNKOp kDYqEqjod4C0rLuAgTiVGF59QOfq7rTjVWBpVIENRqH3Ksp4VylnfMHbKlZf5S/o y5OL7ABugZDfarb61bpmg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744915748; x=1745002148; bh=F 0YeLrJtygHFU6nH568VzC5840TAdlTewergicJA8qM=; b=PphzqQZH0htYaa7GA a7xRdjGIGQEJ6LqnlfbpLw0hZHdUGkphtzCMWPl98MVl67QieiGoqjzgyisCvwLQ ESbtYkOEVProDToSd1Nd8YFV4Ob8rzSuYiAKRIVo43PgU5wtPTSd/1ukHwn9w3uE f2ar5HfmMhkVv/NopFzKIVg+zAM7JItgkejQinMiD8I+jS3OKW66rjQ6/T47TPP6 2s1KApXTb7DPWQ1A5HkCD3elI5Os3vfYx/yR3OKyaDN5PYo/9Ji7R1x4gGFazHic YSOTYdMx3YrIgmrG+jh9EiBc3ziBzDKI4e2gClSa/uxMukhBE9t3zBoogKahKB+r c6nyQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvfedttddvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih 
gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 17 Apr 2025 14:49:08 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id BE51611443D3; Thu, 17 Apr 2025 14:49:07 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 08/14] vt: create ucs_recompose_table.h with gen_ucs_recompose_table.py Date: Thu, 17 Apr 2025 14:45:10 -0400 Message-ID: <20250417184849.475581-9-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net> References: <20250417184849.475581-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Table of base character + combining mark pairs with their precomposed equivalents. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. 
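As a sanity check that is not part of the patch, table entries can be compared against NFC normalization from Python's unicodedata module. The sketch below uses three entries copied from the table; the helper name nfc_agrees is hypothetical.

```python
import unicodedata

# Three (base, mark, recomposed) entries copied from the generated table.
SAMPLE = [
    (0x0041, 0x0300, 0x00C0),  # A + grave
    (0x0391, 0x0301, 0x0386),  # Greek Alpha + acute (tonos)
    (0x0418, 0x0306, 0x0419),  # Cyrillic I + breve
]


def nfc_agrees(base, mark, recomposed):
    """True if NFC-normalizing base + mark yields exactly the recomposed
    code point. Hypothetical helper for illustration."""
    composed = unicodedata.normalize("NFC", chr(base) + chr(mark))
    return composed == chr(recomposed)


if __name__ == "__main__":
    for entry in SAMPLE:
        print([f"0x{v:04X}" for v in entry], nfc_agrees(*entry))
```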
Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby --- drivers/tty/vt/ucs_recompose_table.h | 102 +++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 drivers/tty/vt/ucs_recompose_table.h diff --git a/drivers/tty/vt/ucs_recompose_table.h b/drivers/tty/vt/ucs_recompose_table.h new file mode 100644 index 0000000000..bd91edde5d --- /dev/null +++ b/drivers/tty/vt/ucs_recompose_table.h @@ -0,0 +1,102 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * ucs_recompose_table.h - Unicode character recomposition + * + * Auto-generated by gen_ucs_recompose_table.py + * + * Unicode Version: 16.0.0 + * + * This file contains a table with most commonly used Latin, Greek, and + * Cyrillic recomposition pairs only (71 entries). To generate a table with + * all possible recomposition pairs from the Unicode BMP (1000 entries) + * instead, run: + * + * python gen_ucs_recompose_table.py --full + */ + +/* + * Table of most commonly used Latin, Greek, and Cyrillic recomposition pairs only + * Sorted by base character and then combining mark for binary search + */ +static const struct ucs_recomposition ucs_recomposition_table[] = { + { 0x0041, 0x0300, 0x00C0 }, /* LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE */ + { 0x0041, 0x0301, 0x00C1 }, /* LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE */ + { 0x0041, 0x0302, 0x00C2 }, /* LATIN CAPITAL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX */ + { 0x0041, 0x0303, 0x00C3 }, /* LATIN CAPITAL LETTER A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE */ + { 0x0041, 0x0308, 0x00C4 }, /* LATIN CAPITAL LETTER A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS */ + { 0x0041, 0x030A, 0x00C5 }, /* LATIN CAPITAL LETTER A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE */ + { 0x0043, 0x0327, 0x00C7 }, /* LATIN CAPITAL LETTER C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA */ 
+ { 0x0045, 0x0300, 0x00C8 }, /* LATIN CAPITAL LETTER E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE */ + { 0x0045, 0x0301, 0x00C9 }, /* LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE */ + { 0x0045, 0x0302, 0x00CA }, /* LATIN CAPITAL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX */ + { 0x0045, 0x0308, 0x00CB }, /* LATIN CAPITAL LETTER E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS */ + { 0x0049, 0x0300, 0x00CC }, /* LATIN CAPITAL LETTER I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE */ + { 0x0049, 0x0301, 0x00CD }, /* LATIN CAPITAL LETTER I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE */ + { 0x0049, 0x0302, 0x00CE }, /* LATIN CAPITAL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX */ + { 0x0049, 0x0308, 0x00CF }, /* LATIN CAPITAL LETTER I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS */ + { 0x004E, 0x0303, 0x00D1 }, /* LATIN CAPITAL LETTER N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE */ + { 0x004F, 0x0300, 0x00D2 }, /* LATIN CAPITAL LETTER O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE */ + { 0x004F, 0x0301, 0x00D3 }, /* LATIN CAPITAL LETTER O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE */ + { 0x004F, 0x0302, 0x00D4 }, /* LATIN CAPITAL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX */ + { 0x004F, 0x0303, 0x00D5 }, /* LATIN CAPITAL LETTER O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE */ + { 0x004F, 0x0308, 0x00D6 }, /* LATIN CAPITAL LETTER O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS */ + { 0x0055, 0x0300, 0x00D9 }, /* LATIN CAPITAL LETTER U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE */ + { 0x0055, 0x0301, 0x00DA }, /* LATIN CAPITAL LETTER U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE */ + { 0x0055, 0x0302, 0x00DB }, /* LATIN CAPITAL LETTER U + 
COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX */ + { 0x0055, 0x0308, 0x00DC }, /* LATIN CAPITAL LETTER U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS */ + { 0x0059, 0x0301, 0x00DD }, /* LATIN CAPITAL LETTER Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE */ + { 0x0061, 0x0300, 0x00E0 }, /* LATIN SMALL LETTER A + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE */ + { 0x0061, 0x0301, 0x00E1 }, /* LATIN SMALL LETTER A + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE */ + { 0x0061, 0x0302, 0x00E2 }, /* LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX */ + { 0x0061, 0x0303, 0x00E3 }, /* LATIN SMALL LETTER A + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE */ + { 0x0061, 0x0308, 0x00E4 }, /* LATIN SMALL LETTER A + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS */ + { 0x0061, 0x030A, 0x00E5 }, /* LATIN SMALL LETTER A + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE */ + { 0x0063, 0x0327, 0x00E7 }, /* LATIN SMALL LETTER C + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA */ + { 0x0065, 0x0300, 0x00E8 }, /* LATIN SMALL LETTER E + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE */ + { 0x0065, 0x0301, 0x00E9 }, /* LATIN SMALL LETTER E + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE */ + { 0x0065, 0x0302, 0x00EA }, /* LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX */ + { 0x0065, 0x0308, 0x00EB }, /* LATIN SMALL LETTER E + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS */ + { 0x0069, 0x0300, 0x00EC }, /* LATIN SMALL LETTER I + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE */ + { 0x0069, 0x0301, 0x00ED }, /* LATIN SMALL LETTER I + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE */ + { 0x0069, 0x0302, 0x00EE }, /* LATIN SMALL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX */ + { 0x0069, 0x0308, 0x00EF }, /* 
LATIN SMALL LETTER I + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS */ + { 0x006E, 0x0303, 0x00F1 }, /* LATIN SMALL LETTER N + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE */ + { 0x006F, 0x0300, 0x00F2 }, /* LATIN SMALL LETTER O + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE */ + { 0x006F, 0x0301, 0x00F3 }, /* LATIN SMALL LETTER O + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE */ + { 0x006F, 0x0302, 0x00F4 }, /* LATIN SMALL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX */ + { 0x006F, 0x0303, 0x00F5 }, /* LATIN SMALL LETTER O + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE */ + { 0x006F, 0x0308, 0x00F6 }, /* LATIN SMALL LETTER O + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS */ + { 0x0075, 0x0300, 0x00F9 }, /* LATIN SMALL LETTER U + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE */ + { 0x0075, 0x0301, 0x00FA }, /* LATIN SMALL LETTER U + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE */ + { 0x0075, 0x0302, 0x00FB }, /* LATIN SMALL LETTER U + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX */ + { 0x0075, 0x0308, 0x00FC }, /* LATIN SMALL LETTER U + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS */ + { 0x0079, 0x0301, 0x00FD }, /* LATIN SMALL LETTER Y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE */ + { 0x0079, 0x0308, 0x00FF }, /* LATIN SMALL LETTER Y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS */ + { 0x0391, 0x0301, 0x0386 }, /* GREEK CAPITAL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS */ + { 0x0395, 0x0301, 0x0388 }, /* GREEK CAPITAL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS */ + { 0x0397, 0x0301, 0x0389 }, /* GREEK CAPITAL LETTER ETA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS */ + { 0x0399, 0x0301, 0x038A }, /* GREEK CAPITAL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS */ + { 
0x039F, 0x0301, 0x038C }, /* GREEK CAPITAL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS */ + { 0x03A5, 0x0301, 0x038E }, /* GREEK CAPITAL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS */ + { 0x03A9, 0x0301, 0x038F }, /* GREEK CAPITAL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS */ + { 0x03B1, 0x0301, 0x03AC }, /* GREEK SMALL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS */ + { 0x03B5, 0x0301, 0x03AD }, /* GREEK SMALL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS */ + { 0x03B7, 0x0301, 0x03AE }, /* GREEK SMALL LETTER ETA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS */ + { 0x03B9, 0x0301, 0x03AF }, /* GREEK SMALL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS */ + { 0x03BF, 0x0301, 0x03CC }, /* GREEK SMALL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS */ + { 0x03C5, 0x0301, 0x03CD }, /* GREEK SMALL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS */ + { 0x03C9, 0x0301, 0x03CE }, /* GREEK SMALL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS */ + { 0x0418, 0x0306, 0x0419 }, /* CYRILLIC CAPITAL LETTER I + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I */ + { 0x0423, 0x0306, 0x040E }, /* CYRILLIC CAPITAL LETTER U + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U */ + { 0x0438, 0x0306, 0x0439 }, /* CYRILLIC SMALL LETTER I + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I */ + { 0x0443, 0x0306, 0x045E }, /* CYRILLIC SMALL LETTER U + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT U */ +}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define UCS_RECOMPOSE_MIN_BASE 0x0041 +#define UCS_RECOMPOSE_MAX_BASE 0x0443 +#define UCS_RECOMPOSE_MIN_MARK 0x0300 +#define UCS_RECOMPOSE_MAX_MARK 0x0327 From 
patchwork Thu Apr 17 18:45:11 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 882208 Received: from fout-b5-smtp.messagingengine.com (fout-b5-smtp.messagingengine.com [202.12.124.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1460825525D; Thu, 17 Apr 2025 18:49:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.148 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915752; cv=none; b=EX9LYDXgOLRvRkKQr1JIcE2ZUx0aWlfxcC95uvdRAsLTchJqC4FhIzUZvM92EWJn7r3XMHR3gSx9+x+5FVIwt8ojNIy8sGWnlL8SOy4iYtrlo0wXt/zA1WHyVy4zIYMPbyCs8xXwdarhcio/iKlE7YqSfvlPPDVl2f9GmzJD2O4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744915752; c=relaxed/simple; bh=92EDpQGpYpkVOEkCQSKmP2HkQnCuwdJCwXt7xWBGs40=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zwm/e87tgdxpL/zOTbtCFR9wInRPcR7l19KwQkWenHpdC05JY1C5ifdvebmyKI442DCky7kbHQoMtjl9xbV5w38+X2Dx/tHp7tPPj2WM7jNUswBfRqhTutlkyV1vEVu8KafVTOav7aHFCFcetDxUhj8YEAtfx/biRarTiWcX2e0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=okEOOUTk; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=A0Jgtvs5; arc=none smtp.client-ip=202.12.124.148 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="okEOOUTk"; dkim=pass (2048-bit key) 
header.d=messagingengine.com header.i=@messagingengine.com header.b="A0Jgtvs5" Received: from phl-compute-03.internal (phl-compute-03.phl.internal [10.202.2.43]) by mailfout.stl.internal (Postfix) with ESMTP id 1CB3011400DB; Thu, 17 Apr 2025 14:49:09 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-03.internal (MEProxy); Thu, 17 Apr 2025 14:49:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744915748; x= 1745002148; bh=S34coVXmtNFMq7RzmeuThti+HGlleP/xQrm4SlS8LX0=; b=o kEOOUTklZPGeYPsa8gACdwb+LlB919tXuhRFlTM+w7M/xrWuS02+AcvkfPciZJfC Um4W17LstJIKTeglW61wC9XOaJBNEMBDR5f/qRIHEHgMzudZ8XUs3ziOkYHVuouX +g/dW4G7D/k7Cn7a4iqvv+JS0llwKpNyFtD9CGYyDZFidEgfEmHeGeksV7pZcwjK WmnyBUDowHKzB+OY3qkNOkfS0XprlbO+6/ZaB1cOJNEulaTZF31VaQyD2/bck7Cr cWg44I2sJKQ//U5fQJuBJOXiNV8jzqsoRRrP1nVnzHEDsKwSqUv0/SVpnQlk/rjM QguZabCfk5662ALVGMTIg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744915748; x=1745002148; bh=S 34coVXmtNFMq7RzmeuThti+HGlleP/xQrm4SlS8LX0=; b=A0Jgtvs53g9CVnt3P bVxg2YoSFgTP1T2dzCTM30hFgTcaMmy4tB6gpSHvWx+XcNUarDQhj50ow52rISBu Pwkfhl39fpuzlGzskK7kWWellSXSOAges0JGyixx63mj1wiEcPJSt3Ni5gGxTD7s VCvtHJ7km9Gs/jqNbqLOvHdectyqDkFMXKIfr4woc/McZPlCQBLLS02PVlFbh/Dd YXaR3QkxFy/ExBmAw40dBG6KpA4gEBTfB8ruRYEe4FCcVB20XPgufv4NrxIdchPB 4ozQ0TJmgE5c36oCmNTIUv+jlQZJLYkZitU+3ZiMzjeb32GhcQBrm7wNEqUFb/Lc zlCgg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvfedttddvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv 
pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeehpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdho rhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrd horhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 17 Apr 2025 14:49:08 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id DB41211443D4; Thu, 17 Apr 2025 14:49:07 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 09/14] vt: support Unicode recomposition Date: Thu, 17 Apr 2025 14:45:11 -0400 Message-ID: <20250417184849.475581-10-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net> References: <20250417184849.475581-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Try replacing any decomposed Unicode sequence with the corresponding recomposed code point. Code point to glyph correspondence works best after recomposition, and this applies mostly to single-width code points, so we can't preserve them in their decomposed form anyway.
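For readers who want to experiment outside the kernel, the lookup can be modeled roughly as below. This is only a sketch, not the kernel implementation: the three-entry TABLE is a hypothetical sample of the real table, the bounds are recomputed from that sample, and the quick rejection is written as plain inclusive comparisons, which is an assumption made for the illustration.

```python
from bisect import bisect_left

# Sample (base, mark, recomposed) entries, sorted by (base, mark) as the
# binary search requires. The real table has more entries.
TABLE = [
    (0x0041, 0x0300, 0x00C0),
    (0x0065, 0x0301, 0x00E9),
    (0x0438, 0x0306, 0x0439),
]

# Bounds for quick rejection, derived from the sample above.
MIN_BASE = min(base for base, _, _ in TABLE)
MAX_BASE = max(base for base, _, _ in TABLE)
MIN_MARK = min(mark for _, mark, _ in TABLE)
MAX_MARK = max(mark for _, mark, _ in TABLE)


def recompose(base, mark):
    """Return the recomposed code point, or 0 if the pair is not in TABLE."""
    # Cheap range check before searching.
    if not (MIN_BASE <= base <= MAX_BASE and MIN_MARK <= mark <= MAX_MARK):
        return 0
    # (base, mark) sorts just before any (base, mark, recomposed) entry,
    # so bisect_left lands on the matching entry when one exists.
    i = bisect_left(TABLE, (base, mark))
    if i < len(TABLE) and TABLE[i][:2] == (base, mark):
        return TABLE[i][2]
    return 0


if __name__ == "__main__":
    print(hex(recompose(0x0065, 0x0301)))  # e + acute
    print(recompose(0x0041, 0x0301))       # in range, but not in the sample
```

Returning 0 for "no recomposition" matches the convention documented for ucs_recompose() in this patch.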
Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/ucs.c       | 62 ++++++++++++++++++++++++++++++++++++++
 drivers/tty/vt/vt.c        | 14 +++++++--
 include/linux/consolemap.h |  6 ++++
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index 5f9f25bd20..bf25d63cea 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -59,3 +59,65 @@ bool ucs_is_double_width(u32 cp)
 	return cp_in_range(cp, ucs_double_width_ranges,
 			   ARRAY_SIZE(ucs_double_width_ranges));
 }
+
+/*
+ * Structure for base with combining mark pairs and resulting recompositions.
+ * Using u16 to save space since all values are within BMP range.
+ */
+struct ucs_recomposition {
+	u16 base;	/* base character */
+	u16 mark;	/* combining mark */
+	u16 recomposed;	/* corresponding recomposed character */
+};
+
+#include "ucs_recompose_table.h"
+
+struct compare_key {
+	u16 base;
+	u16 mark;
+};
+
+static int recomposition_cmp(const void *key, const void *element)
+{
+	const struct compare_key *search_key = key;
+	const struct ucs_recomposition *entry = element;
+
+	/* Compare base character first */
+	if (search_key->base < entry->base)
+		return -1;
+	if (search_key->base > entry->base)
+		return 1;
+
+	/* Base characters match, now compare combining character */
+	if (search_key->mark < entry->mark)
+		return -1;
+	if (search_key->mark > entry->mark)
+		return 1;
+
+	/* Both match */
+	return 0;
+}
+
+/**
+ * ucs_recompose() - Attempt to recompose two Unicode characters into a single character.
+ * @base: Base Unicode code point (UCS-4)
+ * @mark: Combining mark Unicode code point (UCS-4)
+ *
+ * Return: Recomposed Unicode code point, or 0 if no recomposition is possible
+ */
+u32 ucs_recompose(u32 base, u32 mark)
+{
+	/* Check if characters are within the range of our table */
+	if (!in_range(base, UCS_RECOMPOSE_MIN_BASE, UCS_RECOMPOSE_MAX_BASE) ||
+	    !in_range(mark, UCS_RECOMPOSE_MIN_MARK, UCS_RECOMPOSE_MAX_MARK))
+		return 0;
+
+	struct compare_key key = { base, mark };
+	struct ucs_recomposition *result =
+		__inline_bsearch(&key, ucs_recomposition_table,
+				 ARRAY_SIZE(ucs_recomposition_table),
+				 sizeof(*ucs_recomposition_table),
+				 recomposition_cmp);
+
+	return result ? result->recomposed : 0;
+}
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index a989feffad..76554c2040 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2925,9 +2925,9 @@ static void vc_con_rewind(struct vc_data *vc)
 
 #define UCS_VS16	0xfe0f	/* Variation Selector 16 */
 
-static int vc_process_ucs(struct vc_data *vc, int c, int *tc)
+static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
 {
-	u32 prev_c, curr_c = c;
+	u32 prev_c, curr_c = *c;
 
 	if (ucs_is_double_width(curr_c))
 		return 2;
@@ -2964,6 +2964,14 @@ static int vc_process_ucs(struct vc_data *vc, int c, int *tc)
 		return 1;
 	}
 
+	/* try recomposition */
+	prev_c = ucs_recompose(prev_c, curr_c);
+	if (prev_c != 0) {
+		vc_con_rewind(vc);
+		*tc = *c = prev_c;
+		return 1;
+	}
+
 	/* Otherwise zero-width code points are ignored. */
 	return 0;
 }
@@ -2978,7 +2986,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 	bool inverse = false;
 
 	if (vc->vc_utf && !vc->vc_disp_ctrl) {
-		width = vc_process_ucs(vc, c, &tc);
+		width = vc_process_ucs(vc, &c, &tc);
 		if (!width)
 			goto out;
 	}
diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h
index b3a9118666..8167494229 100644
--- a/include/linux/consolemap.h
+++ b/include/linux/consolemap.h
@@ -30,6 +30,7 @@ int conv_uni_to_8bit(u32 uni);
 void console_map_init(void);
 bool ucs_is_double_width(uint32_t cp);
 bool ucs_is_zero_width(uint32_t cp);
+u32 ucs_recompose(u32 base, u32 mark);
 #else
 static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph,
 				    bool use_unicode)
@@ -69,6 +70,11 @@ static inline bool ucs_is_zero_width(uint32_t cp)
 {
 	return false;
 }
+
+static inline u32 ucs_recompose(u32 base, u32 mark)
+{
+	return 0;
+}
 #endif /* CONFIG_CONSOLE_TRANSLATIONS */
 
 #endif /* __LINUX_CONSOLEMAP_H__ */

From patchwork Thu Apr 17 18:45:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882553
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 10/14] vt: pad double-width code points with a zero-width space
Date: Thu, 17 Apr 2025 14:45:12 -0400
Message-ID: <20250417184849.475581-11-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

In the Unicode screen buffer, we follow double-width code points with a
space to maintain proper column alignment. This, however, creates
semantic problems when e.g. using cut and paste.

Let's use a better code point for the column padding's purpose i.e. a
zero-width space rather than a full space. This way the combination
retains a width of 2.

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/vt.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 76554c2040..1bd1878094 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2923,6 +2923,7 @@ static void vc_con_rewind(struct vc_data *vc)
 	vc->vc_need_wrap = 0;
 }
 
+#define UCS_ZWS		0x200b	/* Zero Width Space */
 #define UCS_VS16	0xfe0f	/* Variation Selector 16 */
 
 static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
@@ -2941,8 +2942,8 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
 		/*
 		 * Let's merge this zero-width code point with the preceding
 		 * double-width code point by replacing the existing
-		 * whitespace padding. To do so we rewind one column and
-		 * pretend this has a width of 1.
+		 * zero-width space padding. To do so we rewind one column
+		 * and pretend this has a width of 1.
 		 * We give the legacy display the same initial space padding.
 		 */
 		vc_con_rewind(vc);
@@ -3065,7 +3066,11 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 		tc = conv_uni_to_pc(vc, ' ');
 		if (tc < 0)
 			tc = ' ';
-		next_c = ' ';
+		/*
+		 * Store a zero-width space in the Unicode screen given that
+		 * the previous code point is semantically double width.
+		 */
+		next_c = UCS_ZWS;
 	}
 
 out:

From patchwork Thu Apr 17 18:45:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882210
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 11/14] vt: remove zero-width-space handling from conv_uni_to_pc()
Date: Thu, 17 Apr 2025 14:45:13 -0400
Message-ID: <20250417184849.475581-12-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

This is now taken care of by ucs_is_zero_width().

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/consolemap.c | 2 --
 drivers/tty/vt/vt.c         | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/tty/vt/consolemap.c b/drivers/tty/vt/consolemap.c
index 82d70083fe..bb4bb272eb 100644
--- a/drivers/tty/vt/consolemap.c
+++ b/drivers/tty/vt/consolemap.c
@@ -870,8 +870,6 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs)
 		return -4;		/* Not found */
 	else if (ucs < 0x20)
 		return -1;		/* Not a printable character */
-	else if (ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f))
-		return -2;			/* Zero-width space */
 	/*
 	 * UNI_DIRECT_BASE indicates the start of the region in the User Zone
 	 * which always has a 1:1 mapping to the currently loaded font. The
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 1bd1878094..24c6cd2eed 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2995,7 +2995,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 	/* Now try to find out how to display it */
 	tc = conv_uni_to_pc(vc, tc);
 	if (tc & ~charmask) {
-		if (tc == -1 || tc == -2)
+		if (tc == -1)
 			return -1; /* nothing to display */
 
 		/* Glyph not found */

From patchwork Thu Apr 17 18:45:14 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882206
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 12/14] vt: update gen_ucs_width_table.py to make tables more space efficient
Date: Thu, 17 Apr 2025 14:45:14 -0400
Message-ID: <20250417184849.475581-13-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

Split table ranges into BMP (16-bit) and non-BMP (above 16-bit).
This reduces the corresponding text size by 20-25%.

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/gen_ucs_width_table.py | 55 ++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/vt/gen_ucs_width_table.py b/drivers/tty/vt/gen_ucs_width_table.py
index 00510444a7..059ed9a8ba 100755
--- a/drivers/tty/vt/gen_ucs_width_table.py
+++ b/drivers/tty/vt/gen_ucs_width_table.py
@@ -194,6 +194,27 @@ def write_tables(zero_width_ranges, double_width_ranges):
         double_width_ranges: List of (start, end) ranges for double-width characters
     """
 
+    # Function to split ranges into BMP (16-bit) and non-BMP (above 16-bit)
+    def split_ranges_by_size(ranges):
+        bmp_ranges = []
+        non_bmp_ranges = []
+
+        for start, end in ranges:
+            if end <= 0xFFFF:
+                bmp_ranges.append((start, end))
+            elif start > 0xFFFF:
+                non_bmp_ranges.append((start, end))
+            else:
+                # Split the range at 0xFFFF
+                bmp_ranges.append((start, 0xFFFF))
+                non_bmp_ranges.append((0x10000, end))
+
+        return bmp_ranges, non_bmp_ranges
+
+    # Split ranges into BMP and non-BMP
+    zero_width_bmp, zero_width_non_bmp = split_ranges_by_size(zero_width_ranges)
+    double_width_bmp, double_width_non_bmp = split_ranges_by_size(double_width_ranges)
+
     # Function to generate code point description comments
     def get_code_point_comment(start, end):
         try:
@@ -221,22 +242,44 @@ def write_tables(zero_width_ranges, double_width_ranges):
  * Unicode Version: {unicodedata.unidata_version}
  */
 
-/* Zero-width character ranges */
-static const struct ucs_interval ucs_zero_width_ranges[] = {{
+/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */
+static const struct ucs_interval16 ucs_zero_width_bmp_ranges[] = {{
+""")
+
+        for start, end in zero_width_bmp:
+            comment = get_code_point_comment(start, end)
+            f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n")
+
+        f.write("""\
+};
+
+/* Zero-width character ranges (non-BMP, U+10000 and above) */
+static const struct ucs_interval32 ucs_zero_width_non_bmp_ranges[] = {
 """)
 
-        for start, end in zero_width_ranges:
+        for start, end in zero_width_non_bmp:
             comment = get_code_point_comment(start, end)
             f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n")
 
         f.write("""\
 };
 
-/* Double-width character ranges */
-static const struct ucs_interval ucs_double_width_ranges[] = {
+/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */
+static const struct ucs_interval16 ucs_double_width_bmp_ranges[] = {
+""")
+
+        for start, end in double_width_bmp:
+            comment = get_code_point_comment(start, end)
+            f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n")
+
+        f.write("""\
+};
+
+/* Double-width character ranges (non-BMP, U+10000 and above) */
+static const struct ucs_interval32 ucs_double_width_non_bmp_ranges[] = {
 """)
 
-        for start, end in double_width_ranges:
+        for start, end in double_width_non_bmp:
             comment = get_code_point_comment(start, end)
             f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n")
 

From patchwork Thu Apr 17 18:45:15 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882552
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 13/14] vt: refresh ucs_width_table.h and adjust code in ucs.c accordingly
Date: Thu, 17 Apr 2025 14:45:15 -0400
Message-ID: <20250417184849.475581-14-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
X-Mailing-List: linux-serial@vger.kernel.org

From: Nicolas Pitre

Width tables are now split into BMP (16-bit) and non-BMP (above 16-bit).
This reduces the corresponding text size by 20-25%.

Note: scripts/checkpatch.pl complains about "... exceeds 100 columns".
Please ignore.

Signed-off-by: Nicolas Pitre
Reviewed-by: Jiri Slaby
---
 drivers/tty/vt/ucs.c             |  54 +++-
 drivers/tty/vt/ucs_width_table.h | 540 ++++++++++++++++---------------
 2 files changed, 319 insertions(+), 275 deletions(-)

diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index bf25d63cea..0b58cb7344 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -8,17 +8,34 @@
 #include
 #include
 
-struct ucs_interval {
+struct ucs_interval16 {
+	u16 first;
+	u16 last;
+};
+
+struct ucs_interval32 {
 	u32 first;
 	u32 last;
 };
 
 #include "ucs_width_table.h"
 
-static int interval_cmp(const void *key, const void *element)
+static int interval16_cmp(const void *key, const void *element)
+{
+	u16 cp = *(u16 *)key;
+	const struct ucs_interval16 *entry = element;
+
+	if (cp < entry->first)
+		return -1;
+	if (cp > entry->last)
+		return 1;
+	return 0;
+}
+
+static int interval32_cmp(const void *key, const void *element)
 {
 	u32 cp = *(u32 *)key;
-	const struct ucs_interval *entry = element;
+	const struct ucs_interval32 *entry = element;
 
 	if (cp < entry->first)
 		return -1;
@@ -27,15 +44,26 @@ static int interval_cmp(const void *key, const void *element)
 	return 0;
 }
 
-static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size)
+static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t size)
 {
 	if (!in_range(cp, ranges[0].first, ranges[size - 1].last))
 		return false;
 
 	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
-				interval_cmp) != NULL;
+				interval16_cmp) != NULL;
 }
 
+static bool cp_in_range32(u32 cp, const struct ucs_interval32 *ranges, size_t size)
+{
+	if (!in_range(cp, ranges[0].first, ranges[size - 1].last))
+		return false;
+
+	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
+				interval32_cmp) != NULL;
+}
+
+#define UCS_IS_BMP(cp)	((cp) <= 0xffff)
+
 /**
  * ucs_is_zero_width() - Determine if a Unicode code point is zero-width.
  * @cp: Unicode code point (UCS-4)
@@ -44,8 +72,12 @@ static bool cp_in_range(u32 cp, const struct ucs_interval *ranges, size_t size)
  */
 bool ucs_is_zero_width(u32 cp)
 {
-	return cp_in_range(cp, ucs_zero_width_ranges,
-			   ARRAY_SIZE(ucs_zero_width_ranges));
+	if (UCS_IS_BMP(cp))
+		return cp_in_range16(cp, ucs_zero_width_bmp_ranges,
+				     ARRAY_SIZE(ucs_zero_width_bmp_ranges));
+	else
+		return cp_in_range32(cp, ucs_zero_width_non_bmp_ranges,
+				     ARRAY_SIZE(ucs_zero_width_non_bmp_ranges));
 }
 
 /**
@@ -56,8 +88,12 @@ bool ucs_is_zero_width(u32 cp)
  */
 bool ucs_is_double_width(u32 cp)
 {
-	return cp_in_range(cp, ucs_double_width_ranges,
-			   ARRAY_SIZE(ucs_double_width_ranges));
+	if (UCS_IS_BMP(cp))
+		return cp_in_range16(cp, ucs_double_width_bmp_ranges,
+				     ARRAY_SIZE(ucs_double_width_bmp_ranges));
+	else
+		return cp_in_range32(cp, ucs_double_width_non_bmp_ranges,
+				     ARRAY_SIZE(ucs_double_width_non_bmp_ranges));
 }
 
 /*
diff --git a/drivers/tty/vt/ucs_width_table.h b/drivers/tty/vt/ucs_width_table.h
index 9cc86b5cdf..6fcb8f1d57 100644
--- a/drivers/tty/vt/ucs_width_table.h
+++ b/drivers/tty/vt/ucs_width_table.h
@@ -7,210 +7,214 @@
  * Unicode Version: 16.0.0
  */
 
-/* Zero-width character ranges */
-static const struct ucs_interval ucs_zero_width_ranges[] = {
-	{ 0x000AD, 0x000AD }, /* SOFT HYPHEN */
-	{ 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */
-	{ 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */
-	{ 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */
-	{ 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */
-	{ 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */
-	{ 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */
-	{ 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */
-	{ 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */
-	{ 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */
-	{ 0x0061C, 0x0061C }, /* ARABIC LETTER MARK */
-	{ 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */
-	{ 0x00670, 0x00670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */
-	{ 0x006D6, 0x006DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */
-	{ 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */
-	{ 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */
-	{ 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */
-	{ 0x0070F, 0x0070F }, /* SYRIAC ABBREVIATION MARK */
-	{ 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */
-	{ 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */
-	{ 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */
-	{ 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */
-	{ 0x007FD, 0x007FD }, /* NKO DANTAYALAN */
-	{ 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */
-	{ 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */
-	{ 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */
-	{ 0x00829, 0x0082D }, /* SAMARITAN VOWEL
SIGN LONG I - SAMARITAN MARK NEQUDAA */ - { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ - { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ - { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ - { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ - { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ - { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ - { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ - { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ - { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ - { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ - { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ - { 0x009C7, 0x009C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ - { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ - { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ - { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ - { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ - { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ - { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ - { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ - { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ - { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ - { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ - { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ - { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ - { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ - { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ - { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ - 
{ 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ - { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ - { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ - { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ - { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ - { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ - { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ - { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ - { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ - { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ - { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ - { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ - { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ - { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ - { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ - { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ - { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ - { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ - { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ - { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ - { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ - { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ - { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ - { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ - { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ - { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ - { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ - { 0x00CCA, 
0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ - { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ - { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ - { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ - { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ - { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ - { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ - { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ - { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ - { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ - { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ - { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ - { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ - { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ - { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ - { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ - { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ - { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ - { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ - { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ - { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ - { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ - { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ - { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ - { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ - { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS 
BZUNG SGOR RTAGS */ - { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ - { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ - { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ - { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ - { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ - { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ - { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ - { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ - { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ - { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ - { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ - { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ - { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ - { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ - { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ - { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ - { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ - { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ - { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ - { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ - { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ - { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ - { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ - { 0x0180B, 0x0180F }, /* MONGOLIAN FREE 
VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ - { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ - { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ - { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ - { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ - { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ - { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ - { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ - { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ - { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ - { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ - { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ - { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ - { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ - { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ - { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ - { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ - { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ - { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ - { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ - { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ - { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ - { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ - { 0x0200B, 0x0200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ - { 0x0202A, 0x0202E }, /* LEFT-TO-RIGHT EMBEDDING - 
RIGHT-TO-LEFT OVERRIDE */ - { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ - { 0x02066, 0x0206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ - { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ - { 0x02640, 0x02640 }, /* FEMALE SIGN */ - { 0x02642, 0x02642 }, /* MALE SIGN */ - { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ - { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ - { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ - { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ - { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ - { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ - { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ - { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ - { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ - { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ - { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ - { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ - { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ - { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ - { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ - { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ - { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ - { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ - { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ - { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ - { 0x0A947, 0x0A953 
}, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ - { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ - { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ - { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ - { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ - { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ - { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ - { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ - { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ - { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ - { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ - { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ - { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ - { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ - { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ - { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ - { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ - { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ - { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ - { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ - { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ - { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ +/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct ucs_interval16 ucs_zero_width_bmp_ranges[] = { + { 0x00AD, 0x00AD }, /* SOFT HYPHEN */ + { 0x0300, 0x036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x0483, 0x0489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x0591, 0x05BD }, /* HEBREW ACCENT 
ETNAHTA - HEBREW POINT METEG */ + { 0x05BF, 0x05BF }, /* HEBREW POINT RAFE */ + { 0x05C1, 0x05C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x05C4, 0x05C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x05C7, 0x05C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x0600, 0x0605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x0610, 0x061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x061C, 0x061C }, /* ARABIC LETTER MARK */ + { 0x064B, 0x065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x0670, 0x0670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x06D6, 0x06DD }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC END OF AYAH */ + { 0x06DF, 0x06E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x06E7, 0x06E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x06EA, 0x06ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x070F, 0x070F }, /* SYRIAC ABBREVIATION MARK */ + { 0x0711, 0x0711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x0730, 0x074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x07A6, 0x07B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x07EB, 0x07F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x07FD, 0x07FD }, /* NKO DANTAYALAN */ + { 0x0816, 0x0819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x081B, 0x0823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x0825, 0x0827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x0829, 0x082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x0859, 0x085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x0890, 0x0891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x0897, 0x089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x08CA, 0x0903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x093A, 0x093C }, /* 
DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x093E, 0x094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x0951, 0x0957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x0962, 0x0963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x0981, 0x0983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x09BC, 0x09BC }, /* BENGALI SIGN NUKTA */ + { 0x09BE, 0x09C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x09C7, 0x09C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x09CB, 0x09CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x09D7, 0x09D7 }, /* BENGALI AU LENGTH MARK */ + { 0x09E2, 0x09E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x09FE, 0x09FE }, /* BENGALI SANDHI MARK */ + { 0x0A01, 0x0A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x0A3C, 0x0A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x0A3E, 0x0A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x0A47, 0x0A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x0A4B, 0x0A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x0A51, 0x0A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x0A70, 0x0A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x0A75, 0x0A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x0A81, 0x0A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x0ABC, 0x0ABC }, /* GUJARATI SIGN NUKTA */ + { 0x0ABE, 0x0AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x0AC7, 0x0AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x0ACB, 0x0ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x0AE2, 0x0AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x0AFA, 0x0AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x0B01, 0x0B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x0B3C, 0x0B3C }, /* 
ORIYA SIGN NUKTA */ + { 0x0B3E, 0x0B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x0B47, 0x0B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x0B4B, 0x0B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x0B55, 0x0B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x0B62, 0x0B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x0B82, 0x0B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x0BBE, 0x0BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ + { 0x0BC6, 0x0BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x0BCA, 0x0BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x0BD7, 0x0BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x0C00, 0x0C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x0C3C, 0x0C3C }, /* TELUGU SIGN NUKTA */ + { 0x0C3E, 0x0C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x0C46, 0x0C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x0C4A, 0x0C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x0C55, 0x0C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x0C62, 0x0C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x0C81, 0x0C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x0CBC, 0x0CBC }, /* KANNADA SIGN NUKTA */ + { 0x0CBE, 0x0CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x0CC6, 0x0CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x0CCA, 0x0CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x0CD5, 0x0CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x0CE2, 0x0CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x0CF3, 0x0CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x0D00, 0x0D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x0D3B, 0x0D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 
0x0D3E, 0x0D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x0D46, 0x0D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x0D4A, 0x0D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x0D57, 0x0D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x0D62, 0x0D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x0D81, 0x0D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ + { 0x0DCA, 0x0DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x0DCF, 0x0DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x0DD6, 0x0DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x0DD8, 0x0DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x0DF2, 0x0DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x0E31, 0x0E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x0E34, 0x0E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x0E47, 0x0E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x0EB1, 0x0EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x0EB4, 0x0EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x0EC8, 0x0ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x0F18, 0x0F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x0F35, 0x0F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x0F37, 0x0F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x0F39, 0x0F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x0F3E, 0x0F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x0F71, 0x0F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x0F86, 0x0F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x0F8D, 0x0F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x0F99, 0x0FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x0FC6, 0x0FC6 }, /* TIBETAN SYMBOL PADMA GDAN 
*/ + { 0x102B, 0x103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x1056, 0x1059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x105E, 0x1060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x1062, 0x1064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ + { 0x1067, 0x106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x1071, 0x1074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x1082, 0x108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x108F, 0x108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x109A, 0x109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x135D, 0x135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x1712, 0x1715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x1732, 0x1734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x1752, 0x1753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x1772, 0x1773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x17B4, 0x17D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x17DD, 0x17DD }, /* KHMER SIGN ATTHACAN */ + { 0x180B, 0x180F }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x1885, 0x1886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x18A9, 0x18A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x1920, 0x192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x1930, 0x193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x1A17, 0x1A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x1A55, 0x1A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x1A60, 0x1A7C }, /* TAI THAM SIGN SAKOT - TAI 
THAM SIGN KHUEN-LUE KARAN */ + { 0x1A7F, 0x1A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x1AB0, 0x1ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x1B00, 0x1B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x1B34, 0x1B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ + { 0x1B6B, 0x1B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x1B80, 0x1B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x1BA1, 0x1BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x1BE6, 0x1BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x1C24, 0x1C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x1CD0, 0x1CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x1CD4, 0x1CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x1CED, 0x1CED }, /* VEDIC SIGN TIRYAK */ + { 0x1CF4, 0x1CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x1CF7, 0x1CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x1DC0, 0x1DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x200B, 0x200F }, /* ZERO WIDTH SPACE - RIGHT-TO-LEFT MARK */ + { 0x202A, 0x202E }, /* LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE */ + { 0x2060, 0x2064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x2066, 0x206F }, /* LEFT-TO-RIGHT ISOLATE - NOMINAL DIGIT SHAPES */ + { 0x20D0, 0x20F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x2640, 0x2640 }, /* FEMALE SIGN */ + { 0x2642, 0x2642 }, /* MALE SIGN */ + { 0x26A7, 0x26A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x2CEF, 0x2CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x2D7F, 0x2D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x2DE0, 0x2DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x302A, 
0x302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */
+	{ 0x3099, 0x309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */
+	{ 0xA66F, 0xA672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */
+	{ 0xA674, 0xA67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */
+	{ 0xA69E, 0xA69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */
+	{ 0xA6F0, 0xA6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */
+	{ 0xA802, 0xA802 }, /* SYLOTI NAGRI SIGN DVISVARA */
+	{ 0xA806, 0xA806 }, /* SYLOTI NAGRI SIGN HASANTA */
+	{ 0xA80B, 0xA80B }, /* SYLOTI NAGRI SIGN ANUSVARA */
+	{ 0xA823, 0xA827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */
+	{ 0xA82C, 0xA82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */
+	{ 0xA880, 0xA881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */
+	{ 0xA8B4, 0xA8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */
+	{ 0xA8E0, 0xA8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */
+	{ 0xA8FF, 0xA8FF }, /* DEVANAGARI VOWEL SIGN AY */
+	{ 0xA926, 0xA92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */
+	{ 0xA947, 0xA953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */
+	{ 0xA980, 0xA983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */
+	{ 0xA9B3, 0xA9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */
+	{ 0xA9E5, 0xA9E5 }, /* MYANMAR SIGN SHAN SAW */
+	{ 0xAA29, 0xAA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */
+	{ 0xAA43, 0xAA43 }, /* CHAM CONSONANT SIGN FINAL NG */
+	{ 0xAA4C, 0xAA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */
+	{ 0xAA7B, 0xAA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */
+	{ 0xAAB0, 0xAAB0 }, /* TAI VIET MAI KANG */
+	{ 0xAAB2, 0xAAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */
+	{ 0xAAB7, 0xAAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */
+	{ 0xAABE, 0xAABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */
+	{ 0xAAC1, 0xAAC1 }, /* TAI VIET TONE MAI THO */
+	{ 0xAAEB, 0xAAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */
+	{ 0xAAF5, 0xAAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */
+	{ 0xABE3, 0xABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */
+	{ 0xABEC, 0xABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */
+	{ 0xFB1E, 0xFB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */
+	{ 0xFE00, 0xFE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */
+	{ 0xFE20, 0xFE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */
+	{ 0xFEFF, 0xFEFF }, /* ZERO WIDTH NO-BREAK SPACE */
+	{ 0xFFF9, 0xFFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */
+};
+
+/* Zero-width character ranges (non-BMP, U+10000 and above) */
+static const struct ucs_interval32 ucs_zero_width_non_bmp_ranges[] = {
 	{ 0x101FD, 0x101FD }, /* PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE */
 	{ 0x102E0, 0x102E0 }, /* COPTIC EPACT THOUSANDS MARK */
 	{ 0x10376, 0x1037A }, /* COMBINING OLD PERMIC LETTER AN - COMBINING OLD PERMIC LETTER SII */
@@ -350,68 +354,72 @@ static const struct ucs_interval ucs_zero_width_ranges[] = {
 	{ 0xE0100, 0xE01EF }, /* VARIATION SELECTOR-17 - VARIATION SELECTOR-256 */
 };
 
-/* Double-width character ranges */
-static const struct ucs_interval ucs_double_width_ranges[] = {
-	{ 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */
-	{ 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */
-	{ 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */
-	{ 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */
-	{ 0x023F0, 0x023F0 }, /* ALARM CLOCK */
-	{ 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */
-	{ 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */
-	{ 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */
-	{ 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */
-	{ 0x02648, 0x02653 }, /* ARIES - PISCES */
-	{ 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */
-	{ 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */
-	{ 0x02693, 0x02693 }, /* ANCHOR */
-	{ 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */
-	{ 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */
-	{ 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */
-	{ 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */
-	{ 0x026CE, 0x026CE }, /* OPHIUCHUS */
-	{ 0x026D4, 0x026D4 }, /* NO ENTRY */
-	{ 0x026EA, 0x026EA }, /* CHURCH */
-	{ 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */
-	{ 0x026F5, 0x026F5 }, /* SAILBOAT */
-	{ 0x026FA, 0x026FA }, /* TENT */
-	{ 0x026FD, 0x026FD }, /* FUEL PUMP */
-	{ 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */
-	{ 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */
-	{ 0x02728, 0x02728 }, /* SPARKLES */
-	{ 0x0274C, 0x0274C }, /* CROSS MARK */
-	{ 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */
-	{ 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */
-	{ 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */
-	{ 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */
-	{ 0x027B0, 0x027B0 }, /* CURLY LOOP */
-	{ 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */
-	{ 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */
-	{ 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */
-	{ 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */
-	{ 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */
-	{ 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */
-	{ 0x02F00, 0x02FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */
-	{ 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */
-	{ 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */
-	{ 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */
-	{ 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */
-	{ 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */
-	{ 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */
-	{ 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */
-	{ 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */
-	{ 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */
-	{ 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */
-	{ 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */
-	{ 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */
-	{ 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */
-	{ 0x0F900, 0x0FAFF }, /* U+F900 - U+FAFF */
-	{ 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */
-	{ 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */
-	{ 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */
-	{ 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */
-	{ 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */
-	{ 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */
+/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */
+static const struct ucs_interval16 ucs_double_width_bmp_ranges[] = {
+	{ 0x1100, 0x115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */
+	{ 0x231A, 0x231B }, /* WATCH - HOURGLASS */
+	{ 0x2329, 0x232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */
+	{ 0x23E9, 0x23EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */
+	{ 0x23F0, 0x23F0 }, /* ALARM CLOCK */
+	{ 0x23F3, 0x23F3 }, /* HOURGLASS WITH FLOWING SAND */
+	{ 0x25FD, 0x25FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */
+	{ 0x2614, 0x2615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */
+	{ 0x2630, 0x2637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */
+	{ 0x2648, 0x2653 }, /* ARIES - PISCES */
+	{ 0x267F, 0x267F }, /* WHEELCHAIR SYMBOL */
+	{ 0x268A, 0x268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */
+	{ 0x2693, 0x2693 }, /* ANCHOR */
+	{ 0x26A1, 0x26A1 }, /* HIGH VOLTAGE SIGN */
+	{ 0x26AA, 0x26AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */
+	{ 0x26BD, 0x26BE }, /* SOCCER BALL - BASEBALL */
+	{ 0x26C4, 0x26C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */
+	{ 0x26CE, 0x26CE }, /* OPHIUCHUS */
+	{ 0x26D4, 0x26D4 }, /* NO ENTRY */
+	{ 0x26EA, 0x26EA }, /* CHURCH */
+	{ 0x26F2, 0x26F3 }, /* FOUNTAIN - FLAG IN HOLE */
+	{ 0x26F5, 0x26F5 }, /* SAILBOAT */
+	{ 0x26FA, 0x26FA }, /* TENT */
+	{ 0x26FD, 0x26FD }, /* FUEL PUMP */
+	{ 0x2705, 0x2705 }, /* WHITE HEAVY CHECK MARK */
+	{ 0x270A, 0x270B }, /* RAISED FIST - RAISED HAND */
+	{ 0x2728, 0x2728 }, /* SPARKLES */
+	{ 0x274C, 0x274C }, /* CROSS MARK */
+	{ 0x274E, 0x274E }, /* NEGATIVE SQUARED CROSS MARK */
+	{ 0x2753, 0x2755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */
+	{ 0x2757, 0x2757 }, /* HEAVY EXCLAMATION MARK SYMBOL */
+	{ 0x2795, 0x2797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */
+	{ 0x27B0, 0x27B0 }, /* CURLY LOOP */
+	{ 0x27BF, 0x27BF }, /* DOUBLE CURLY LOOP */
+	{ 0x2B1B, 0x2B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */
+	{ 0x2B50, 0x2B50 }, /* WHITE MEDIUM STAR */
+	{ 0x2B55, 0x2B55 }, /* HEAVY LARGE CIRCLE */
+	{ 0x2E80, 0x2E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */
+	{ 0x2E9B, 0x2EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */
+	{ 0x2F00, 0x2FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */
+	{ 0x2FF0, 0x3029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */
+	{ 0x3030, 0x303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */
+	{ 0x3041, 0x3096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */
+	{ 0x309B, 0x30FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */
+	{ 0x3105, 0x312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */
+	{ 0x3131, 0x318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */
+	{ 0x3190, 0x31E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */
+	{ 0x31EF, 0x321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */
+	{ 0x3220, 0x3247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */
+	{ 0x3250, 0xA48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */
+	{ 0xA490, 0xA4C6 }, /* YI RADICAL QOT - YI RADICAL KE */
+	{ 0xA960, 0xA97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */
+	{ 0xAC00, 0xD7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */
+	{ 0xF900, 0xFAFF }, /* U+F900 - U+FAFF */
+	{ 0xFE10, 0xFE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */
+	{ 0xFE30, 0xFE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */
+	{ 0xFE54, 0xFE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */
+	{ 0xFE68, 0xFE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */
+	{ 0xFF01, 0xFF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */
+	{ 0xFFE0, 0xFFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */
+};
+
+/* Double-width character ranges (non-BMP, U+10000 and above) */
+static const struct ucs_interval32 ucs_double_width_non_bmp_ranges[] = {
 	{ 0x16FE0, 0x16FE3 }, /* TANGUT ITERATION MARK - OLD CHINESE ITERATION MARK */
 	{ 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */
 	{ 0x18800, 0x18CD5 }, /* TANGUT COMPONENT-001 - KHITAN SMALL SCRIPT CHARACTER-18CD5 */

From patchwork Thu Apr 17 18:45:16 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 882207
Received: from
fout-b5-smtp.messagingengine.com (fout-b5-smtp.messagingengine.com [202.12.124.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CABC2571C4; Thu, 17 Apr 2025 18:49:11 +0000 (UTC)
From: Nicolas Pitre
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Nicolas Pitre, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 14/14] vt: move UCS tables to the "shipped" form
Date: Thu, 17 Apr 2025 14:45:16 -0400
Message-ID: <20250417184849.475581-15-nico@fluxnic.net>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
References: <20250417184849.475581-1-nico@fluxnic.net>
Precedence: bulk
X-Mailing-List: linux-serial@vger.kernel.org
MIME-Version: 1.0

From: Nicolas Pitre

Use the "shipped" mechanism to copy pre-generated tables to the build tree
by default. If GENERATE_UCS_TABLES=1 then they are generated at build time
instead. If GENERATE_UCS_TABLES=2 then gen_ucs_recompose_table.py is
invoked with --full.

Signed-off-by: Nicolas Pitre
Suggested-by: Jiri Slaby
---
 drivers/tty/vt/Makefile                       | 27 ++++++++++++++++++-
 drivers/tty/vt/gen_ucs_recompose_table.py     | 10 ++++---
 drivers/tty/vt/gen_ucs_width_table.py         | 18 +++++++++----
 ..._table.h => ucs_recompose_table.h_shipped} |  0
 ...idth_table.h => ucs_width_table.h_shipped} |  0
 5 files changed, 45 insertions(+), 10 deletions(-)
 rename drivers/tty/vt/{ucs_recompose_table.h => ucs_recompose_table.h_shipped} (100%)
 rename drivers/tty/vt/{ucs_width_table.h => ucs_width_table.h_shipped} (100%)

diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile
index e24c8546ac..8ba33cc942 100644
--- a/drivers/tty/vt/Makefile
+++ b/drivers/tty/vt/Makefile
@@ -11,7 +11,8 @@ obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o \
 				   ucs.o
 
 # Files generated that shall be removed upon make clean
-clean-files := consolemap_deftbl.c defkeymap.c
+clean-files := consolemap_deftbl.c defkeymap.c \
+	       ucs_width_table.h ucs_recompose_table.h
 
 hostprogs += conmakehash
 
@@ -34,3 +35,27 @@ $(obj)/defkeymap.c: $(obj)/%.c: $(src)/%.map
 	loadkeys --mktable --unicode $< > $@
 
 endif
+
+$(obj)/ucs.o: $(src)/ucs.c $(obj)/ucs_width_table.h $(obj)/ucs_recompose_table.h
+
+# You may uncomment one of those to have the UCS tables be regenerated
+# during the build process. By default the _shipped versions are used.
+#
+#GENERATE_UCS_TABLES := 1
+#GENERATE_UCS_TABLES := 2 # invokes gen_ucs_recompose_table.py with --full
+
+ifdef GENERATE_UCS_TABLES
+
+$(obj)/ucs_width_table.h: $(src)/gen_ucs_width_table.py
+	$(PYTHON3) $< -o $@
+
+ifeq ($(GENERATE_UCS_TABLES),2)
+gen_recomp_arg := --full
+else
+gen_recomp_arg :=
+endif
+
+$(obj)/ucs_recompose_table.h: $(src)/gen_ucs_recompose_table.py
+	$(PYTHON3) $< -o $@ $(gen_recomp_arg)
+
+endif
diff --git a/drivers/tty/vt/gen_ucs_recompose_table.py b/drivers/tty/vt/gen_ucs_recompose_table.py
index d30f8f5242..4434a436ac 100755
--- a/drivers/tty/vt/gen_ucs_recompose_table.py
+++ b/drivers/tty/vt/gen_ucs_recompose_table.py
@@ -19,8 +19,8 @@ import textwrap
 from pathlib import Path
 this_file = Path(__file__).name
 
-# Output file name
-out_file = "ucs_recompose_table.h"
+# Default output file name
+DEFAULT_OUT_FILE = "ucs_recompose_table.h"
 
 common_recompose_description = "most commonly used Latin, Greek, and Cyrillic recomposition pairs only"
 COMMON_RECOMPOSITION_PAIRS = [
@@ -165,7 +165,7 @@ def validate_common_pairs(full_list):
         print(error_msg)
         raise ValueError(error_msg)
 
-def generate_recomposition_table(use_full_list=False):
+def generate_recomposition_table(use_full_list=False, out_file=DEFAULT_OUT_FILE):
     """Generate the recomposition C table."""
 
     # Collect all recomposition pairs for validation
@@ -250,6 +250,8 @@ if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Generate Unicode recomposition table")
     parser.add_argument("--full", action="store_true",
                         help="Generate a full recomposition table (default: common pairs only)")
+    parser.add_argument("-o", "--output", dest="output_file", default=DEFAULT_OUT_FILE,
+                        help=f"Output file name (default: {DEFAULT_OUT_FILE})")
     args = parser.parse_args()
 
-    generate_recomposition_table(use_full_list=args.full)
+    generate_recomposition_table(use_full_list=args.full, out_file=args.output_file)
diff --git a/drivers/tty/vt/gen_ucs_width_table.py b/drivers/tty/vt/gen_ucs_width_table.py
index 059ed9a8ba..76e80ebeff 100755
--- a/drivers/tty/vt/gen_ucs_width_table.py
+++ b/drivers/tty/vt/gen_ucs_width_table.py
@@ -5,13 +5,14 @@
 
 import unicodedata
 import sys
+import argparse
 
 # This script's file name
 from pathlib import Path
 this_file = Path(__file__).name
 
-# Output file name
-out_file = "ucs_width_table.h"
+# Default output file name
+DEFAULT_OUT_FILE = "ucs_width_table.h"
 
 # --- Global Constants for Width Assignments ---
 
@@ -185,13 +186,14 @@ def create_width_tables():
 
     return zero_width_ranges, double_width_ranges
 
-def write_tables(zero_width_ranges, double_width_ranges):
+def write_tables(zero_width_ranges, double_width_ranges, out_file=DEFAULT_OUT_FILE):
     """
     Write the generated tables to C header file.
 
     Args:
         zero_width_ranges: List of (start, end) ranges for zero-width characters
         double_width_ranges: List of (start, end) ranges for double-width characters
+        out_file: Output file name (default: DEFAULT_OUT_FILE)
     """
 
     # Function to split ranges into BMP (16-bit) and non-BMP (above 16-bit)
@@ -286,14 +288,20 @@ static const struct ucs_interval32 ucs_double_width_non_bmp_ranges[] = {
         f.write("};\n")
 
 if __name__ == "__main__":
+    # Parse command line arguments
+    parser = argparse.ArgumentParser(description="Generate Unicode width tables")
+    parser.add_argument("-o", "--output", dest="output_file", default=DEFAULT_OUT_FILE,
+                        help=f"Output file name (default: {DEFAULT_OUT_FILE})")
+    args = parser.parse_args()
+
     # Write tables to header file
     zero_width_ranges, double_width_ranges = create_width_tables()
-    write_tables(zero_width_ranges, double_width_ranges)
+    write_tables(zero_width_ranges, double_width_ranges, out_file=args.output_file)
 
     # Print summary
     zero_width_count = sum(end - start + 1 for start, end in zero_width_ranges)
     double_width_count = sum(end - start + 1 for start, end in double_width_ranges)
-    print(f"Generated {out_file} with:")
+    print(f"Generated {args.output_file} with:")
     print(f"- {len(zero_width_ranges)} zero-width ranges covering ~{zero_width_count} code points")
     print(f"- {len(double_width_ranges)} double-width ranges covering ~{double_width_count} code points")
     print(f"- Unicode Version: {unicodedata.unidata_version}")
diff --git a/drivers/tty/vt/ucs_recompose_table.h b/drivers/tty/vt/ucs_recompose_table.h_shipped
similarity index 100%
rename from drivers/tty/vt/ucs_recompose_table.h
rename to drivers/tty/vt/ucs_recompose_table.h_shipped
diff --git a/drivers/tty/vt/ucs_width_table.h b/drivers/tty/vt/ucs_width_table.h_shipped
similarity index 100%
rename from drivers/tty/vt/ucs_width_table.h
rename to drivers/tty/vt/ucs_width_table.h_shipped
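
Note on the BMP / non-BMP split: the width tables in this series are emitted as
separate ucs_interval16 (BMP) and ucs_interval32 (non-BMP) arrays, and
write_tables() carries a comment about splitting ranges into BMP (16-bit) and
non-BMP (above 16-bit). The body of that helper is not visible in the hunks
above, so the following is only a minimal sketch of how such a split could
work. The names split_ranges_by_plane and BMP_MAX are hypothetical and are not
taken from gen_ucs_width_table.py; cutting a range that straddles U+FFFF in two
is likewise an assumption about the desired behaviour.

```python
# Hypothetical sketch, NOT the code from gen_ucs_width_table.py: the names
# split_ranges_by_plane and BMP_MAX are assumptions made for illustration.
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def split_ranges_by_plane(ranges):
    """Split inclusive (start, end) ranges into BMP and non-BMP lists.

    A range that straddles U+FFFF is cut in two so that every BMP entry
    fits in 16 bits (assumed behaviour, see the note above).
    """
    bmp, non_bmp = [], []
    for start, end in ranges:
        if end <= BMP_MAX:
            bmp.append((start, end))
        elif start > BMP_MAX:
            non_bmp.append((start, end))
        else:
            bmp.append((start, BMP_MAX))
            non_bmp.append((BMP_MAX + 1, end))
    return bmp, non_bmp


if __name__ == "__main__":
    # Three ranges copied from the double-width tables in this series.
    sample = [(0x1100, 0x115F), (0xFFE0, 0xFFE6), (0x16FE0, 0x16FE3)]
    bmp, non_bmp = split_ranges_by_plane(sample)
    for start, end in bmp:
        print(f"\t{{ 0x{start:04X}, 0x{end:04X} }},")
    for start, end in non_bmp:
        print(f"\t{{ 0x{start:05X}, 0x{end:05X} }},")
```

With a split of this kind, the BMP entries need only two 16-bit fields each,
which is presumably the reason for the separate ucs_interval16 arrays.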