[0/8] vt: more Unicode handling changes

Message ID 20250505170021.29944-1-nico@fluxnic.net

Message

Nicolas Pitre May 5, 2025, 4:55 p.m. UTC
The Linux VT console has many problems with regard to proper Unicode
handling. A first set of patches was submitted here:

https://lore.kernel.org/all/20250417184849.475581-1-nico@fluxnic.net/

Those patches are currently in Greg's tty-next branch.

The first 2 patches in the following series contain fixes for those
already-applied patches.

Remaining patches introduce tables that map complex Unicode characters
to simpler fallback characters for terminal display when corresponding
glyphs are unavailable. Only the subset of Unicode that can reasonably
be substituted by ASCII/Latin-1 characters is covered. Substitutions may
not be as good as the actual glyphs, but they are still far more helpful
than squared question marks.
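
To illustrate the idea of a fallback table, here is a minimal user-space
sketch, not the code from this series. The struct layout, the table
entries and the names fallback_table and lookup_fallback() are all
hypothetical; the real table is generated by gen_ucs_fallback_table.py
and may be organized quite differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: one code point and its single-byte substitute. */
struct fallback_entry {
	uint32_t cp;		/* Unicode code point */
	uint8_t fallback;	/* ASCII/Latin-1 substitute */
};

/* Illustrative entries only; must stay sorted by code point for bsearch(). */
static const struct fallback_entry fallback_table[] = {
	{ 0x0100, 'A' },	/* LATIN CAPITAL LETTER A WITH MACRON */
	{ 0x0101, 'a' },	/* LATIN SMALL LETTER A WITH MACRON */
	{ 0x2018, '\'' },	/* LEFT SINGLE QUOTATION MARK */
	{ 0x2019, '\'' },	/* RIGHT SINGLE QUOTATION MARK */
	{ 0x201C, '"' },	/* LEFT DOUBLE QUOTATION MARK */
	{ 0x201D, '"' },	/* RIGHT DOUBLE QUOTATION MARK */
	{ 0x2026, '.' },	/* HORIZONTAL ELLIPSIS */
	{ 0x2192, '>' },	/* RIGHTWARDS ARROW */
};

/* Comparison callback: key is a code point, elem is a table entry. */
static int cmp_entry(const void *key, const void *elem)
{
	uint32_t cp = *(const uint32_t *)key;
	const struct fallback_entry *e = elem;

	if (cp < e->cp)
		return -1;
	if (cp > e->cp)
		return 1;
	return 0;
}

/* Return the substitute character, or 0 when the table has no entry. */
static uint32_t lookup_fallback(uint32_t cp)
{
	const struct fallback_entry *e;

	e = bsearch(&cp, fallback_table,
		    sizeof(fallback_table) / sizeof(fallback_table[0]),
		    sizeof(fallback_table[0]), cmp_entry);
	return e ? e->fallback : 0;
}
```

In this sketch a caller would try the font's own glyph first, then
lookup_fallback(), and only fall back to the replacement character when
both fail.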

This applies on top of tty-next currently at commit 5ee558c5d9e9.

diffstat:
 drivers/tty/vt/.gitignore                   |    1 +
 drivers/tty/vt/Makefile                     |    8 +-
 drivers/tty/vt/gen_ucs_fallback_table.py    |  881 ++++++++++++
 drivers/tty/vt/ucs.c                        |   89 +-
 drivers/tty/vt/ucs_fallback_table.h_shipped | 1498 +++++++++++++++++++++
 drivers/tty/vt/vt.c                         |   95 +-
 include/linux/consolemap.h                  |    6 +
 7 files changed, 2535 insertions(+), 43 deletions(-)

Comments

Jiri Slaby May 6, 2025, 6:06 a.m. UTC | #1
On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> No logical changes. Make it easier for enhancements to come.
...
> @@ -2984,12 +2985,40 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
>   	return 0;
>   }
>   
> +static int vc_get_glyph(struct vc_data *vc, int tc)
> +{
> +	int glyph = conv_uni_to_pc(vc, tc);
> +	int charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;

Could you keep charmask unsigned? It used to be u16.
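
For reference, a small user-space sketch of the range test with an
unsigned 16-bit mask. It assumes uint16_t as a stand-in for the kernel's
u16, and glyph_in_range() is a hypothetical helper, not part of the
patch. Because the mask is promoted to int before the complement is
taken, a negative glyph value such as -1 still fails the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the `!(glyph & ~charmask)` test. */
static bool glyph_in_range(int glyph, bool hi_font)
{
	uint16_t charmask = hi_font ? 0x1ff : 0xff;

	/*
	 * charmask is promoted to int before ~ is applied, so the
	 * complement has every bit above the mask set, including the
	 * sign bit. Any negative glyph therefore yields a non-zero AND.
	 */
	return !(glyph & ~charmask);
}
```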

> +
> +	if (!(glyph & ~charmask))
> +		return glyph;
> +
> +	if (glyph == -1)
> +		return -1; /* nothing to display */
> +
> +	/* Glyph not found */
> +

Please do not add an extra \n here ^^.

> +	if ((!vc->vc_utf || vc->vc_disp_ctrl || tc < 128) && !(tc & ~charmask)) {
> +		/*
> +		 * In legacy mode use the glyph we get by a 1:1 mapping.
> +		 * This would make absolutely no sense with Unicode in mind,
> +		 * but do this for ASCII characters since a font may lack
> +		 * Unicode mapping info and we don't want to end up with
> +		 * having question marks only.

Generally: feel free to use 100 characters per line.

> +		 */
> +		return tc;
> +	}
> +
> +	/* Display U+FFFD (Unicode Replacement Character). */
> +	return conv_uni_to_pc(vc, UCS_REPLACEMENT);
> +}

thanks,