localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]

Message ID b7d39428-a95b-eeff-4958-e77af2ff6d50@redhat.com
State New

Commit Message

Florian Weimer Nov. 29, 2016, 10:08 a.m.
On 11/28/2016 10:23 AM, Andreas Schwab wrote:
> FAIL: iconvdata/tst-tables
> failed: ./tst-table.sh /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/ /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/ GBK


Here's one way to fix it.  Doesn't look really pretty, but it seems that 
this new 0x80 mapping is quite an outlier.

Thanks,
Florian

Comments

Zack Weinberg Nov. 29, 2016, 2:28 p.m. | #1
On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.


+1 from me - this is consistent with the behavior specified for GBK in
the web "Encoding Standard"
(https://encoding.spec.whatwg.org/#gbk-decoder) which has been
obsessively tuned for maximum compatibility with existing content.

zw

Mike Frysinger Nov. 29, 2016, 5:04 p.m. | #2
On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.


i'm not super familiar with the gconv modules, but this seems to make sense

should use lowercase hex constants though to match surrounding code style

my dev box is offline pending a move, so i won't be able to push
anything for a while
-mike

Florian Weimer Nov. 29, 2016, 5:38 p.m. | #3
On 11/29/2016 06:04 PM, Mike Frysinger wrote:
> On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>> FAIL: iconvdata/tst-tables
>>> failed: ./tst-table.sh
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>>> GBK
>>
>> Here's one way to fix it.  Doesn't look really pretty, but it seems that
>> this new 0x80 mapping is quite an outlier.
>
> i'm not super familiar with the gconv modules, but this seems to make sense
>
> should use lowercase hex constants though to match surrounding code style


Right, I'll fix this before committing it.

Thanks,
Florian

Patch

gconv: Adjust GBK to support the Euro sign

Commit aa4d00ca39e604ac4e9fead401ccd4483e11a281 only updated the
data used by locales.

2016-11-29  Florian Weimer  <fweimer@redhat.com>

	* iconvdata/gbk.c (BODY): Add Euro sign support (both directions).

diff --git a/iconvdata/gbk.c b/iconvdata/gbk.c
index fc32a50..d39e398 100644
--- a/iconvdata/gbk.c
+++ b/iconvdata/gbk.c
@@ -13148,8 +13148,17 @@  static const char __gbk_from_ucs4_tab12[][2] =
       if (__builtin_expect (ch <= 0x80, 0)				      \
 	  || __builtin_expect (ch > 0xfe, 0))				      \
 	{								      \
-	  /* This is illegal.  */					      \
-	  STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+	  if (__glibc_likely (ch == 0x80))				      \
+	    {								      \
+	      /* Exception for the Euro sign (see CP936).  */		      \
+	      ch = 0x20AC;						      \
+	      ++inptr;							      \
+	    }								      \
+	  else								      \
+	    {								      \
+	      /* This is illegal.  */					      \
+	      STANDARD_FROM_LOOP_ERR_HANDLER (1);			      \
+	    }								      \
 	}								      \
       else								      \
 	{								      \
@@ -13292,6 +13301,10 @@  static const char __gbk_from_ucs4_tab12[][2] =
 	case 0x2010 ... 0x203b:						      \
 	  cp = __gbk_from_ucs4_tab4[ch - 0x2010];			      \
 	  break;							      \
+	case 0x20AC:							      \
+	  /* Exception for the Euro sign (see CP936).  */		      \
+	  cp = "\x80";							      \
+	  break;							      \
 	case 0x2103 ... 0x22bf:						      \
 	  cp = __gbk_from_ucs4_tab5[ch - 0x2103];			      \
 	  break;							      \
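
For readers who want to see the two special cases outside the macro bodies, the following is a minimal standalone sketch of the behaviour the hunks above appear to implement. It is not the glibc code: every name in it (gbk_classify_lead, gbk_decode_euro, gbk_encode_euro, gbk_euro_selfcheck) is hypothetical, and the ASCII branch for bytes up to 0x7f is an assumption, since that part of gbk.c is not visible in the diff context. Constants are written in lowercase hex, as requested in review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: these names are hypothetical and are not part of
   glibc.  They mirror the two branches the patch touches.  */

enum gbk_lead_kind
{
  GBK_LEAD_ASCII,     /* 0x00..0x7f: assumed single-byte ASCII branch.  */
  GBK_LEAD_EURO,      /* 0x80: the new single-byte exception.  */
  GBK_LEAD_TWO_BYTE,  /* 0x81..0xfe: first byte of a two-byte sequence.  */
  GBK_LEAD_ILLEGAL    /* 0xff: still rejected.  */
};

/* Classify the first input byte, using the same range test as the
   patched condition (ch <= 0x80 || ch > 0xfe).  */
static enum gbk_lead_kind
gbk_classify_lead (unsigned char ch)
{
  if (ch <= 0x7f)
    return GBK_LEAD_ASCII;
  if (ch <= 0x80 || ch > 0xfe)
    return ch == 0x80 ? GBK_LEAD_EURO : GBK_LEAD_ILLEGAL;
  return GBK_LEAD_TWO_BYTE;
}

/* Decode direction: store U+20AC and return 1 if CH is the Euro
   exception, otherwise leave *OUT untouched and return 0.  */
static int
gbk_decode_euro (unsigned char ch, uint32_t *out)
{
  if (gbk_classify_lead (ch) != GBK_LEAD_EURO)
    return 0;
  *out = 0x20ac;
  return 1;
}

/* Encode direction: return the one-byte sequence for U+20AC, or NULL
   for any other code point (the real converter consults its tables).  */
static const char *
gbk_encode_euro (uint32_t ucs4)
{
  return ucs4 == 0x20ac ? "\x80" : NULL;
}

/* Small self-check covering both directions.  */
static void
gbk_euro_selfcheck (void)
{
  uint32_t cp = 0;
  assert (gbk_decode_euro (0x80, &cp) && cp == 0x20ac);
  assert (!gbk_decode_euro (0x81, &cp));
  assert (gbk_classify_lead (0xff) == GBK_LEAD_ILLEGAL);
  assert ((unsigned char) gbk_encode_euro (0x20ac)[0] == 0x80);
  assert (gbk_encode_euro (0x20ad) == NULL);
}
```

Calling gbk_euro_selfcheck () from a test main exercises both directions. The point of the sketch is that 0x80 is the only byte in the previously illegal range that becomes valid; 0xff still takes the error path.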