Message ID | b7d39428-a95b-eeff-4958-e77af2ff6d50@redhat.com |
---|---|
State | New |
On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.

+1 from me - this is consistent with the behavior specified for GBK in
the web "Encoding Standard" (https://encoding.spec.whatwg.org/#gbk-decoder)
which has been obsessively tuned for maximum compatibility with
existing content.

zw

On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.

i'm not super familiar with the gconv modules, but this seems to make sense

should use lowercase hex constants though to match surrounding code style

my dev box is offline pending a move, so i won't be able to push anything
for a while
-mike

On 11/29/2016 06:04 PM, Mike Frysinger wrote:
> On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>> FAIL: iconvdata/tst-tables
>>> failed: ./tst-table.sh
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>>> GBK
>>
>> Here's one way to fix it.  Doesn't look really pretty, but it seems that
>> this new 0x80 mapping is quite an outlier.
>
> i'm not super familiar with the gconv modules, but this seems to make sense
>
> should use lowercase hex constants though to match surrounding code style

Right, I'll fix this before committing it.

Thanks,
Florian

gconv: Adjust GBK to support the Euro sign

Commit aa4d00ca39e604ac4e9fead401ccd4483e11a281 only updated the data
used by locales.

2016-11-29  Florian Weimer  <fweimer@redhat.com>

	* iconvdata/gbk.c (BODY): Add Euro sign support (both directions).

diff --git a/iconvdata/gbk.c b/iconvdata/gbk.c
index fc32a50..d39e398 100644
--- a/iconvdata/gbk.c
+++ b/iconvdata/gbk.c
@@ -13148,8 +13148,17 @@ static const char __gbk_from_ucs4_tab12[][2] =
 	if (__builtin_expect (ch <= 0x80, 0)				      \
 	    || __builtin_expect (ch > 0xfe, 0))				      \
 	  {								      \
-	    /* This is illegal.  */					      \
-	    STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+	    if (__glibc_likely (ch == 0x80))				      \
+	      {								      \
+		/* Exception for the Euro sign (see CP936).  */		      \
+		ch = 0x20AC;						      \
+		++inptr;						      \
+	      }								      \
+	    else							      \
+	      {								      \
+		/* This is illegal.  */					      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (1);			      \
+	      }								      \
 	  }								      \
 	else								      \
 	  {								      \
@@ -13292,6 +13301,10 @@ static const char __gbk_from_ucs4_tab12[][2] =
 	  case 0x2010 ... 0x203b:					      \
 	    cp = __gbk_from_ucs4_tab4[ch - 0x2010];			      \
 	    break;							      \
+	  case 0x20AC:							      \
+	    /* Exception for the Euro sign (see CP936).  */		      \
+	    cp = "\x80";						      \
+	    break;							      \
 	  case 0x2103 ... 0x22bf:					      \
 	    cp = __gbk_from_ucs4_tab5[ch - 0x2103];			      \
 	    break;							      \
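
For readers who do not want to trace the BODY macros, the two hunks can be restated as a standalone sketch. This is not the glibc code: the names gbk_first_byte, euro_to_gbk, GBK_ILLEGAL and GBK_NEED_SECOND are hypothetical, the two-byte table lookups are left out, and it assumes bytes up to 0x7f are passed through as ASCII before the patched branch is reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only (not glibc names).  */
enum { GBK_ILLEGAL = -1, GBK_NEED_SECOND = -2 };

/* Hypothetical helper: classify the first byte of a GBK sequence the
   way the patched from-GBK direction does.  Returns the code point for
   single-byte input, GBK_NEED_SECOND for the lead byte of a two-byte
   sequence (table lookup not shown), GBK_ILLEGAL otherwise.  */
static int32_t
gbk_first_byte (uint8_t ch)
{
  if (ch <= 0x7f)
    return ch;			/* Assumed ASCII pass-through.  */
  if (ch == 0x80)
    return 0x20ac;		/* Exception for the Euro sign (see CP936).  */
  if (ch > 0xfe)
    return GBK_ILLEGAL;		/* 0xff is not a valid lead byte.  */
  return GBK_NEED_SECOND;
}

/* Hypothetical helper: the one mapping added in the to-GBK direction.
   Stores the single byte 0x80 and returns 1 for U+20AC; returns 0 for
   every other code point (the existing tables are not shown).  */
static int
euro_to_gbk (uint32_t ucs4, uint8_t *out)
{
  if (ucs4 != 0x20ac)
    return 0;
  *out = 0x80;
  return 1;
}

/* Round-trip check for the new mapping.  */
static void
euro_round_trip_check (void)
{
  uint8_t byte = 0;
  assert (euro_to_gbk (0x20ac, &byte) == 1);
  assert (gbk_first_byte (byte) == 0x20ac);
}
```

Calling euro_round_trip_check () from a test program exercises both directions; the byte 0xff still ends up in the illegal branch, as it did before the change.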