GB18030 encoded file incorrectly detected as gb2312 #49

Open
wesinator opened this issue Nov 13, 2018 · 1 comment

Comments

@wesinator

atom/encoding-selector#65

Steps to Reproduce

https://github.com/malice-plugins/yara/blob/17a4fc946febe8b002e285f591bcb21b92a99e9e/rules/userdb_panda.yar

  • Open the file in Atom.
  • Select "Auto Detect" as the encoding.

Expected behavior: Atom detects the file's encoding as GB18030.
Converting with GB18030 works: iconv -f GB18030 -t UTF-8 userdb_panda.yar

Actual behavior: Atom auto-detects the encoding as gb2312 and shows 'undefined encoding'.
[Screenshot: atom_gb2312_undefined]

iconv fails to convert from GB2312, but works with GB18030:

iconv -f GB2312 -t UTF-8 userdb_panda.yar
iconv: illegal input sequence at position 29230
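The same comparison can be sketched in Python, whose standard library ships gb2312 and gb18030 codecs. This is a minimal sketch, not Atom's detection logic, and the helper name decode_error_offset is hypothetical. It returns the byte offset where strict decoding stops, similar to the position iconv prints.

```python
from typing import Optional


def decode_error_offset(data: bytes, encoding: str) -> Optional[int]:
    """Return the byte offset where strict decoding of `data` fails,
    or None if the whole buffer decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first undecodable byte
    return None


# Demo buffer: two GB 2312 characters followed by a four-byte
# GB 18030 sequence (U+20000, a CJK Extension B character).
sample = "中文\U00020000".encode("gb18030")
print(decode_error_offset(sample, "gb18030"))  # None
print(decode_error_offset(sample, "gb2312"))   # 4
```

Against the file from this report, decode_error_offset(open("userdb_panda.yar", "rb").read(), "gb2312") would be expected to return an offset while the "gb18030" call returns None, mirroring the iconv results.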

Reproduces how often: Always

@byyxx128

byyxx128 commented Oct 18, 2019

Glad to see you.

I'm just a regular user rather than an official maintainer, so I'm only sharing some of my ideas here.

GB 2312 ⊊ GBK ⊊ GB 18030

(By the way, the standard GB 2312-1980 was renamed to GB/T 2312-1980 in 2017.)

In terms of the standard documents, the chain is:
GB/T 2312-1980 ⊊ GBK 1.0 ⊊ GB 18030-2000 ⊊ GB 18030-2005
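The nesting can be observed with Python's built-in codecs. This is a minimal sketch, assuming the gb2312, gbk and gb18030 codecs are faithful stand-ins for the standards; narrowest_gb_codec is a hypothetical helper. Characters from GB 2312 encode to the same bytes in all three, 丂 (U+4E02) first appears in GBK, and an Extension B character needs GB 18030.

```python
from typing import Optional

# Ordered from narrowest to widest, following GB 2312 ⊊ GBK ⊊ GB 18030
# (these are Python codec names, not standard numbers).
GB_CODECS = ("gb2312", "gbk", "gb18030")


def narrowest_gb_codec(text: str) -> Optional[str]:
    """Return the first codec in GB_CODECS that can encode `text`."""
    for codec in GB_CODECS:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # not in this repertoire; try the next, wider one
        return codec
    return None


# GB 2312 characters produce identical bytes in every member of the chain.
assert "中文".encode("gb2312") == "中文".encode("gbk") == "中文".encode("gb18030")

print(narrowest_gb_codec("中文"))        # gb2312
print(narrowest_gb_codec("丂"))          # gbk (GBK code 0x8140)
print(narrowest_gb_codec("\U00020000"))  # gb18030 (CJK Extension B)
```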

The latest effective standard is GB 18030-2005; all of the others have been superseded.

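It may be hard to tell whether a file is encoded in GB 18030 unless it contains characters that exist only in GB 18030. Because each encoding is a superset of the previous one, a file that uses only GB 2312 characters is also valid GBK and valid GB 18030.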
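Why the guess is hard can be shown by asking which codecs accept a given byte string. This is a sketch assuming Python's codecs match the standards; accepting_gb_codecs is a hypothetical helper. Bytes containing only GB 2312 characters are accepted by all three codecs, so the bytes alone cannot say which one the author meant. Since GB 18030 accepts everything the narrower ones do, decoding any member of the family as GB 18030 is the safer choice; the WHATWG Encoding Standard takes a similar approach by mapping the "gb2312" label to GBK, whose decoder is the gb18030 decoder.

```python
from typing import List

GB_CODECS = ("gb2312", "gbk", "gb18030")  # narrowest to widest


def accepting_gb_codecs(data: bytes) -> List[str]:
    """Return every codec in GB_CODECS that decodes `data` without error."""
    accepted = []
    for codec in GB_CODECS:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue  # some byte sequence is invalid in this codec
        accepted.append(codec)
    return accepted


plain = "中文".encode("gb2312")                # only GB 2312 characters
extended = "中文\U00020000".encode("gb18030")  # adds a four-byte sequence

print(accepting_gb_codecs(plain))     # ['gb2312', 'gbk', 'gb18030']
print(accepting_gb_codecs(extended))  # ['gb18030']
```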

For example, if I create a file in GB 18030 containing characters from CJK Unified Ideographs Extension B (which were included in GB 18030-2005), encoding auto-detection cannot decode it correctly.
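The Extension B case can be reproduced with Python's codecs. This is a minimal sketch, assuming Python's gb18030 codec reflects the relevant part of GB 18030; U+20000 is simply the first Extension B character, chosen for illustration. GB 18030 encodes it as a four-byte sequence, which GBK and GB 2312 can neither encode nor decode.

```python
EXT_B_CHAR = "\U00020000"  # U+20000, first character of CJK Extension B

encoded = EXT_B_CHAR.encode("gb18030")
print(len(encoded), encoded.hex())  # 4 95328236

# The narrower encodings have no code for this character at all...
for codec in ("gbk", "gb2312"):
    try:
        EXT_B_CHAR.encode(codec)
    except UnicodeEncodeError:
        print(f"{codec}: cannot encode U+20000")

# ...and reading the GB 18030 bytes with a narrower decoder fails, or,
# with errors="replace", yields U+FFFD replacement characters instead.
print(encoded.decode("gbk", errors="replace"))
```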

microsoft/vscode#33720
