Load single-character and phrase pinyin data from JSON files at runtime to mitigate the performance degradation seen on Python 3.12 (#324)
See: #319 #322


* perf[code-loader]prevent py-code too large #322

* feat[generator]pinyin and phrases

* feat[generator]one key initer

* refine save dict into json and load from json on runtime

* fix lint and test

* try to fix "No module named distutils in Python 3.12"

* try to fix "UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 118: character maps to <undefined>"

* fix test on python2

---------

Co-authored-by: mozillazg <[email protected]>
serfend and mozillazg authored Jul 20, 2024
1 parent 29cacd8 commit 96ffa79
Showing 15 changed files with 664,841 additions and 87,162 deletions.
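
The commit message above describes moving the dictionaries out of large Python modules and reading them from JSON at runtime. The loader itself is not shown on this page, so the following is only a minimal Python 3 sketch of the idea. The function names `load_pinyin_dict` and `load_phrases_dict` are hypothetical, and it assumes that the single-character table was keyed by integer code points before conversion. JSON object keys are always strings, so they have to be converted back on load. Opening the file with an explicit `encoding='utf-8'` avoids the locale-dependent default codec, which is the usual cause of the `'charmap' codec can't decode` error quoted in the commit message.

```python
import json
import os
import tempfile


def load_pinyin_dict(path):
    """Load a single-character table and restore integer keys.

    Assumption: the original dict was keyed by integer code points.
    JSON stores object keys as strings, so they are converted back here.
    """
    # An explicit encoding keeps the result independent of the system locale.
    with open(path, encoding='utf-8') as f:
        raw = json.load(f)
    return {int(code_point): readings for code_point, readings in raw.items()}


def load_phrases_dict(path):
    """Load a phrase table; string keys need no conversion."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Demo with a tiny hypothetical data file instead of the real pinyin_dict.json.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'pinyin_dict.json')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('{"20013": "zhōng,zhòng"}')

pinyin_dict = load_pinyin_dict(demo_path)
print(pinyin_dict[ord('中')])
```

In a package, the path would typically be built relative to the module, for example with `os.path.join(os.path.dirname(__file__), 'pinyin_dict.json')`, which is why the data files also need to be listed in `MANIFEST.in`.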
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -38,3 +38,9 @@ jobs:
 python -m pip install dist/*.gz
 python -m pypinyin test
 python -m pypinyin.tools.toneconvert to-tone 'zhong4 xin1'
+- name: test import time
+if: startsWith(matrix.os,'ubuntu')
+run: |
+set -xe
+time python -c 'from pypinyin import pinyin'
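
The new CI step times a cold import with the shell's `time`. For local checks on platforms without that builtin, roughly the same measurement can be sketched in Python. The helper name `measure_import_seconds` is hypothetical, and `import json` is used only as a stand-in so the snippet runs without pypinyin installed. Substitute `from pypinyin import pinyin` to reproduce the CI check.

```python
import subprocess
import sys
import time


def measure_import_seconds(statement):
    """Run `statement` in a fresh interpreter and return wall-clock seconds.

    A new process is used so that modules already cached in this
    interpreter do not hide the real import cost.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, '-c', statement], check=True)
    return time.perf_counter() - start


# Interpreter startup alone, then startup plus the import under test.
baseline = measure_import_seconds('pass')
elapsed = measure_import_seconds('import json')
print('startup: %.3fs, startup + import: %.3fs' % (baseline, elapsed))
```

The difference between the two numbers approximates the import cost. For a per-module breakdown, `python -X importtime -c '...'` is an alternative on Python 3.7 and later.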
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -4,18 +4,18 @@ repos:
 hooks:
 - id: check-merge-conflict
 - id: debug-statements
-exclude: 'tools/|(pypinyin/(phrases_dict.py|pinyin_dict.py|phonetic_symbol.py))'
+exclude: 'tools/|(pypinyin/(legacy/|phonetic_symbol.py))'
 - id: double-quote-string-fixer
-exclude: 'pypinyin/(phrases_dict.py|pinyin_dict.py|phonetic_symbol.py)'
+exclude: 'pypinyin/(legacy/|phonetic_symbol.py)'
 - id: end-of-file-fixer
-exclude: '.bumpversion.cfg'
+exclude: '.bumpversion.cfg|.*.json'
 - id: requirements-txt-fixer
 - id: trailing-whitespace
 - repo: https://github.com/pycqa/flake8
 rev: 3.8.4
 hooks:
 - id: flake8
-exclude: 'tools|pypinyin/(phrases_dict.py|pinyin_dict.py|phonetic_symbol.py)|(docs/conf.py)'
+exclude: 'tools|pypinyin/(legacy/|phonetic_symbol.py)|(docs/conf.py)'
 # - repo: https://github.com/pre-commit/mirrors-mypy
 # rev: 'v0.812'
 # hooks:
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,2 +1,2 @@
 include README.rst LICENSE.txt CHANGELOG.rst
-recursive-include pypinyin *.pyi py.typed
+recursive-include pypinyin *.pyi py.typed *.json
12 changes: 9 additions & 3 deletions Makefile
@@ -53,12 +53,18 @@ gen_data: sync_submodule gen_pinyin_dict gen_phrases_dict

 .PHONY: gen_pinyin_dict
 gen_pinyin_dict: sync_submodule
-python gen_pinyin_dict.py pinyin-data/pinyin.txt pypinyin/pinyin_dict.py
+python gen_pinyin_dict.py pinyin-data/pinyin.txt pypinyin/legacy/pinyin_dict.py
+$(MAKE) to_json source=pypinyin/legacy/pinyin_dict.py var=pinyin_dict dst=pypinyin/pinyin_dict.json

 .PHONY: gen_phrases_dict
 gen_phrases_dict: sync_submodule
-python gen_phrases_dict.py phrase-pinyin-data/pinyin.txt pypinyin/phrases_dict_large.py
-python tidy_phrases_dict.py
+python gen_phrases_dict.py phrase-pinyin-data/pinyin.txt pypinyin/legacy/phrases_dict.py
+$(MAKE) to_json source=pypinyin/legacy/phrases_dict.py var=phrases_dict dst=pypinyin/phrases_dict.json
+
+
+.PHONY: to_json
+to_json:
+python -c 'import json; exec(open("$(source)").read()); json.dump($(var), open("$(dst)", "w"), ensure_ascii=False, sort_keys=True, indent="")'

 .PHONY: sync_submodule
 sync_submodule:
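
The `to_json` target packs the whole conversion into one `python -c` line. Written out as a small script, the same steps might look like the sketch below. It is an illustration, not code from the repository: the function name and the demo file are hypothetical. It also differs from the Makefile one-liner in one way: both files are opened with an explicit `encoding='utf-8'`, whereas the one-liner relies on the platform default.

```python
import json
import os
import tempfile


def to_json(source, var, dst):
    """Execute a generated .py data module and dump one variable as JSON."""
    namespace = {}
    with open(source, encoding='utf-8') as f:
        # Run the module body in an isolated namespace rather than globals.
        exec(f.read(), namespace)
    with open(dst, 'w', encoding='utf-8') as f:
        # ensure_ascii=False keeps hanzi and tone marks readable in the file;
        # sort_keys=True gives a stable order for diffs;
        # indent='' puts each entry on its own line without leading spaces.
        json.dump(namespace[var], f,
                  ensure_ascii=False, sort_keys=True, indent='')


# Demo with a tiny hypothetical source module.
demo_dir = tempfile.mkdtemp()
source_path = os.path.join(demo_dir, 'pinyin_dict.py')
dst_path = os.path.join(demo_dir, 'pinyin_dict.json')
with open(source_path, 'w', encoding='utf-8') as f:
    f.write("pinyin_dict = {0x4E2D: 'zhōng,zhòng'}\n")

to_json(source_path, 'pinyin_dict', dst_path)
with open(dst_path, encoding='utf-8') as f:
    converted = json.load(f)
print(converted)
```

Note that the integer key `0x4E2D` comes back as the string `"20013"`, since JSON object keys are always strings. Any runtime loader has to account for that.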
Empty file added pypinyin/legacy/__init__.py
Empty file.
47,104 changes: 47,104 additions & 0 deletions pypinyin/legacy/phrases_dict.py

Large diffs are not rendered by default.

8,688 changes: 8,688 additions & 0 deletions pypinyin/legacy/phrases_dict_tidy.py

Large diffs are not rendered by default.
