
Releases: polm/fugashi

Version 0.1.10: Python 3.5+ and other features

23 Mar 15:36

This release includes a number of small fixes since 0.1.9 and two more significant changes.

Unidic 26 Field Format Support

Unidic has a surprising variety of formats, and the 26-field variety wasn't previously supported. This format includes kana accent information and is notably used in the binary distribution of Unidic 2.1.2.

Support for Python 3.5 and 3.6

Support for these versions was initially removed due to their short remaining lifespan and the lack of a defaults option in the namedtuple constructor (that option was only added in Python 3.7). @tamuhey made the necessary changes to get them working, so they're supported for now; thanks!
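
For context, here is a minimal sketch of one common way to get default values on a namedtuple in Python 3.5 and 3.6. The helper name namedtuple_with_none_defaults and the field names are hypothetical, and this is not necessarily the change that was made in fugashi.

```python
from collections import namedtuple


def namedtuple_with_none_defaults(name, fields):
    """Build a namedtuple whose fields all default to None (hypothetical helper)."""
    cls = namedtuple(name, fields)
    # Python 3.7+ accepts namedtuple(..., defaults=...). On 3.5/3.6 a similar
    # effect can be had by setting defaults on the generated __new__.
    cls.__new__.__defaults__ = (None,) * len(cls._fields)
    return cls


# Illustrative field names only; real dictionaries define their own fields.
Features = namedtuple_with_none_defaults('Features', ['pos', 'lemma', 'accent'])
partial = Features('noun')  # fields that are not supplied fall back to None
print(partial)
```

With this approach a feature row that has fewer fields than expected can still be wrapped without raising a TypeError.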

Other Changes

  • dummy mecabrc specification for bundled Unidic support (still a work in progress)
  • test fixes and documentation
  • deal with comma-separated values inside fields
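
To illustrate the last item: feature strings are comma-separated, so a field that itself contains commas has to be quoted, and a plain split on commas breaks it apart. The sketch below uses Python's csv module on a made-up feature string; it assumes double-quoted fields and is only an illustration, not necessarily how fugashi parses features.

```python
import csv


def split_features(raw):
    """Split a CSV-style feature string, keeping quoted fields intact."""
    # csv.reader takes an iterable of lines; there is one line, so take the first row.
    return next(csv.reader([raw]))


# Made-up example: the third field contains a comma, so it is quoted.
raw = 'noun,common,"a,b",lemma'

naive = raw.split(',')        # breaks the quoted field into two pieces
fields = split_features(raw)  # keeps "a,b" together as a single field

print(len(naive), len(fields))
```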

Upcoming Changes

I'm working on creating a bundled version of Unidic. Modern versions of Unidic are too large to distribute via PyPI, so I'm figuring out the best way to distribute the data.

Generic Dictionary Support in v0.1.8

27 Dec 06:39

v0.1.8 of fugashi adds support for generic dictionaries. You can now use IPADic or other dictionaries by using a GenericTagger the same way you would use the normal Tagger:

import fugashi
tagger = fugashi.GenericTagger('-d/usr/local/lib/mecab/dic/ipadic')

It's also possible to specify dictionary fields so you can get convenient access to features no matter what dictionary you use.

import fugashi
# the wrapper is just a namedtuple with a default value of None for all fields
MyDictFeatures = fugashi.create_dict_wrapper('MyDictFeatures', 'lemma alpha beta'.split())
tagger = fugashi.GenericTagger('-d/usr/local/lib/mecab/dic/customdic', MyDictFeatures)
nodes = tagger.parseToNodes('blah blah')
node = nodes[0]
print(node.lemma, node.alpha, node.beta)

Some other changes:

  • the raw feature string is now available as .feature_raw on nodes
  • packaging-related fixes
  • initial mecab-ko-dic (Korean) support; needs more testing

Fugashi v0.1.5

28 Nov 07:46

This update fixes two issues.

  • When Tagger() gets invalid arguments, throw an error
  • Specify Cython dependency correctly (#1)

Thanks to @zdyh for the dependency fix!