- FEATURE: Added pluggable DOM preprocessor.
- FEATURE: Added support for Python 3.2+.
- INCOMPATIBLE CHANGE: Paragraphs are instances of justext.paragraph.Paragraph.
- INCOMPATIBLE CHANGE: Script 'justext' removed in favour of the command 'python -m justext'.
- FEATURE: It's possible to enter a URI as the input document in the CLI.
- FEATURE: It is possible to pass a unicode string directly.
- FEATURE: Character counts used instead of word counts where possible in order to make the algorithm work well in the language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.).
- BUG FIX: More robust parsing of meta tags containing information about the charset used.
- BUG FIX: Corrected decoding of numeric HTML entities &#128; to &#159; (€ to Ÿ in Windows-1252).
- First public release.
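
The HTML entity bug fix listed above concerns numeric references in the range 128 to 159, which browsers render as the Windows-1252 characters € through Ÿ rather than as control codes. The following is a minimal sketch of that behaviour using only the Python standard library. It is not jusText's own code, and the helper name decode_entities is hypothetical.

```python
import html


def decode_entities(text):
    """Decode HTML entities in text.

    Numeric references 128-159 are control codes in Unicode, but pages
    written with Windows-1252 in mind use them for printable characters.
    html.unescape (Python 3.4+) applies the HTML5 remapping for that range.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Decimal and hexadecimal forms of the same reference decode alike.
    print(decode_entities("Price: 5 &#128;"))
    print(decode_entities("Price: 5 &#x80;"))
    print(decode_entities("&#159;"))
```

A naive chr(128) would instead yield an invisible control character, which is the kind of wrong result such a fix is meant to avoid.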