- Removed deprecated string taint checking (@bbonamin).
  AnyStyle::Parser#parse will no longer automatically open local files.
  Please call Wapiti::Dataset.open explicitly if you relied on this.
- Updated parser model.
- Improved support for German-language journal conventions (@a-fent).
- Updated parser model.
- Updated and improved normalizers and CSL format.
- Improved Chinese reference tokenization.
- Added option to customize pdftotext path.
- Improved Finder reference line joining.
- Improved Finder model; training sets.
- Improved Parser model; training sets.
- Added check and train commands to CLI.
- Added --no-solo and --crop flags to find command.
- Added reference block normalizer.
- Added script detection normalizer.
- Improved Finder reference line joining.
- Improved Finder model; training sets.
- Improved Parser model; training sets.
- Improved Finder model; training sets.
- Volume normalizer: extract page numbers and dates.
- Fixed errors in Names and Publisher normalizer.
- Added Unicode normalizer to default normalizers.
- Initial 1.0 release!
  This release isn't backwards compatible with the 0.x branch.
  The new release uses the AnyStyle module via the anystyle Gem.
  The old 0.x branch used the Anystyle module via the anystyle-parser Gem
  and is no longer maintained.
- Includes improved parser model and training sets.
- Based on updated wapiti-ruby which builds on Linux, macOS, and Windows platforms (thanks @a-fent and @WouterJeuris).
- Flexible normalizer architecture (you can skip individual normalizers).
- Improved feature architecture.
- Improved input/output via Wapiti::Dataset.
- New default dictionary adapter (thanks @a-fent).
- New GDBM dictionary adapter.
- Use real XML for training sets.
- Experimental Finder component for PDF and text document analysis.
- Dictionary data moved to anystyle-data Gem.
- New CLI tool anystyle-cli.
- Dropped support for Ruby 2.2 and older.