This repository has been archived by the owner on Mar 14, 2024. It is now read-only.

Excite

Provides a simple Ruby API for parsing citations from plain text strings or HTML.

Usage

  require 'excite'

  Excite.parse_string("Wilcox, Rhonda V. 1991. Shifting roles and synthetic women in Star trek: The next generation. Studies in Popular Culture 13 (June): 53-65.")

  Excite.parse_html("<span>Devine, PG, & Sherman, SJ</span><span>(1992)</span><strong>Intuitive versus rational judgment and the role of stereotyping in the human condition: Kirk or Spock?</strong><em>Psychological Inquiry</em><span>3(2), 153-159</span>")
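The calls above take one citation at a time, and this README does not describe what they return. As a minimal sketch for batch use, the helper below treats the result as opaque and records any error per line instead of aborting. `parse_citations` is a hypothetical name, not part of Excite, and the parser is passed in as a callable so the helper itself makes no assumptions about the gem.

```ruby
# Hypothetical helper, not part of Excite: applies a parser to each non-blank
# line and collects the result or the error message for that line.
def parse_citations(lines, parser)
  lines.map(&:strip).reject(&:empty?).map do |line|
    begin
      { input: line, result: parser.call(line), error: nil }
    rescue StandardError => e
      { input: line, result: nil, error: e.message }
    end
  end
end

# With the gem installed, Excite.parse_string (shown above) can be the parser.
# "refs.txt" is an assumed file name with one citation per line:
#   require 'excite'
#   parse_citations(File.readlines("refs.txt"), Excite.method(:parse_string))
```

Passing the parser as an argument also makes it easy to swap in `Excite.method(:parse_html)` for HTML fragments.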

History and Credits

Derived from FreeCite, minus Rails and all UI elements. The most up-to-date fork of FreeCite of which I am aware is rsinger's. FreeCite in turn is inspired by ParsCit.

The main changes are:

  • No UI, just a gem;
  • New model for parsing HTML;
  • Tokenization and part-of-speech features from EngTagger.

Credit is due to the authors of all the linked projects, as well as to Laura Durkay, who marked up the HTML training data.

Install required packages

From source

  wget http://crfpp.googlecode.com/files/CRF%2B%2B-0.57.tar.gz
  tar xvzf CRF++-0.57.tar.gz
  cd CRF++-0.57
  ./configure
  make
  sudo make install

On Ubuntu

  sudo apt-add-repository 'deb http://cl.naist.jp/~eric-n/ubuntu-nlp oneiric all'
  sudo apt-get update
  sudo apt-get install libcrf++

On OS X with Homebrew

  brew install crf++
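After any of the installs above, it can help to confirm that the CRF++ command-line tools are reachable. The sketch below looks for `crf_learn` and `crf_test`, the two programs CRF++ installs, on the current `PATH`. `find_executable` is a hypothetical helper written for this example. It only checks the command-line tools; whether the gem needs anything beyond them is not stated here.

```ruby
# Hypothetical helper: returns the full path of an executable found in the
# given PATH-style string, or nil if no matching executable file exists.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where each CRF++ tool was found, if at all.
%w[crf_learn crf_test].each do |tool|
  location = find_executable(tool)
  puts(location ? "#{tool}: #{location}" : "#{tool}: not found on PATH")
end
```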