simplified-ECON is a simplified version of the ECON pipeline introduced in the paper "Concept Mining via Embedding" (Li et al., 2018). The main simplification is in the candidate generation stage. simplified-ECON uses only AutoPhrase, whereas ECON combines multiple candidate generation techniques. This repository also compares AutoPhrase, ECON, and PRDR Phrase Detection results on the same input corpus of 10,000 arXiv computer science paper abstracts.
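Candidate generation relies on AutoPhrase, which ranks the phrases it mines from the corpus by a quality score. The following is a minimal sketch of selecting candidates from that ranking. It assumes AutoPhrase's ranked output has one "score<TAB>phrase" entry per line, which is the format of its AutoPhrase.txt output. The function name load_candidates and the 0.5 threshold are hypothetical and are not taken from this repository's notebooks.

```python
from typing import Iterable, List, Tuple


def load_candidates(lines: Iterable[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Parse AutoPhrase-style 'score<TAB>phrase' lines and keep phrases with score >= threshold.

    Hypothetical helper: the default threshold is an assumption, not ECON's setting.
    """
    candidates = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        score_str, phrase = line.split("\t", 1)
        score = float(score_str)
        if score >= threshold:
            candidates.append((phrase, score))
    # Highest-quality phrases first
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates


if __name__ == "__main__":
    sample = [
        "0.98\tsupport vector machine\n",
        "0.91\tneural network\n",
        "0.32\tin this paper\n",
    ]
    for phrase, score in load_candidates(sample):
        print(f"{score:.2f}  {phrase}")
```

Because the function accepts any iterable of lines, an open file handle works directly, for example load_candidates(open(path, encoding="utf-8")), where path points at the AutoPhrase output file.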
The pipeline consists of the following stages:
- Candidate Generation
- Superspan Sequence Generation
- Embedding Construction
- Feature Generation
- Classifier Training
- Concept Recognition
The code for each stage builds on, or is copied from, the original ECON implementation.
The autophrase_comparison.ipynb notebook is a new contribution of this repository. It compares the concepts extracted by the AutoPhrase and ECON pipelines.
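This README does not say which metrics autophrase_comparison.ipynb reports. The following is only an illustrative sketch of comparing two pipelines' concept lists by their overlap. The function names normalize and compare_concepts, and the choice of Jaccard similarity, are assumptions.

```python
from typing import Dict, Iterable, Set


def normalize(concepts: Iterable[str]) -> Set[str]:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return {" ".join(c.lower().split()) for c in concepts if c.strip()}


def compare_concepts(a: Iterable[str], b: Iterable[str]) -> Dict[str, object]:
    """Summarize the overlap between two concept lists (hypothetical helper)."""
    set_a, set_b = normalize(a), normalize(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        # Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both lists are empty
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    autophrase = ["neural network", "support vector machine", "in this paper"]
    econ = ["Neural Network", "support vector  machine", "concept mining"]
    result = compare_concepts(autophrase, econ)
    print(result["shared"])
    print(result["jaccard"])
```

Normalizing before comparing prevents case and spacing differences from being reported as disagreements between the two pipelines.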
Additionally, the method_evaluation.ipynb notebook, also a new contribution, evaluates the performance of the AutoPhrase, PRDR Phrase Detection, and ECON pipelines.
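A common way to score concept extraction is to compute precision, recall, and F1 against a hand-labeled set of gold concepts. This README does not say whether method_evaluation.ipynb uses exactly these metrics. The following is a minimal sketch under that assumption. The function name evaluate and the example gold set are hypothetical.

```python
from typing import Dict, Iterable


def evaluate(predicted: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Set-based precision, recall, and F1 of predicted concepts against gold concepts."""
    pred_set = {p.lower().strip() for p in predicted if p.strip()}
    gold_set = {g.lower().strip() for g in gold if g.strip()}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = ["neural network", "support vector machine", "concept mining"]
    predicted = ["neural network", "support vector machine", "in this paper", "deep learning"]
    print(evaluate(predicted, gold))
```

Running the same gold set through each pipeline's output gives directly comparable scores for AutoPhrase, PRDR Phrase Detection, and ECON.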
Navigate to the simplified-ECON directory and set up a new conda environment using the following commands.
conda create -n se python=3.8.5 -y
conda activate se
conda install ipykernel -y
ipython kernel install --user --name=se
Clone the AutoPhrase repository. In the candidate_generation.ipynb and feature_generation.ipynb notebooks, set AUTOPHRASE_PATH to the path of the cloned AutoPhrase repository.
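AUTOPHRASE_PATH is a plain path setting, so a mistyped path may only surface later as a confusing error. The sanity check below is a hypothetical addition and is not part of the original notebooks. It assumes the cloned AutoPhrase repository has its auto_phrase.sh script at the top level, as the upstream repository does. Adjust the marker file if your checkout differs. The home-directory default is only a placeholder.

```python
import os

# Placeholder: set this to wherever you cloned AutoPhrase.
AUTOPHRASE_PATH = os.path.expanduser("~/AutoPhrase")


def check_autophrase_path(path: str, marker: str = "auto_phrase.sh") -> str:
    """Return the absolute path if it looks like an AutoPhrase clone, else raise.

    The marker file name is an assumption based on the upstream repository layout.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isdir(abs_path):
        raise FileNotFoundError(f"AUTOPHRASE_PATH does not exist: {abs_path}")
    if not os.path.isfile(os.path.join(abs_path, marker)):
        raise FileNotFoundError(
            f"{marker} not found in {abs_path}; is this the AutoPhrase clone?"
        )
    return abs_path
```

Calling check_autophrase_path(AUTOPHRASE_PATH) in the first cell of each notebook makes the run fail immediately if the path is wrong.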
Install the dependencies using the following command.
pip install -r requirements.txt
To run the pipeline, launch Jupyter Lab (jupyter lab), make sure the se kernel is selected, and run the cells of each notebook in the order of the pipeline stages listed above.
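The documented workflow is to run the cells by hand in Jupyter Lab. As an alternative, the notebooks could be executed headlessly in pipeline order with jupyter nbconvert using the se kernel. This README names only candidate_generation.ipynb and feature_generation.ipynb. The other file names below are hypothetical placeholders, so replace them with the repository's actual notebook names before use.

```python
import subprocess
from typing import List, Sequence

# Pipeline order from the stage list above. Only candidate_generation.ipynb and
# feature_generation.ipynb are confirmed names; the rest are hypothetical placeholders.
PIPELINE_NOTEBOOKS = (
    "candidate_generation.ipynb",
    "superspan_sequence_generation.ipynb",  # hypothetical name
    "embedding_construction.ipynb",  # hypothetical name
    "feature_generation.ipynb",
    "classifier_training.ipynb",  # hypothetical name
    "concept_recognition.ipynb",  # hypothetical name
)


def nbconvert_command(notebook: str, kernel: str = "se") -> List[str]:
    """Build a jupyter nbconvert command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.kernel_name={kernel}",
        notebook,
    ]


def run_pipeline(notebooks: Sequence[str] = PIPELINE_NOTEBOOKS, dry_run: bool = True) -> None:
    """Print (and, unless dry_run, execute) each notebook command in order."""
    for notebook in notebooks:
        cmd = nbconvert_command(notebook)
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing notebook
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)  # set dry_run=False to actually execute
```

The dry run only prints the commands, so you can confirm the order and file names before executing anything. Stopping at the first failure keeps later stages from running on incomplete outputs.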
To compare the results of the AutoPhrase and ECON pipelines, run the cells of the autophrase_comparison.ipynb notebook.
To evaluate the performance of the AutoPhrase, PRDR Phrase Detection, and ECON pipelines, run the cells of the method_evaluation.ipynb notebook.
- Rishi Masand
Keqian Li, Hanwen Zha, Yu Su, and Xifeng Yan, "Concept Mining via Embedding", 2018.
Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han, "Automated Phrase Mining from Massive Text Corpora", 2017.