
Author name disambiguation in PubMed using Random Forest and Gradient Boosted Trees


dksanyal/Author-Name-Disambiguation-in-PubMed


About: Author Name Disambiguation using Random Forest and Gradient Boosted Tree Classifier

Author: The scripts were written by Mr. Kaushal Jhawar. Reference paper: Kaushal Jhawar, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, and Partha Pratim Das. (2020, August). Author name disambiguation in PubMed using ensemble-based classification algorithms. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), Xi’an, Shaanxi, P. R. China, 1-4 August, 2020. (Please cite the above paper if you use our program in your research or applications. We would also love to hear your feedback.)

Programming Language: Python 3.7.0

Operating System: Windows 10 (64-bit, x64-based processor) and Ubuntu 18.04.1 LTS

External Dependencies: Biobert_v1.1_pubmed: https://github.com/naver/biobert-pretrained. Version: BioBERT-Base v1.1 (+ PubMed 1M) - based on BERT-base-Cased (same vocabulary)

Python Libraries Required: pyjarowinkler, xlsxwriter, xlrd, bs4, nltk, openpyxl, numpy, flair (for embeddings and data), gensim (for Word2Vec), pandas, sklearn, matplotlib. The scripts also use collections and json, which are part of the Python standard library and need no installation.
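Before running the scripts, it can help to confirm that the libraries listed above are importable in the active Python environment. The snippet below is a minimal sketch and is not part of the repository: the function name `find_missing` and the list `REQUIRED_MODULES` are hypothetical, and the module names are assumed to match the import names given in this README.

```python
import importlib.util

# Import names copied from the README's library list (an assumption: the
# scripts may import submodules not shown here). "collections" and "json"
# ship with Python; the rest are third-party packages.
REQUIRED_MODULES = [
    "pyjarowinkler", "xlsxwriter", "xlrd", "collections", "bs4", "nltk",
    "openpyxl", "numpy", "flair", "gensim", "pandas", "sklearn",
    "matplotlib", "json",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            # find_spec locates a top-level module without importing it.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Note that some import names differ from the package names used by pip: `sklearn` is installed as `scikit-learn` and `bs4` as `beautifulsoup4`.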

Instructions to Run the Code: Follow the steps in “Step_to_run_code.txt”.

Input to the model: The dataset, placed in the current working directory. The scripts take no command-line input; the input is already coded in the script.

Output of the model: Printed during execution of the Python program.
