GitHub - dksanyal/Author-Name-Disambiguation-in-PubMed: Author name disambiguation in PubMed using Random Forest and Gradient Boosted Trees

About: Author Name Disambiguation using Random Forest and Gradient Boosted Tree Classifier

Author: The scripts were written by Mr. Kaushal Jhawar. Reference paper: Kaushal Jhawar, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, and Partha Pratim Das. (2020, August). Author name disambiguation in PubMed using ensemble-based classification algorithms. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), Xi’an, Shaanxi, P. R. China, 1-4 August, 2020. (Please cite the above paper if you use our program in your research or applications. We would also love to hear your feedback.)

Programming Language: Python 3.7.0

Operating System: 64-bit Windows 10 (x64-based processor) and Ubuntu 18.04.1 LTS

External Dependencies: Biobert_v1.1_pubmed: https://github.com/naver/biobert-pretrained. Version: BioBERT-Base v1.1 (+ PubMed 1M) - based on BERT-base-Cased (same vocabulary)

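Python Libraries Required: pyjarowinkler, xlsxwriter, xlrd, bs4, nltk, openpyxl, numpy, flair (for embeddings and data), gensim (for Word2Vec), pandas, sklearn, matplotlib. The scripts also use collections and json, which are part of the Python standard library and need no installation. Note that sklearn and bs4 are import names; the corresponding pip packages are scikit-learn and beautifulsoup4.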
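Before running the scripts, it can help to confirm that every library listed above is importable in the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list is copied from the list above on the assumption that each entry is the import name.

```python
import importlib.util

# Import names copied from the "Python Libraries Required" list above
# (assumed to be import names, not pip package names).
REQUIRED_MODULES = [
    "pyjarowinkler", "xlsxwriter", "xlrd", "collections", "bs4", "nltk",
    "openpyxl", "numpy", "flair", "gensim", "pandas", "sklearn",
    "matplotlib", "json",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

The check only locates each module without importing it, so it runs quickly even for heavy packages such as flair.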

Instructions to Run the Code: Follow “Step_to_run_code.txt”.

Input to the model: The dataset must be in the current working directory. The scripts take no command-line input; the input is already coded in the scripts.
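Because the input is read from the current working directory rather than passed on the command line, a quick pre-flight check can catch a wrong working directory early. The sketch below is illustrative only: the helper `missing_inputs` is hypothetical, and the file names in the usage example are taken from the repository file listing on the assumption that the scripts read them; consult “Step_to_run_code.txt” for the actual inputs.

```python
from pathlib import Path


def missing_inputs(filenames, directory=None):
    """Return the names in `filenames` that do not exist in `directory`.

    `directory` defaults to the current working directory, which is where
    the scripts are described as expecting their dataset.
    """
    base = Path(directory) if directory is not None else Path.cwd()
    return [name for name in filenames if not (base / name).exists()]


if __name__ == "__main__":
    # Assumed input files (names from the repository listing; which files
    # the scripts actually read is not stated in this README).
    expected = ["config.json", "Author_Metadata.zip", "2018MeShTreeHierarchy.xml"]
    absent = missing_inputs(expected)
    if absent:
        print("Not found in", Path.cwd(), ":", ", ".join(absent))
    else:
        print("All expected input files are present.")
```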

Output by the model: Printed during execution of the Python program.

Folders and files:
Instructions
Metadata_frequency
2018MeShTreeHierarchy.xml
Author_Metadata.zip
GBT_Validation_And_Testing.py
Pre_Recall_RF_And_GBT.py
README.md
RF_Validation_And_Testing.py
combine_author.py
config.json
similarity_profile_generation.py
