BSBE-IITK/CAFA5-PFP

CAFA 5 Protein Function Prediction

The goal of the CAFA competition is to predict the function of a set of proteins. We developed a model trained on the amino-acid sequences of the proteins to predict protein functions by performing multi-label classification with Gene Ontology (GO) terms as labels. This work can help researchers better understand protein function, which is important for discovering how cells, tissues, and organs work. It may also aid in the development of new drugs and therapies for various diseases.
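As a rough illustration of what "multi-label classification with GO terms as labels" means in practice, the sketch below encodes each protein's annotations as a multi-hot vector over a fixed list of GO terms. The helper `build_label_matrix`, the protein IDs `P1` and `P2`, and the three-term label set are hypothetical and chosen only for the example; they are not taken from this repository.

```python
from typing import Dict, List


def build_label_matrix(
    annotations: Dict[str, List[str]], terms: List[str]
) -> Dict[str, List[int]]:
    """Encode each protein's GO terms as a multi-hot vector over `terms`."""
    index = {term: i for i, term in enumerate(terms)}
    encoded = {}
    for protein_id, go_terms in annotations.items():
        row = [0] * len(terms)
        for term in go_terms:
            if term in index:  # ignore terms outside the chosen label set
                row[index[term]] = 1
        encoded[protein_id] = row
    return encoded


# Hypothetical toy annotations (protein IDs are made up for illustration).
annotations = {
    "P1": ["GO:0005515", "GO:0003674"],
    "P2": ["GO:0008150"],
}
# Assumed label set; a real label set would contain many more GO terms.
terms = ["GO:0003674", "GO:0005515", "GO:0008150"]

labels = build_label_matrix(annotations, terms)
for protein_id, row in labels.items():
    print(protein_id, row)
```

Because a protein can carry several GO terms at once, each position in the vector is treated as an independent yes/no target rather than as one class among mutually exclusive classes.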

Read more about this competition here: Kaggle link

Dataset

We used the following dataset: link

Models used

  • BLAST
  • K-Nearest Neighbours
  • Random Forest
  • XGBoost
  • Dimensionality reduction techniques (PCA, autoencoder)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN, LSTM)
  • Multilayer Perceptrons (with embeddings from pretrained transformers, such as T5, ProtBERT, ESM2) [best performing]
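The last item in the list above, a multilayer perceptron on top of precomputed protein embeddings, can be sketched as a forward pass that maps one embedding vector to an independent probability per GO term. This is a minimal pure-Python sketch under stated assumptions, not the architecture used in this repository: the layer sizes, the random untrained weights, and the helper names (`init_layer`, `predict`) are all hypothetical, and real embeddings from pretrained transformers are much larger than the toy dimension used here.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Randomly initialised (untrained) layer, for illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(embedding: List[float], layers: List[Layer]) -> List[float]:
    """ReLU hidden layers followed by a per-term sigmoid output layer."""
    h = embedding
    for weights, bias in layers[:-1]:
        h = relu(linear(h, weights, bias))
    weights, bias = layers[-1]
    return sigmoid(linear(h, weights, bias))


rng = random.Random(0)
embedding_dim, hidden_dim, num_terms = 8, 4, 3  # toy sizes (assumed)
layers = [
    init_layer(embedding_dim, hidden_dim, rng),
    init_layer(hidden_dim, num_terms, rng),
]
# Stand-in for a protein embedding produced by a pretrained model.
embedding = [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]

scores = predict(embedding, layers)
print(scores)  # one score in (0, 1) per GO term
```

The sigmoid output (rather than a softmax) reflects the multi-label setting: each GO term gets its own score, so several terms can be predicted for the same protein.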

Team
