Identification of Biomarkers for Cancer Diagnosis with Machine Learning

Adeyeha/Cancer-Biomarkers-ML

Identifying Biomarkers for Cancer Diagnosis with Machine Learning

This research explores the use of machine learning (ML) models to identify the most important biomarkers for diagnosing cancer. The data used in this research is an extremely high-dimensional dataset of measurements for various cancer biomarkers.

Data

The dataset used in this research contains a large number of samples and features, including gene expression levels and protein concentrations for various cancer biomarkers. The data was obtained from a publicly available database.

Methodology

This study employed several classes of ML models, including linear models, non-linear models, and ensembles, to identify the most important biomarkers for cancer diagnosis. The performance of these models was compared using standard evaluation metrics such as accuracy, precision, recall, and F1 score (macro- and micro-averaged).
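As a minimal sketch of the metrics listed above, the snippet below computes them with scikit-learn on a handful of made-up labels (the toy `y_true` and `y_pred` values are illustrative only, not results from this study). Macro averaging scores each class separately and takes the unweighted mean, so rare classes count as much as common ones; micro averaging pools true and false positives across all classes first.

```python
# Sketch: the evaluation metrics named above, computed with scikit-learn.
# The labels are made-up toy values, not results from this study.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    # Macro: score each class on its own, then take the unweighted mean.
    "precision_macro": precision_score(y_true, y_pred, average="macro"),
    "recall_macro": recall_score(y_true, y_pred, average="macro"),
    "f1_macro": f1_score(y_true, y_pred, average="macro"),
    # Micro: pool true/false positives over all classes before scoring.
    "f1_micro": f1_score(y_true, y_pred, average="micro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For single-label multiclass data, micro-averaged F1 equals accuracy, so the macro average is usually the more informative of the two when class sizes are unbalanced.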

Both filter and wrapper feature selection techniques were applied, including a novel feature selection approach. The purpose of feature selection was to identify the features most relevant to model accuracy. The results of the different methods are discussed in the paper.

Installation

To run the project, follow these steps:

  1. Clone the repository: git clone https://github.com/Adeyeha/Cancer-Biomarkers-ML.git
  2. Install Python 3.x
  3. Install the required dependencies:
    • pandas: pip install pandas
    • scikit-learn: pip install scikit-learn
    • lazypredict: pip install lazypredict
    • seaborn: pip install seaborn

Usage

This repository contains a series of Jupyter notebooks demonstrating the process of identifying critical biomarkers for cancer diagnosis using machine learning techniques. The notebooks cover various stages, including data preprocessing, feature selection, model training, and evaluation.

Experiment 1: Baseline Models

  • Description: This notebook explores the dataset, performs data cleaning, and visualizes key insights using Matplotlib and Seaborn. It then establishes baseline models on the processed dataset, which serve as benchmarks for subsequent experiments.

  • Link: Notebook 1 - Baseline Models.
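The sketch below shows one way to set up such baselines with scikit-learn cross-validation. It is an illustration under stated assumptions, not the notebook's code: synthetic data from `make_classification` stands in for the real biomarker matrix, and the three models and the macro-F1 scoring are hypothetical choices. The dependency list also includes lazypredict, whose `LazyClassifier` fits many scikit-learn models in one call; plain scikit-learn is used here to keep the evaluation loop explicit.

```python
# Sketch: simple baselines evaluated with stratified cross-validation.
# Assumption: synthetic data stands in for the real biomarker matrix.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Few samples, many features: mimics the shape of high-dimensional expression data.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, n_classes=3, random_state=0
)

baselines = {
    "majority class": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
baseline_scores = {}
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    baseline_scores[name] = scores.mean()
    print(f"{name}: macro F1 = {scores.mean():.3f} ± {scores.std():.3f}")
```

The majority-class model is included as a floor: any feature selection method worth keeping should beat it comfortably.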

Experiment 2: Filter Methods

  • Description: This notebook implements feature selection with filter methods and compares the results against the established baseline.

  • Link: Notebook 2 - Filter Methods.
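A minimal sketch of a filter method, assuming a univariate ANOVA F-test (`SelectKBest` with `f_classif`), synthetic data, and a hypothetical cutoff of `k=20`; the notebook may use different filters or thresholds. The filter is placed inside the pipeline so that feature scores are computed on each training fold only. Scoring features on the full dataset before cross-validation would leak test-fold information and inflate the comparison with the baseline.

```python
# Sketch: filter-based selection compared with an all-features model.
# Assumptions: synthetic data, ANOVA F-test filter, hypothetical k=20.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, n_classes=3, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

all_features = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
# The filter is a pipeline step, so it is refit on every training fold.
filtered = make_pipeline(
    StandardScaler(), SelectKBest(f_classif, k=20), LogisticRegression(max_iter=2000)
)

baseline_f1 = cross_val_score(all_features, X, y, cv=cv, scoring="f1_macro").mean()
filtered_f1 = cross_val_score(filtered, X, y, cv=cv, scoring="f1_macro").mean()
print(f"all 500 features: macro F1 = {baseline_f1:.3f}")
print(f"top 20 by F-test: macro F1 = {filtered_f1:.3f}")

# Fit the filter once on all data to inspect which feature indices it keeps.
selector = SelectKBest(f_classif, k=20).fit(X, y)
selected = selector.get_support(indices=True)
print("selected feature indices:", selected)
```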

Experiment 3: Wrapper Methods

  • Description: This notebook applies feature selection with wrapper methods and assesses their performance relative to the baseline.

  • Link: Notebook 3 - Wrapper Methods.
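Recursive feature elimination (RFE) is a common wrapper method: it repeatedly fits a model and discards the features with the smallest weights until a target count remains. The sketch below assumes logistic regression as the wrapped estimator, synthetic data, and hypothetical settings (`n_features_to_select=10`, dropping about 10% of the features per round); it is not necessarily the configuration used in the notebook.

```python
# Sketch: RFE wrapper method (assumed estimator and settings, synthetic data).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10, n_classes=3, random_state=0
)

# step=0.1 removes 10% of the original feature count at each iteration.
rfe_step = RFE(LogisticRegression(max_iter=2000), n_features_to_select=10, step=0.1)
model = make_pipeline(StandardScaler(), rfe_step, LogisticRegression(max_iter=2000))

# RFE runs inside each training fold, so the CV score is not biased by selection.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rfe_f1 = cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
print(f"RFE (10 features): macro F1 = {rfe_f1:.3f}")

# Refit on all data to read off the surviving features.
model.fit(X, y)
rfe = model.named_steps["rfe"]
print("kept features:", rfe.get_support(indices=True))
```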

Experiment 4: Embedded Methods

  • Description: This notebook applies feature selection with embedded methods and evaluates their effectiveness against the baseline.

  • Link: Notebook 4 - Embedded Methods.
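Embedded methods select features as a by-product of model training. One common example, assumed here purely for illustration, is L1-regularized logistic regression: the L1 penalty drives many coefficients to exactly zero, and `SelectFromModel` keeps the features whose coefficients survive. The data is synthetic and the regularization strength `C=0.1` is a hypothetical value; a smaller `C` keeps fewer features.

```python
# Sketch: embedded selection via L1-regularized logistic regression (assumed setup).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, n_classes=3, random_state=0
)

# The "saga" solver supports the L1 penalty for multiclass problems.
l1_model = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
# Keep features whose coefficients are not (numerically) zero.
selector = SelectFromModel(l1_model, threshold=1e-5)

model = make_pipeline(StandardScaler(), selector, LogisticRegression(max_iter=2000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
embedded_f1 = cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()

# Refit on all data to count how many features the L1 model retains.
model.fit(X, y)
n_selected = model.named_steps["selectfrommodel"].get_support().sum()
print(f"L1 selection kept {n_selected} of {X.shape[1]} features; macro F1 = {embedded_f1:.3f}")
```

Unlike a filter, the selection here depends on the model itself: features are kept or dropped by the same fit that learns the coefficients.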

Experiment 5: Sequential Feature Selection
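The README gives no further detail for this experiment. As a generic illustration only (not necessarily the notebook's setup), scikit-learn's `SequentialFeatureSelector` performs greedy forward selection: starting from an empty set, it adds the single feature that most improves the cross-validated score, until the requested number of features is reached. Because each step re-evaluates every remaining candidate, the cost grows quickly with the feature count, so the sketch assumes a small 30-feature synthetic input, a k-nearest-neighbors classifier, and a target of 5 features.

```python
# Sketch: greedy forward sequential feature selection (assumed settings, synthetic data).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# A small feature count keeps the greedy search cheap.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, n_classes=3, random_state=0
)
X = StandardScaler().fit_transform(X)  # KNN is distance-based, so scale features

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=5,
    direction="forward",  # "backward" would start from all features and remove them
    scoring="f1_macro",
    cv=3,
)
sfs.fit(X, y)
chosen = sfs.get_support(indices=True)
print("features chosen by forward selection:", chosen)
```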

Experiment 6: RFE-Stability Selection
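The README does not describe how RFE and stability selection are combined in this experiment, so the following is only a hypothetical sketch of the general idea, not the notebook's method: run RFE on many random subsamples of the data and keep the features selected in a large fraction of runs, which favors features whose selection does not hinge on one particular sample. The synthetic data, number of runs, subsample size, features kept per run, and the 0.6 frequency cutoff are all assumed values.

```python
# Hypothetical sketch: stability selection wrapped around RFE.
# All settings below are assumptions, not the notebook's values.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=100, n_informative=8, n_classes=3, random_state=0
)
# Simplification: scale once on the full data before subsampling.
X = StandardScaler().fit_transform(X)

rng = np.random.default_rng(0)
n_runs, subsample_frac, n_keep, cutoff = 30, 0.7, 10, 0.6

counts = np.zeros(X.shape[1])
for _ in range(n_runs):
    # Draw a random subsample of rows without replacement.
    idx = rng.choice(len(y), size=int(subsample_frac * len(y)), replace=False)
    rfe = RFE(LogisticRegression(max_iter=2000), n_features_to_select=n_keep, step=0.1)
    rfe.fit(X[idx], y[idx])
    counts += rfe.support_  # add 1 for each feature RFE kept in this run

frequency = counts / n_runs  # fraction of runs in which each feature was kept
stable = np.flatnonzero(frequency >= cutoff)
print(f"{len(stable)} features kept in at least {cutoff:.0%} of {n_runs} runs:", stable)
```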

Notebooks 7 & 8: Final Analysis & Output

License

MIT

Contributing

If you want to contribute to this project, please create a pull request with a detailed description of your changes.

Authors
