Aldehyde Dehydrogenase (ALDH1) Inhibitor Prediction Project

Overview

The goal of this project is to identify the 100 molecules most likely to inhibit aldehyde dehydrogenase (ALDH1). ALDH1 is an enzyme that detoxifies aldehydes produced by alcohol metabolism and is implicated in the development of certain types of cancer.

In the first phase of the project, we use a dataset of molecules labeled as either inhibitors (1) or non-inhibitors (0) of ALDH1. This labeled data is used to train a machine learning model that predicts whether a given molecule inhibits ALDH1.

In the second phase, we apply the trained model to a new dataset of molecules with no inhibition information and select the top 100 potential ALDH1 inhibitors.

Data

The project uses two sets of molecules, each molecule described by a set of associated properties. In the training set, each molecule is labeled with either a 0 (no inhibition) or a 1 (inhibition).

The prediction set contains molecules whose inhibition status is unknown. The task is to predict that status and select the top 100 inhibitors.
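A binary classifier can label more or fewer than 100 molecules as inhibitors, so one way to arrive at exactly 100 is to rank molecules by a continuous score and keep the highest-ranked ones. The sketch below illustrates that idea only. It assumes the model yields a per-molecule score where higher means "more likely an inhibitor"; the function name `top_k_inhibitors`, the molecule identifiers, and the scores are hypothetical and do not come from this repository.

```python
import heapq


def top_k_inhibitors(scores, k=100):
    """Return the k molecule IDs with the highest predicted inhibitor score.

    scores: dict mapping a molecule identifier to a score (assumed format).
    If fewer than k molecules are given, all of them are returned.
    """
    return heapq.nlargest(k, scores, key=scores.get)


# Made-up scores, for illustration only
example_scores = {"mol_a": 0.91, "mol_b": 0.12, "mol_c": 0.78, "mol_d": 0.55}
top_two = top_k_inhibitors(example_scores, k=2)
print(top_two)
```

With `k=100` and scores for the full prediction set, the same call would return the 100 highest-scoring molecules, best first.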

Methodology

Data Preprocessing: We first clean and format the data.
PCA: We apply Principal Component Analysis (PCA) to reduce the dimensionality of the dataset, keeping the principal components that capture the most variance in the data.
Model Training: After PCA, we train a machine learning model on the processed, labeled data so that it learns to differentiate between inhibitors and non-inhibitors.
Prediction and Selection: We apply the trained model to the second dataset to predict potential inhibitors. The top 100 molecules predicted as inhibitors are selected as the final output.
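The steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it uses scikit-learn, standardizes features before PCA, keeps enough components to explain 95% of the variance, and uses logistic regression as a stand-in classifier, since this README does not name the model. The data is synthetic and only mimics the shape of a molecule-by-descriptor table.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project's data): 200 labeled molecules,
# 500 unlabeled molecules, 20 numeric descriptors each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # 1 = inhibitor
X_new = rng.normal(size=(500, 20))

# Scale -> PCA -> classifier. The 95% variance threshold and the choice of
# logistic regression are assumptions made for this illustration.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Predicted probability of the inhibitor class for each unlabeled molecule
proba = model.predict_proba(X_new)[:, 1]

# Row indices of the 100 highest-probability molecules, best first
top_100 = np.argsort(proba)[::-1][:100]
print(top_100[:5], proba[top_100[:5]])
```

Putting the scaler and PCA inside the pipeline means both are fitted on the training data only and then reused unchanged on the new molecules, which keeps the two phases consistent.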

Repository Contents

The main components of the repository are as follows:

Data/: This folder contains the datasets used for training and prediction.
Notebooks/: This folder contains Jupyter notebooks for data preprocessing, PCA, model training, prediction, and other analyses.
scr/: This folder contains Python scripts for various stages of the project.
predictions/: This folder holds the final results, including the list of top 100 predicted inhibitors.
models/: This folder contains the models that were trained using different types of data.
requirements.txt: This file lists the Python dependencies required for this project.
README.md: This file provides an overview of the project and repository.

Contributors

Tijmen Vierwind
Kay Janssen
Stan Dobbelsteen
Giel Dobbelsteen
Tristan Muir
Tim Stassen
Sjoerd de Ruijter

Repository: TimStassen/GroupAssignment8CC00
