A large proportion of lead compounds are derived from natural products. However, the targets of most natural products have not been fully characterized. To help address this problem, we built a transfer learning model that predicts targets for natural products. The target prediction model can be applied to natural product-based drug discovery and has the potential to identify more lead compounds or to assist researchers in drug repurposing. This repository contains the code to reproduce the results from our published paper 'Target Prediction Model for Natural Products Using Transfer Learning'. Only academic or non-commercial use is permitted.
The bioactivity data used for training can be obtained from the official ChEMBL website, and the structures of natural products can be downloaded from COCONUT. The code needed for cleaning and processing the data is provided.
The model was pre-trained on a processed ChEMBL dataset and then fine-tuned on a natural product dataset. With this transfer learning approach, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.910 despite the limited number of task-related training samples. The improvement in the model's AUROC is shown in the figure below.
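For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen active compound-target pair receives a higher score than a randomly chosen inactive one. The following is a minimal sketch of that definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions; this is not the evaluation code used in the paper.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = active, 0 = inactive.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.7, 0.8, 0.6]

# 4 of the 6 positive/negative pairs are ordered correctly, so this is 4/6.
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; for large evaluation sets a rank-based implementation from an established library would be the more practical choice.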
All model definitions can be found in pretrain.py and finetune.py.
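The pre-train then fine-tune workflow can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: a logistic regression stands in for the actual network, synthetic data stands in for the ChEMBL and natural product datasets, and the names `train`, `make_data`, `source`, and `target` are hypothetical. It does not reproduce the contents of pretrain.py or finetune.py.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, data):
    """Mean binary cross-entropy of a linear model on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(weights, data, lr, epochs):
    """Full-batch gradient descent, starting from the given weights."""
    weights = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in data:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad)]
    return weights


def make_data(true_weights, n, rng):
    """Synthetic samples labelled by the sign of a linear rule."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        y = 1 if sum(w * xi for w, xi in zip(true_weights, x)) > 0 else 0
        data.append((x, y))
    return data


rng = random.Random(0)
source = make_data([2.0, -1.0, 0.5], 500, rng)  # large, related task (stand-in)
target = make_data([2.0, -1.0, 1.0], 20, rng)   # small task of interest (stand-in)

# Step 1: pre-train from scratch on the large source dataset.
pretrained = train([0.0, 0.0, 0.0], source, lr=0.5, epochs=200)

# Step 2: fine-tune on the small target dataset, starting from the
# pre-trained weights and using a smaller learning rate.
finetuned = train(pretrained, target, lr=0.1, epochs=50)

print("target loss before fine-tuning:", log_loss(pretrained, target))
print("target loss after fine-tuning: ", log_loss(finetuned, target))
```

The key design point is that fine-tuning reuses the pre-trained weights as its starting point instead of a fresh initialization, so the small target dataset only has to adjust a model that already captures the related source task.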
Bo Qiang, School of Pharmaceutical Sciences, Peking University