Update plotting functions #244

niekdejonge · 2024-10-18T09:04:06Z

I started by adding functions for creating the heatmap plots and the average-prediction-per-bin plot, but this resulted in a large refactoring of the code.

Previously, there were multiple places in the code where predictions were made, Tanimoto scores were calculated, and losses were calculated. These steps are often complex and confusing, since the values have to be averaged both per InChIKey pair (instead of per spectrum pair) and per bin. Having these calculations in many different places, each requiring a different format for plotting, made the code hard to interpret.
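The two-step averaging described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in PredictionsAndTanimotoScores: the record layout `(inchikey_1, inchikey_2, prediction, tanimoto)` and the helper names `average_per_inchikey_pair` and `average_per_bin` are hypothetical, and bins are assumed to be equal-width on [0, 1].

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def average_per_inchikey_pair(
    pairs: list[tuple[str, str, float, float]],
) -> dict[tuple[str, str], tuple[float, float]]:
    """Average predictions over all spectrum pairs that share an InChIKey pair.

    Each (hypothetical) record is (inchikey_1, inchikey_2, prediction, tanimoto).
    The Tanimoto score depends only on the InChIKeys, so it is taken from the
    first record. The key is the ordered pair (inchikey_1, inchikey_2).
    """
    grouped: dict[tuple[str, str], list[tuple[float, float]]] = defaultdict(list)
    for inchikey_1, inchikey_2, prediction, tanimoto in pairs:
        grouped[(inchikey_1, inchikey_2)].append((prediction, tanimoto))
    return {
        key: (mean(prediction for prediction, _ in records), records[0][1])
        for key, records in grouped.items()
    }


def average_per_bin(
    per_inchikey_pair: dict[tuple[str, str], tuple[float, float]],
    nr_of_bins: int = 10,
) -> list[float | None]:
    """Average the per-InChIKey-pair predictions within equal-width Tanimoto bins.

    Empty bins are returned as None.
    """
    bins: list[list[float]] = [[] for _ in range(nr_of_bins)]
    for prediction, tanimoto in per_inchikey_pair.values():
        # A Tanimoto score of exactly 1.0 is placed in the last bin.
        index = min(int(tanimoto * nr_of_bins), nr_of_bins - 1)
        bins[index].append(prediction)
    return [mean(values) if values else None for values in bins]


pairs = [
    ("A", "B", 0.75, 0.85),  # three spectrum pairs for the same InChIKey pair
    ("A", "B", 0.75, 0.85),
    ("A", "B", 0.75, 0.85),
    ("C", "D", 0.25, 0.82),  # a single spectrum pair
]
per_bin = average_per_bin(average_per_inchikey_pair(pairs))
print(per_bin[8])  # prints 0.5
```

Averaging the four spectrum pairs directly would give 0.625 for that bin, because the A/B pair would be counted three times; averaging per InChIKey pair first gives each compound pair equal weight.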

This PR is an attempt to restructure the code and harmonize these calculations.

@florian-huber
I still have the following questions and design choices that I would like your input on:

I did not implement the weighting function for the validation loss calculation. This can easily be done, but it wasn't really clear to me what its purpose was. If it is needed, we can add it. If we don't intend to use it anymore, we could also remove it from the training loss calculation.
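To make the choice concrete, here is a hypothetical sketch of how an optional per-pair weighting changes a mean squared error. The function name `mean_squared_error` and the idea of passing one weight per pair are assumptions for illustration only; the actual weighting function used for the training loss in ms2deepscore is not shown in this PR and may work differently.

```python
from __future__ import annotations

from statistics import mean


def mean_squared_error(
    predictions: list[float],
    targets: list[float],
    weights: list[float] | None = None,
) -> float:
    """Mean squared error over pairs, optionally weighted per pair (hypothetical helper).

    With weights=None every pair counts equally (unweighted loss). Otherwise
    the result is sum(weight * squared_error) / sum(weight).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    if weights is None:
        return mean(squared_errors)
    if len(weights) != len(squared_errors):
        raise ValueError("weights must have one value per pair")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * e for w, e in zip(weights, squared_errors)) / total_weight


predictions = [0.5, 1.0]
targets = [0.0, 1.0]
print(mean_squared_error(predictions, targets))  # prints 0.125
print(mean_squared_error(predictions, targets, weights=[1.0, 3.0]))  # prints 0.0625
```

Up-weighting the second (perfectly predicted) pair lowers the loss, so a weighted and an unweighted validation loss are not directly comparable.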
I removed all code not used in the main training pipeline. This is mostly plotting functions that we likely won't use for this paper and will remove from the set of plots created by default. I removed them because they require slightly different input than the other plots, which could lead to these functions being used incorrectly. If we want to reuse them later, they are still available in older versions on git.
I did not include the both-against-both comparison plot. It is also easy to add, but if we want it, I think we should first decide how to implement it: the previous implementation did not correct for the number of positive- or negative-mode spectra available, and positive- and negative-mode spectra with the same InChIKeys would still be merged.
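A small hypothetical example of the imbalance concern: when positive- and negative-mode spectrum pairs with the same InChIKey pair are merged, the ion mode with more spectra dominates the average. Averaging within each ion mode first is one possible correction, assumed here purely for illustration; the names `merged_average` and `ionmode_balanced_average` are hypothetical and not part of ms2deepscore.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def merged_average(predictions: list[tuple[str, float]]) -> float:
    """Plain mean over all spectrum pairs of one InChIKey pair, ignoring ion mode."""
    return mean(value for _, value in predictions)


def ionmode_balanced_average(predictions: list[tuple[str, float]]) -> float:
    """Average within each ion mode first, then average the per-mode means."""
    per_ionmode: dict[str, list[float]] = defaultdict(list)
    for ionmode, value in predictions:
        per_ionmode[ionmode].append(value)
    return mean(mean(values) for values in per_ionmode.values())


# Three positive-mode spectrum pairs and one negative-mode pair for one InChIKey pair.
predictions = [("positive", 0.75)] * 3 + [("negative", 0.25)]
print(merged_average(predictions))  # prints 0.625
print(ionmode_balanced_average(predictions))  # prints 0.5
```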
The loss functions used during validation and during training are still implemented twice: training uses torch and validation uses pandas. Besides this difference, validation requires an extra step of averaging over the multiple spectrum pairs per InChIKey pair, which we don't want to do during training. This means the mean has to be calculated outside of these methods.
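The split described above can be sketched framework-free as follows. This uses plain Python instead of torch and pandas, purely for illustration, and all function names are hypothetical. The per-pair loss returns unreduced errors, so the training path can take a plain batch mean while the validation path first averages per InChIKey pair.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def squared_errors(predictions: list[float], targets: list[float]) -> list[float]:
    """Per-pair squared errors without any reduction; callers decide how to average."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return [(p - t) ** 2 for p, t in zip(predictions, targets)]


def training_loss(predictions: list[float], targets: list[float]) -> float:
    """Training: mean over the spectrum pairs in the batch."""
    return mean(squared_errors(predictions, targets))


def validation_loss(
    predictions: list[float],
    targets: list[float],
    inchikey_pairs: list[tuple[str, str]],
) -> float:
    """Validation: average per InChIKey pair first, then over InChIKey pairs."""
    if len(inchikey_pairs) != len(predictions):
        raise ValueError("need one InChIKey pair per prediction")
    per_inchikey_pair: dict[tuple[str, str], list[float]] = defaultdict(list)
    for key, error in zip(inchikey_pairs, squared_errors(predictions, targets)):
        per_inchikey_pair[key].append(error)
    return mean(mean(errors) for errors in per_inchikey_pair.values())


predictions = [0.5, 0.5, 0.5, 1.0]
targets = [0.0, 0.0, 0.0, 1.0]
inchikey_pairs = [("A", "B")] * 3 + [("C", "D")]
print(training_loss(predictions, targets))  # prints 0.1875
print(validation_loss(predictions, targets, inchikey_pairs))  # prints 0.125
```

Keeping the reduction outside the per-pair loss is what lets one error definition serve both paths.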

niekdejonge added 30 commits October 17, 2024 16:39

Create PredictionTanimotoPairs for calculating true and predicted values for plotting.

d5910d7

Added function for creating a heatmap

805fda1

Added function for plotting the average per bin

ec4f6e7

First tests for testing plotting

76f3311

Rename class to AveragePredictionAndTanimotoForInchikeyPairs.py

eee5a23

Improve function names

104b88a

Remove old code section

4136962

Split AveragePredictionAndTanimotoForInchikeyPairs into PredictionsAndTanimotoScores and CalculateScoresBetweenAllIonmodes.py

5ae3d19

Updated tests to use new classes

ae7dea2

Move removing diagonal to init

2897c0b

Add create_dummy_predictions_and_tanimoto_scores

a33dc9f

Move plotting tests to test plotting

f30db73

Add docstrings

ec6d34f

Added methods for calculating average losses per inchikey

d0f29a5

Refactor plot_rmse_per_bin.py to work with PredictionsAndTanimotoScores

eb85d83

Set diagonal to None in dummy data

281ea0a

Add initial test for plotting and calculating losses

0539d3e

Add get_loss_per_inchikey_pair to CalculateScoresBetweenAllIonmodes.py

af52cbe

Make plot_loss_per_bin able to plot different loss types

e7e7d0e

Add labels and list_of_predictions_and_tanimoto_scores

bf7ccb2

Prevent memory leakage

98c2e81

use labels in plot_rmse_per_bin.py and use nr_of_bins instead of ref_score_bins

7cc3f14

Refactor plotting_wrapper_functions and training_wrapper_functions.py to use CalculateScoresBetweenAllIonmodes

b51a253

Add function for creating asymmetric dummy tanimoto scores and predictions

a285742

Update test_plotting to use create_dummy_predictions

1139716

Update test_train_wrapper_ms2ds_model to check for correct files

f203bbc

Remove test_plotting_wrapper_functions.py

6bd4881

Move calculate_tanimoto_scores_unique_inchikeys to CalculateScoresBetweenAllIonmodes.py

accb996

Remove outdated plotting figures

017ad45

Add get average loss per bin functionality to PredictionsAndTanimotoScores

89cb148

niekdejonge and others added 16 commits October 25, 2024 09:06

Merge remote-tracking branch 'origin/update_plotting_functions' into update_plotting_functions

b2ce4c0

Remove outdated useless test file

0fada74

Remove test models, since outdated

b56b12f

Remove test models duplicate inchikeys

c3e8e0c

Remove outdated unused testmodel_additional_input.hdf5

d5624db

Add a check that a created model can be loaded after creating (testing save load combi)

7f02d8c

Add hashed test again and remove outdated tests.

d3b3e52

Remove outdated tests

407ffc0

Remove unused variable

c65c203

Change plot_rmse_per_bin.py to plot_loss_per_bin.py

a28b96a

Speed up get_average_loss_per_bin and integrate with plot_average_per_bin.py

2e09a50

Add output type hints

9bf710b

Make get_average_per_bin a public method

779c911

Change label on loss per bin plot

e210b1e

Solve deprecation warning for groupby(axis=1)

62cbec3

minor linting

4afe6b6

florian-huber reviewed Oct 28, 2024

ms2deepscore/validation_loss_calculation/PredictionsAndTanimotoScores.py (outdated, resolved)

florian-huber and others added 6 commits October 28, 2024 17:39

linting

dbd2bd0

linting & docstrings

e9b4bb8

fix mistake

e06f5ce

fixes

554c776

abs --> np.abs

94311f6

replace asserts by raises

b068337

florian-huber approved these changes Oct 28, 2024

switch to correct Error

9816ef0

niekdejonge merged commit 052bab6 into main Oct 28, 2024
10 checks passed

niekdejonge deleted the update_plotting_functions branch October 28, 2024 17:43

This was referenced Oct 29, 2024:

Fix Issue 246 #247 (Closed)

fingerprint bits default cannot be changed #246 (Closed)

Minor code linting #250 (Merged)
