Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update plotting functions #244

Merged
merged 90 commits into from
Oct 28, 2024
Merged

Update plotting functions #244

merged 90 commits into from
Oct 28, 2024

Conversation

niekdejonge
Copy link
Collaborator

@niekdejonge niekdejonge commented Oct 18, 2024

I started with adding functions for creating the heatmap plots and average prediction per bin plot, but resulted in a large refactoring of the code.

Before we had multiple places in the code where predictions were made, tanimoto scores calculated and losses calculated. These steps are often complex/confusing, since they both have to be the average per inchikey pair (instead of spectrum pair) and the average per bin. Having many different places were these calculations were made and different formats required for plotting, made the code hard to interpret.

This PR is an attempt to reformat the code to make it more harmonized.

@florian-huber
I still had the following questions/choices made that I would like to have your input on:

  • I did not implement the weighting function for the validation loss calculation. This can easily be done, but it wasn't really clear to me what the purpose of this was. If it is needed, we can add it. If we don't intend to use it anymore, we could also remove it for the training loss calculations.
  • I did remove all code not used in the main training pipeline. Mostly plotting functions that we likely won't use for this paper and will remove from the default plots created. I removed them, since the input required is a bit different than we use for other plots, which could result in using these plotting functions in the wrong way. If we later want to reuse these plotting functions they are still available on older versions on git.
  • I did not include the both against both comparison plot. It is also easy to still add, but if we want to add this I think we should think about how we want to implement this. Since now there was no correction done for the number of pos or neg mode spectra available. And spectra from pos and neg with the same inchikeys would still be merged.
  • The loss functions during running and loss functions during training are still implemented twice. During training torch is used and during validation pandas is used. Beside this difference, during validation we need to do the step of averaging over the multiple spectrum pairs per inchikey pair. Which we don't want to do during training, which means the mean has to be calculated outside of these methods.

…dTanimotoScores and CalculateScoresBetweenAllIonmodes.py
@niekdejonge niekdejonge merged commit 052bab6 into main Oct 28, 2024
10 checks passed
@niekdejonge niekdejonge deleted the update_plotting_functions branch October 28, 2024 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants