Model testing #700

GemmaTuron · 2023-06-13T15:49:11Z

GemmaTuron
Jun 13, 2023
Maintainer

We need to better define what we mean by "test a model", and provide example files and a checklist that contributors can follow step by step.
What testing would you like to see for each model?

Answered by miquelduranfrigola

Jun 21, 2023

Thanks @emmakodes @HellenNamulinda @ZakiaYahya, this is very valuable. Let me add a few extra thoughts:

Thoughts

I have written a placeholder at ersilia.publish.test, named LocalModelTester, which should have multiple methods to test models before opening the PR. In particular, one could potentially evaluate the following:

- The metadata: we need to make sure that the information for the model is complete enough. This can be effectively done with a BaseInformation class, which can read the metadata.json file and will raise exceptions if something does not look good (for example, a wrong URL, or a description that is too short).
- Input-output consistency: this is basically what you …
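The metadata check described above could look roughly like the sketch below. It is not the actual BaseInformation class: the field names ("Description", "Source Code"), the 200-character threshold, and the `MetadataError` and `check_metadata` names are all assumptions made for illustration.

```python
import json
from urllib.parse import urlparse

# Assumed threshold for illustration; not an official Ersilia rule.
MIN_DESCRIPTION_CHARS = 200


class MetadataError(ValueError):
    """Raised when a metadata field does not look right (hypothetical name)."""


def check_url(value, field):
    """Raise MetadataError unless value is an http(s) URL with a host."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise MetadataError(f"{field}: not a valid URL: {value!r}")


def check_metadata(metadata):
    """Check a metadata dict; the key names used here are assumptions."""
    description = str(metadata.get("Description") or "")
    if len(description) < MIN_DESCRIPTION_CHARS:
        raise MetadataError(
            f"Description: too short ({len(description)} characters)"
        )
    check_url(str(metadata.get("Source Code") or ""), "Source Code")
    return True


def check_metadata_file(path):
    """Load a metadata.json file and run the same checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_metadata(json.load(handle))
```

Raising an exception on the first problem keeps the sketch short; a real tester might prefer to collect all problems and report them together.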


ZakiaYahya · 2023-06-16T07:34:05Z

ZakiaYahya
Jun 16, 2023

Hi @GemmaTuron

(1) Firstly, I think we should define a standard file, as Ersilia did with eml_canonical.csv. There should be a file named, e.g., test_smiles.csv that contains at least 10-20 SMILES, so that everyone uses the same SMILES for testing.

(2) It's better to put one invalid SMILES string at the end of the file to check how the model behaves, i.e. whether it handles the bad input correctly (I don't know how Ersilia models behave when a wrong input is passed to them). Maybe for this we could define some checks while reading the SMILES from test_smiles.csv, so that a wrong input is ignored and NaN is put in the output.
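As a rough illustration of point (2), the sketch below skips invalid inputs and writes NaN in their place. The `smiles` column name, the `predict_one` callable and the `toy_is_valid` check are assumptions; in particular, `toy_is_valid` only checks for balanced parentheses and stands in for a real validity check done with a cheminformatics toolkit.

```python
import csv
import math


def read_smiles(path, column="smiles"):
    """Read one column of a CSV file; the column name is an assumption."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row[column].strip() for row in csv.DictReader(handle)]


def toy_is_valid(smiles):
    """Stand-in check: non-empty with balanced parentheses. Not real SMILES parsing."""
    depth = 0
    for char in smiles:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def predict_with_nan(smiles_list, predict_one, is_valid=toy_is_valid):
    """Call predict_one on valid inputs and return NaN for invalid ones."""
    results = []
    for smiles in smiles_list:
        if is_valid(smiles):
            results.append(predict_one(smiles))
        else:
            results.append(math.nan)
    return results
```

Keeping one output row per input row, with NaN for the invalid ones, makes it easy to line the output up against the input file.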

4 replies

GemmaTuron Jun 16, 2023
Maintainer Author

Yes, that is a good suggestion.
Perhaps we should have two CSV files: one with all correct SMILES (so we first check that the model works) and one with some incorrect ones.

ZakiaYahya Jun 16, 2023

Yes, we can do this as well, testing with two files, but it will be time-consuming. I mean, to test one model you have to run it twice on the CLI, twice on Colab and twice on DockerHub at prediction time, right @GemmaTuron?

GemmaTuron Jun 16, 2023
Maintainer Author

Yes, you need to double the tests, but if you pass an incorrect SMILES from the beginning, the error might be somewhere else and you would not see it, so I'd go for two files.
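The reasoning for two files can be sketched as a two-stage run: the all-valid inputs go first, and the mixed inputs are only tried once the model is known to work. `run_model` is a hypothetical callable standing in for whatever actually runs the model (CLI, Colab or Docker); it is not an Ersilia function.

```python
def run_two_stage(run_model, valid_inputs, mixed_inputs):
    """Run the all-valid inputs first; only then try the mixed inputs.

    A failure in the first stage points at the model or its setup,
    so the second stage is skipped instead of muddying the diagnosis.
    """
    report = {"valid": None, "mixed": None}
    try:
        run_model(valid_inputs)
        report["valid"] = "passed"
    except Exception as error:
        report["valid"] = f"failed: {error}"
        report["mixed"] = "skipped"
        return report
    try:
        run_model(mixed_inputs)
        report["mixed"] = "passed"
    except Exception as error:
        # The model works on valid inputs, so this failure is
        # attributable to how it handles the invalid ones.
        report["mixed"] = f"failed: {error}"
    return report
```

With this ordering, a failure in the second stage can be attributed to invalid-input handling rather than to a broken model.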

ZakiaYahya Jun 16, 2023

Oh right, yes, for this we need to run the tests with both files.

GemmaTuron · 2023-06-20T13:39:12Z

GemmaTuron
Jun 20, 2023
Maintainer Author

Hi @emmakodes, @samuelmaina @HellenNamulinda

Please give your views on this issue as well.

0 replies

HellenNamulinda · 2023-06-20T13:57:15Z

HellenNamulinda
Jun 20, 2023

This is late, but from my experience so far, besides having separate files for correct and incorrect SMILES, I think the testing pipeline should be as follows:

Test on Colab. On Colab one is unlikely to make predictions from string inputs; a file is the usual input, even though the model is passed a list at run time. Since some models are big and slow at inference time, it would be better to have a file with only a few molecules.
Test using the Ersilia CLI. Here, the model should be tested with two types of input:
- String inputs (like two molecules), and
- File inputs (for testing, 20 molecules per file should be okay).
Test using Docker. It may take some time to pull the image, but the model should still be tested with both input types.

I'm insisting on two inputs for CLI because I have tested models that only work when given string inputs.
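The pipeline above amounts to a small test matrix, which could be written down as a checklist. The sketch below is only one way to enumerate it; the environment and input-type labels are illustrative names, and skipping string inputs on Colab reflects the suggestion in this comment rather than any project rule.

```python
from itertools import product

# Labels are illustrative, not identifiers from the Ersilia code base.
ENVIRONMENTS = ("colab", "cli", "docker")
INPUT_TYPES = ("string", "file")


def build_checklist(skip=(("colab", "string"),)):
    """List every (environment, input type) pair that should be tested."""
    return [
        (environment, input_type)
        for environment, input_type in product(ENVIRONMENTS, INPUT_TYPES)
        if (environment, input_type) not in skip
    ]


if __name__ == "__main__":
    for environment, input_type in build_checklist():
        print(f"[ ] {environment}: {input_type} input")
```

If each of these combinations is also run with both the valid and the mixed file, the number of runs doubles, which is the cost discussed earlier in the thread.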

0 replies

emmakodes · 2023-06-20T14:31:25Z

emmakodes
Jun 20, 2023

Hello @everyone, having two files to test a model is fair enough (one containing correct SMILES and the other including some incorrect SMILES), but some models are pretty large and take time. What I would suggest is that we still have the two files, but reduce the number of SMILES in each file to, say, 5. These five SMILES in each of the two files should be the standard, most common inputs used to test a model.
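A minimal sketch of how two such five-row files could be generated is shown below. The molecules are illustrative picks (ethanol, benzene, acetic acid, aspirin, caffeine), not an agreed standard set; the last two entries of the mixed list are intended to be invalid; and the `smiles` column name and the second file name are assumptions.

```python
import csv
import io

# Illustrative molecules only; the thread has not agreed on a standard set.
VALID_SMILES = [
    "CCO",                           # ethanol
    "c1ccccc1",                      # benzene
    "CC(=O)O",                       # acetic acid
    "CC(=O)Oc1ccccc1C(=O)O",         # aspirin
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # caffeine
]
# Three valid entries followed by two strings intended to be invalid.
MIXED_SMILES = VALID_SMILES[:3] + ["CC(C", "not_a_smiles"]


def to_csv_text(smiles, column="smiles"):
    """Render a one-column CSV as text; the column name is an assumption."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    for entry in smiles:
        writer.writerow([entry])
    return buffer.getvalue()


def write_csv(path, smiles, column="smiles"):
    """Write the same one-column CSV to disk."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(to_csv_text(smiles, column))


if __name__ == "__main__":
    write_csv("test_smiles.csv", VALID_SMILES)
    write_csv("test_smiles_invalid.csv", MIXED_SMILES)  # hypothetical file name
```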

0 replies

miquelduranfrigola · 2023-06-21T06:26:21Z

miquelduranfrigola
Jun 21, 2023
Maintainer

0 replies

samuelmaina · 2023-06-21T08:40:59Z

samuelmaina
Jun 21, 2023

Late to the party, sorry. Everyone has said pretty much what I would suggest. Thanks.

0 replies
