Minimum number of molecules #3

Open
GemmaTuron opened this issue Apr 8, 2022 · 3 comments

@GemmaTuron
Member

Is your feature request related to a problem? Please describe.
ZairaChem cannot run with fewer than about 60 molecules.

Describe the solution you'd like
Remove some steps for small datasets.

Describe alternatives you've considered
Require a minimum number of molecules to train a ZairaChem model.

@miquelduranfrigola
Member

OK, this should clearly be a parameter. How many cases do we currently have that are affected by this constraint?

@miquelduranfrigola
Member

A while ago I started to work on this problem, addressing it with data augmentation. There are a few tools already implemented in ZairaChem that allow us to do data augmentation, but I haven't incorporated them in the pipeline yet. You can find them in the augmentation folder.

Overall, I would be happy to explore this possibility, as I think it can be a key aspect of our tool.

@JHlozek
Collaborator

JHlozek commented Apr 8, 2022

I tested the minimum number of molecules needed for a model to be predictive on a new chemical series. Training crashed with 10 and 30 molecules (log attached) and worked with 60 or more.

30_train_log.txt
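
For reference, the minimum-size requirement discussed in this thread could be sketched as a configurable check that runs before training. This is only an illustration, not ZairaChem code: the names `check_min_molecules`, `TooFewMoleculesError` and `DEFAULT_MIN_MOLECULES` are hypothetical, and the default of 60 is simply the empirical value reported above, not a documented limit.

```python
# Hypothetical sketch: none of these names exist in ZairaChem.
# 60 is the empirical threshold reported in this thread, exposed as a parameter.
DEFAULT_MIN_MOLECULES = 60


class TooFewMoleculesError(ValueError):
    """Raised when a training set is smaller than the configured minimum."""


def check_min_molecules(smiles, min_molecules=DEFAULT_MIN_MOLECULES):
    """Return the number of unique, non-empty entries, or raise if below the minimum.

    Entries are de-duplicated as plain strings; no SMILES canonicalization
    is attempted here, so two spellings of one molecule count twice.
    """
    unique = {s.strip() for s in smiles if s and s.strip()}
    if len(unique) < min_molecules:
        raise TooFewMoleculesError(
            f"Got {len(unique)} unique molecules; "
            f"at least {min_molecules} are required."
        )
    return len(unique)
```

Failing early with a clear message would replace the crash seen with 10 and 30 molecules, and passing a different `min_molecules` would let small datasets through if some pipeline steps are later skipped or data augmentation is added.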
