How to train a model on QM9 dataset? #914

Open
Jevon-Du opened this issue Nov 15, 2024 · 3 comments

@Jevon-Du
Jevon-Du commented Nov 15, 2024

Hi guys,

I want to train a model to predict HOMO energy using the QM9 dataset, but I'm having trouble finding relevant documentation. Can you guide me on how to proceed?

@misko
Collaborator

misko commented Nov 19, 2024

Hi!
I don't have a detailed answer for you, but a rough approach might be as follows:

  1. Convert QM9 into a usable data format (ASEdb or ASELMDB)
  2. Add an output for HOMO energy or, more simply, use the already existing energy scalar to represent HOMO energy
  3. Run training

For (1), there are two existing issues in the repo that we have not solved yet and that might provide some insight: #788 and #787. See also https://fair-chem.github.io/core/ase_dataset_creation.html
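As a rough sketch of step (1), the snippet below parses a single QM9 `.xyz` record and writes it to an ASE database, storing HOMO in the regular energy slot as suggested in step (2). It is not code from this repo. The names `parse_qm9_xyz` and `write_ase_db` are hypothetical, and it assumes the standard QM9 layout (atom count on line 1, a property line whose eighth whitespace-separated field is HOMO in Hartree, then one atom per line, with `*^` used as the exponent marker in some floats). Converting to eV is also an assumption about what the trainer expects.

```python
from typing import List, Tuple

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 value


def _to_float(token: str) -> float:
    # Some QM9 files write exponents as "1.2*^-5" instead of "1.2e-5".
    return float(token.replace("*^", "e"))


def parse_qm9_xyz(text: str) -> Tuple[List[str], List[Tuple[float, float, float]], float]:
    """Return (symbols, positions, HOMO in eV) for one QM9 xyz record."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # Assumed property order: gdb, index, A, B, C, mu, alpha, homo, lumo, gap, ...
    props = lines[1].split()
    homo_hartree = _to_float(props[7])
    symbols, positions = [], []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]  # the fifth column (partial charge) is ignored
        symbols.append(sym)
        positions.append((_to_float(x), _to_float(y), _to_float(z)))
    return symbols, positions, homo_hartree * HARTREE_TO_EV


def write_ase_db(xyz_paths: List[str], db_path: str) -> None:
    """Write each QM9 record to an ASE db, with HOMO stored as the energy."""
    # ASE is imported lazily so the parser above works without it installed.
    from ase import Atoms
    from ase.calculators.singlepoint import SinglePointCalculator
    from ase.db import connect

    with connect(db_path) as db:
        for path in xyz_paths:
            with open(path) as fh:
                symbols, positions, homo_ev = parse_qm9_xyz(fh.read())
            atoms = Atoms(symbols=symbols, positions=positions)
            # Reuse the energy scalar to carry the HOMO target.
            atoms.calc = SinglePointCalculator(atoms, energy=homo_ev)
            db.write(atoms)
```

Calling `write_ase_db(sorted(glob.glob("qm9/*.xyz")), "qm9_homo.db")` would then produce a single database file; the paths here are placeholders, and splitting into train/val databases is left out.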

For (3) I think this might be a good start. Depending on what data format you use, you can ignore the parts relating to LMDB. To use the ASEdb format, change the config slightly to specify format: ase_db (as is done in the example linked).
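To make the config change concrete, here is a small sketch that builds the dataset section as a Python dict, which could then be dumped to YAML. Only `format: ase_db` comes from the comment above. The `src` and `a2g_args` keys (with `r_energy` and `r_forces`) are my recollection of the ASE dataset docs and should be checked against the linked page, and `make_dataset_config` is a hypothetical helper.

```python
import json


def make_dataset_config(train_db: str, val_db: str) -> dict:
    """Build a dataset config section pointing at ASE db files."""

    def split(src: str) -> dict:
        return {
            "format": "ase_db",  # from the thread: selects the ASE db reader
            "src": src,          # assumed key: path to the .db file
            # Assumed keys: read the energy (here carrying HOMO), skip forces,
            # since QM9 provides no force labels.
            "a2g_args": {"r_energy": True, "r_forces": False},
        }

    return {"dataset": {"train": split(train_db), "val": split(val_db)}}


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(json.dumps(make_dataset_config("qm9_train.db", "qm9_val.db"), indent=2))
```

The rest of the training config (model, optimizer, loss) would stay as in whatever example config you start from.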

Hope this helps! If you make some progress and get stuck, please reach out here; I would be happy to help! If you are interested, we could also use your approach in the tutorial and save other people with the same question some time :)

@itsuwari

I have done some fine-tuning on the QM9 dataset and found no distinct advantage over GFN2-xTB and other fine-tuned models. It could be that EqV2 was trained for periodic systems and is not the best choice for representing the interactions among light atoms.

@lavenderwfy

> I have done some fine-tuning on the QM9 dataset and found no distinct advantage over GFN2-xTB and other fine-tuned models. It could be that EqV2 was trained for periodic systems and is not the best choice for representing the interactions among light atoms.

Hi there! I'm currently engaged in similar work as well. When it comes to the task of completing data format conversion, specifically converting the QM9 dataset into a usable data format like the ASEdb or ASELMDB, I was wondering if you could be so kind as to share your relevant code with me? It would be of great help to my work, and I really appreciate it in advance.
