Issue for preprocessing of OMAT24 dataset #945
Comments
Hi @Seunghyo-Noh 👋
Thank you for your comment. Regarding your answer to my second question: can I use the MPTrj files without recalculating them with VASP?
MPTrj and Alexandria (including our downsampled version, sAlex) are fully compatible with each other. That is correct, the OMat24 calculations have some important differences from those in MPTrj (for example, some of the pseudopotentials were updated by Materials Project after the snapshot for MPTrj was taken). What are you trying to obtain from the different datasets?
Thank you for your comment. Fairchem has demonstrated very successful predictions of energy, forces, and stress, but it seems to have a relatively high computational cost, even for the s-model. While there may be some trade-offs in performance, I aim to develop a lightweight model by utilizing various datasets. If possible, I would like to include a version of the MPTrj dataset that is compatible with OMat24 in the training set, which is why I reached out to ask about this.
In that case, I recommend following an approach similar to the one described in the OMat24 manuscript.
If you want to train everything in a single step in a fully compatible manner, you will need to either recalculate the DFT or adapt your model architecture to handle different DFT settings, e.g. see this work or similar work on multi-fidelity models.
@lbluque Thank you for the helpful paper link. I will refer to it and proceed accordingly! Your comments so far have been very helpful.
What would you like to report?
The paper states the following:
"Next, we reduced the size of the dataset by removing all structures with energies > 0 eV, forces norm > 50 eV/Å, and stress > 80 GPa."
I am curious whether the 'forces norm' refers to the L1 norm or the L2 norm. Based on the context of the paper, it seems to be the L2 norm, but I would like to confirm this as it is not clear. Similarly, for the 'stress' case (3x3 symmetric matrix), I would like to know if the > 80 GPa refers to the norm or the maximum value.
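To make the question concrete, here is a minimal sketch of the candidate interpretations. It is not the authors' preprocessing code: the function names, the per-atom reduction (taking the largest per-atom norm in the structure), and the assumption that the stress tensor is already in GPa are all my own guesses for illustration. Only the three thresholds come from the paper.

```python
import numpy as np


def force_norms(forces):
    """Largest per-atom force norm in a structure, under two candidate definitions.

    `forces` is an (n_atoms, 3) array in eV/Å. Reducing over atoms with max()
    is an assumption; the paper does not say how the norm is aggregated.
    """
    forces = np.asarray(forces, dtype=float)
    l1 = np.abs(forces).sum(axis=1).max()        # L1: |fx| + |fy| + |fz|
    l2 = np.linalg.norm(forces, axis=1).max()    # L2: sqrt(fx^2 + fy^2 + fz^2)
    return {"l1": float(l1), "l2": float(l2)}


def stress_measures(stress_gpa):
    """Two candidate scalar measures of a 3x3 stress tensor (assumed in GPa)."""
    s = np.asarray(stress_gpa, dtype=float)
    return {
        "max_abs": float(np.abs(s).max()),       # largest component magnitude
        "frobenius": float(np.linalg.norm(s)),   # Frobenius norm of the matrix
    }


def keep_structure(energy, forces, stress_gpa,
                   force_kind="l2", stress_kind="max_abs"):
    """Apply the paper's thresholds (0 eV, 50 eV/Å, 80 GPa) under a chosen
    interpretation. Which interpretation is correct is exactly the open question."""
    return bool(
        energy <= 0.0
        and force_norms(forces)[force_kind] <= 50.0
        and stress_measures(stress_gpa)[stress_kind] <= 80.0
    )


# A structure where the interpretations disagree.
forces = [[30.0, 30.0, 20.0], [0.0, 0.0, 0.0]]   # L1 = 80, L2 ≈ 46.9
stress = np.diag([50.0, 50.0, 50.0])             # max = 50, Frobenius ≈ 86.6

kept_l2_max = keep_structure(-1.0, forces, stress)
kept_l1_max = keep_structure(-1.0, forces, stress, force_kind="l1")
kept_l2_frob = keep_structure(-1.0, forces, stress, stress_kind="frobenius")
```

With these example values the structure is kept under the L2 / maximum-component reading but removed under either the L1 or the Frobenius reading, so the choice can change which structures survive the filter.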