I want to train a model using the MPtrj dataset, but I'm having trouble fitting this dataset into the framework. Can you guide me on how to proceed?
Regarding the dataset split for training and validation: is the split random, or is there an officially recommended way to split? What is the ratio between the training and validation sets?
Regarding the MPtrj dataset format: the MPtrj dataset is in JSON format. To use it with the fairchem framework, it needs to be converted to ASE or LMDB format. Could you provide the relevant conversion code?
Thank you very much.
Hi @lavenderwfy, thanks for your question! We are hoping to add more examples of how to write ASE LMDBs in the near future. For now, I'll include some pseudocode that will hopefully be useful. It sounds like you could use the code @CompRhys mentioned to generate the list of Atoms objects.
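```python
from fairchem.core.datasets import LMDBDatabase

# Placeholder: convert the JSON into a list of ASE Atoms objects.
# get_atoms_list_from_json is not a fairchem function; implement it for your JSON layout.
json_file = "path_to_your_json_file.json"
atoms_list = get_atoms_list_from_json(json_file)

# Write each Atoms object (with its info dict as extra data) to the LMDB.
output_file = "your_database.lmdb"
with LMDBDatabase(output_file) as db:
    for atoms in atoms_list:
        db.write(atoms, data=atoms.info)
```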
LMDBs written this way can be used for training, validation, or testing in our repo. For example, you would replace the training dataset path in the config with the path to the file or folder containing the training LMDBs you write. If you want to sanity-check your LMDB, you can easily read it back using the code below.
```python
from fairchem.core.datasets import AseDBDataset

dataset = AseDBDataset({"src": "path_to_your_database.lmdb"})  # path can also point to a folder with multiple lmdb files
dataset.get_atoms(0)  # returns the first atoms object in the database
```
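This thread does not give an official train/validation split or ratio for MPtrj. If none applies to your use case, one reasonable option is a random split over material IDs rather than over individual frames. This is an assumption on our part, not a fairchem recommendation. The reasoning is that frames from the same relaxation trajectory are highly correlated, so splitting by frame would leak near-duplicates into validation. The helper name `split_material_ids` and the 10% validation fraction below are illustrative choices only.

```python
import random


def split_material_ids(material_ids, val_fraction=0.1, seed=0):
    """Hypothetical helper: randomly assign whole materials to train or validation.

    Grouping by material ID keeps every frame of one trajectory on the same
    side of the split.
    """
    # Deduplicate and sort first so the result depends only on the seed.
    ids = sorted(set(material_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    val_ids = set(ids[:n_val])
    train_ids = set(ids[n_val:])
    return train_ids, val_ids
```

For example, pass the top-level keys of the loaded MPtrj JSON, then convert the frames belonging to `train_ids` and `val_ids` separately and write them to two different LMDBs with the `LMDBDatabase` loop.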