Request for Support to Read OMAT24 Dataset Files #918

Open
VuDucMinh2908 opened this issue Nov 18, 2024 · 4 comments

@VuDucMinh2908

VuDucMinh2908 commented Nov 18, 2024

Dear Support Team,

I am working with the OMat24 dataset and encountering issues while trying to read .aselmdb files from this dataset. Below are the issues I am facing and the questions I need assistance with:

Issue:
The OMat24 dataset is provided as .aselmdb files, and I am attempting to use ASE (Atomic Simulation Environment) or FairChem Core to retrieve data from these files.
However, when I try to open the .aselmdb file in ASE, I receive the error Unknown database type: aselmdb because ASE does not recognize this format.
When using FairChem Core, I could not find any module supporting the .aselmdb format and encountered ModuleNotFoundError.

Questions:
How can I open and read .aselmdb files: Is there a way to use ASE or FairChem Core to read these .aselmdb files? I need to access material structure data from these files.
Tools to read .aselmdb: If I cannot read .aselmdb files directly with ASE or FairChem, are there any other tools or libraries I can use to extract the data from these files?
Format conversion: If ASE or FairChem Core do not support .aselmdb, is there a way to convert .aselmdb files into a format that ASE or FairChem can read, such as .db or .xyz?
Code Sample I Have Tried:

python
from ase.db import connect

# Path to the .aselmdb file
file_path = r"C:\Users\vu duc minh\Downloads\aimd-from-PBE-3000-npt\val.aselmdb"
db = connect(file_path)
Request for Support:

I would appreciate receiving detailed instructions or any documentation on how to read and access data from the .aselmdb files in the OMat24 dataset.
If there are tools or methods to convert the file format or if there are alternative APIs I can use, I would be grateful for that information.
Thank you for your support!

@zulissimeta
Collaborator

Hi, you can use the ASE LMDB class like so:

from fairchem.core.datasets.lmdb_database import LMDBDatabase

with LMDBDatabase(file_path) as db:
    # db access here; db works just like a normal ASE db connection
    ...

We use this format because it behaves just like a normal ASE db but, in our testing, is much faster for random I/O during large-model training, as well as for general reading and writing.

@VuDucMinh2908
Author

Could you guide me step by step through reading the OMat24 files, including which libraries need to be installed and what code to write?

@misko
Collaborator

misko commented Nov 19, 2024

Hi!
Please follow the instructions from here to install fairchem-core.
To make sure the installation works, please run the following Python code:
from fairchem.core.datasets.lmdb_database import LMDBDatabase

If this raises an error, please create a new environment and try a fresh installation of fairchem-core. If the same error persists, please post it here.
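The import check above can also be scripted so that a missing package produces a short message rather than a traceback. This is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and the module path is the one quoted in this thread. Note that `find_spec` imports the parent packages of a dotted name, so other errors raised while importing them would still surface.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located.

    Parent packages of a dotted name are imported during the lookup;
    the named module itself is only located, not executed.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. no `fairchem` installed) raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Module path taken from the comment above.
MODULE = "fairchem.core.datasets.lmdb_database"

if is_importable(MODULE):
    print(f"{MODULE} is available")
else:
    print(f"{MODULE} not found; try a fresh environment and reinstall fairchem-core")
```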

Next, download and decompress the files you would like to access. This should result in a folder with many files named something like:

aimd-from-PBE-3000-npt/metadata_db_225.npz
aimd-from-PBE-3000-npt/metadata_db_469.npz
aimd-from-PBE-3000-npt/db_200.aselmdb-lock
aimd-from-PBE-3000-npt/db_200.aselmdb
aimd-from-PBE-3000-npt/db_159.aselmdb-lock
aimd-from-PBE-3000-npt/db_159.aselmdb
aimd-from-PBE-3000-npt/db_434.aselmdb-lock
aimd-from-PBE-3000-npt/db_434.aselmdb
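Since the folder mixes `.aselmdb` files with `-lock` and `.npz` files, it can help to collect only the database files before opening them. The following is a rough sketch with `pathlib`, which also builds paths with the correct separator on Windows; the helper name `list_aselmdb_files` is hypothetical and the folder location is an assumption to adjust for your machine.

```python
from pathlib import Path


def list_aselmdb_files(folder):
    """Return sorted paths of the *.aselmdb files directly inside `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    # The pattern must match the whole file name, so "db_200.aselmdb-lock"
    # and "metadata_db_225.npz" are not picked up.
    return sorted(folder.glob("*.aselmdb"))


# Assumed location; change this to wherever the archive was extracted.
data_dir = Path("aimd-from-PBE-3000-npt")
for path in list_aselmdb_files(data_dir):
    print(path)
```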

Now we will access the records in one of the aselmdb files using the following. Please use the correct file path for your system (Windows):

from fairchem.core.datasets.lmdb_database import LMDBDatabase

file_path = "aimd-from-PBE-3000-npt/db_159.aselmdb"
with LMDBDatabase(file_path) as db:
    print("ID available in this file", db.ids[:3], "...", db.ids[-3:])
    print(f"Atom object at {db.ids[0]} is {db.get(db.ids[0])}")

The output should look like this:

$ python test.py 
ID available in this file [1, 2, 3] ... [11117, 11118, 11119]
Atom object at 1 is <AtomsRow: formula=Os18H18I36, keys=>
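On the original question about converting to `.xyz`: because each record comes back as an ASE `AtomsRow`, one possible route is to call `row.toatoms()` and hand the resulting `Atoms` objects to `ase.io.write`. This is only a sketch, not something the maintainers confirmed in this thread; it assumes `ase` and `fairchem-core` are installed, relies on the `db.ids` and `db.get` calls shown above, and the helper names `iter_atoms` and `export_xyz` are hypothetical.

```python
def iter_atoms(db, limit=None):
    """Yield one ase.Atoms object per row, using the `ids` and `get` calls shown above."""
    ids = db.ids if limit is None else db.ids[:limit]
    for row_id in ids:
        # AtomsRow.toatoms() converts a database row into an ase.Atoms object.
        yield db.get(row_id).toatoms()


def export_xyz(aselmdb_path, xyz_path, limit=None):
    """Write structures from one .aselmdb file to an extended-XYZ file."""
    # Imported here so iter_atoms stays usable without these packages.
    from ase.io import write
    from fairchem.core.datasets.lmdb_database import LMDBDatabase

    with LMDBDatabase(aselmdb_path) as db:
        write(xyz_path, list(iter_atoms(db, limit)), format="extxyz")
```

A call such as `export_xyz("aimd-from-PBE-3000-npt/db_159.aselmdb", "db_159_first100.xyz", limit=100)` would then write the first 100 structures; the output file name here is just an example. Passing a `limit` keeps the in-memory list small while trying this out.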

Hope this helps!


This issue has been marked as stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Dec 20, 2024