Unable to initiate OCPCalculator using the eqV2_dens_31M_mp.pt checkpoint #941

Open
jinlhr542 opened this issue Dec 11, 2024 · 7 comments

@jinlhr542

Python version

3.11

fairchem-core version

1.3.0

pytorch version

2.5.1

cuda version

used cpu

Operating system version

No response

Minimal example

No response

Current behavior

from fairchem.core import OCPCalculator
from pymatgen.core.structure import Structure
lattice = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
atoms = [("Si", [0, 0, 0]), ("Si", [1.5, 1.5, 1.5])]
atoms = Structure(lattice, [atom[0] for atom in atoms], [atom[1] for atom in atoms]).to_ase_atoms()
calc = OCPCalculator(checkpoint_path="eqV2_dens_31M_mp.pt", local_cache="pretrained_models", cpu = True)

TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 calc = OCPCalculator(checkpoint_path="eqV2_dens_31M_mp.pt", local_cache="pretrained_models", cpu = True)
2 #atoms.calc = calc

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/common/relaxation/ase_utils.py:212, in OCPCalculator.__init__(self, config_yml, checkpoint_path, model_name, local_cache, trainer, cpu, seed)
209 self.config["checkpoint"] = str(checkpoint_path)
210 del config["dataset"]["src"]
--> 212 self.trainer = registry.get_trainer_class(config["trainer"])(
213 task=config.get("task", {}),
214 model=config["model"],
215 dataset=[config["dataset"]],
216 outputs=config["outputs"],
217 loss_functions=config["loss_functions"],
218 evaluation_metrics=config["evaluation_metrics"],
219 optimizer=config["optim"],
220 identifier="",
221 slurm=config.get("slurm", {}),
222 local_rank=config.get("local_rank", 0),
223 is_debug=config.get("is_debug", True),
224 cpu=cpu,
225 amp=config.get("amp", False),
226 inference_only=True,
227 )
229 if checkpoint_path is not None:
230 self.load_checkpoint(checkpoint_path=checkpoint_path, checkpoint=checkpoint)

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/trainers/ocp_trainer.py:109, in OCPTrainer.__init__(self, task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, local_rank, timestamp_id, run_dir, is_debug, print_every, seed, logger, amp, cpu, name, slurm, gp_gpus, inference_only)
107 if slurm is None:
108 slurm = {}
--> 109 super().__init__(
110 task=task,
111 model=model,
112 outputs=outputs,
113 dataset=dataset,
114 optimizer=optimizer,
115 loss_functions=loss_functions,
116 evaluation_metrics=evaluation_metrics,
117 identifier=identifier,
118 local_rank=local_rank,
119 timestamp_id=timestamp_id,
120 run_dir=run_dir,
121 is_debug=is_debug,
122 print_every=print_every,
123 seed=seed,
124 logger=logger,
125 amp=amp,
126 cpu=cpu,
127 slurm=slurm,
128 name=name,
129 gp_gpus=gp_gpus,
130 inference_only=inference_only,
131 )

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py:220, in BaseTrainer.__init__(self, task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, local_rank, timestamp_id, run_dir, is_debug, print_every, seed, logger, amp, cpu, name, slurm, gp_gpus, inference_only)
217 self.primary_metric = None
218 self.ema = None
--> 220 self.load(inference_only)

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py:243, in BaseTrainer.load(self, inference_only)
241 self.load_logger()
242 self.load_task()
--> 243 self.load_model()
245 if inference_only is False:
246 self.load_datasets()

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py:558, in BaseTrainer.load_model(self)
556 model_config_copy = copy.deepcopy(self.config["model"])
557 model_name = model_config_copy.pop("name")
--> 558 self.model = registry.get_model_class(model_name)(
559 **model_config_copy,
560 ).to(self.device)
562 num_params = sum(p.numel() for p in self.model.parameters())
564 if distutils.is_master():

File /opt/anaconda3/envs/3.11/lib/python3.11/site-packages/fairchem/core/models/base.py:267, in HydraModel.__init__(self, backbone, heads, finetune_config, otf_graph, pass_through_head_outputs, freeze_backbone)
265 backbone = copy.deepcopy(backbone)
266 backbone_model_name = backbone.pop("model")
--> 267 self.backbone: BackboneInterface = registry.get_model_class(
268 backbone_model_name
269 )(
270 **backbone,
271 )
272 elif starting_model is not None:
273 self.backbone = starting_model.backbone

TypeError: EqV2DeNSBackbone.__init__() got an unexpected keyword argument 'use_denoising_energy'

Expected Behavior

https://huggingface.co/fairchem/OMAT24 recommends using fairchem-core version 1.2.1. However, that version has been deprecated.

Relevant files to reproduce this bug

No response

@jinlhr542 jinlhr542 added the bug Something isn't working label Dec 11, 2024
@lbluque
Collaborator

lbluque commented Dec 13, 2024

Hi @jinlhr542,

Thanks for reporting this! The checkpoint should work with the omat24 branch, in case you want to use it now.

I will push updated checkpoints that work with the newest version (1.3.0) of FAIRChem too.

@jinlhr542
Author

Thank you

@jinlhr542
Author

jinlhr542 commented Dec 16, 2024

Dear Luis

I failed to create the conda environment using the env.cpu.yml file in the omat24 branch:

(3.12) jason@YaoshuXiedeMacBook-Pro packages % conda env create -f env.cpu.yml
Channels:
 - pytorch
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: - Ran pip subprocess with arguments:
['/opt/anaconda3/envs/fair-chem/bin/python', '-m', 'pip', 'install', '-U', '-r', '/Users/jason/Documents/GitHub/mlipdockers/docker_image_build/eqV2/fairchem-omat24/packages/condaenv.k5z_6me_.requirements.txt', '--exists-action=b']
Pip subprocess output:
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement torch_cluster==1.6.3+pt24cpu (from versions: 0.1.1, 0.2.3, 0.2.4, 1.0.1, 1.0.3, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.3.0, 1.4.0, 1.4.1, 1.4.2, 1.4.3a1, 1.4.3, 1.4.4, 1.4.5, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.6.0, 1.6.1, 1.6.2, 1.6.3)
ERROR: No matching distribution found for torch_cluster==1.6.3+pt24cpu

failed

CondaEnvException: Pip failed

@lbluque lbluque self-assigned this Dec 16, 2024
@lbluque
Collaborator

lbluque commented Dec 17, 2024

Hi @jinlhr542,

Did you run into this issue before, or only when trying to install a version to use the omat24 branch?

In any case, I have just pushed checkpoints that are compatible with fairchem-core >= 1.3.0, so there is no need to use the omat24 branch anymore. Could you try re-downloading the checkpoint and running your original code snippet?

@jinlhr542
Author

I tried to install the current fairchem-core 1.3.0 by following the instructions:
wget https://raw.githubusercontent.com/FAIR-Chem/fairchem/main/packages/env.cpu.yml
conda env create -f env.cpu.yml
and got this error:

Channels:
 - pytorch
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages:
                                                                                
Preparing transaction: done                                                     
Verifying transaction: done                                                     
Executing transaction: done                                                     
Installing pip dependencies: \ Ran pip subprocess with arguments:               
['/opt/anaconda3/envs/fair-chem/bin/python', '-m', 'pip', 'install', '-U', '-r', '/Users/jason/condaenv.j_3tkbpv.requirements.txt', '--exists-action=b']        
Pip subprocess output:
Looking in links: https://data.pyg.org/whl/torch-2.4.0+cu121.html

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement torch_cluster==1.6.3+pt24cpu (from versions: 0.1.1, 0.2.3, 0.2.4, 1.0.1, 1.0.3, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.3.0, 1.4.0, 1.4.1, 1.4.2, 1.4.3a1, 1.4.3, 1.4.4, 1.4.5, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.6.0, 1.6.1, 1.6.2, 1.6.3)
ERROR: No matching distribution found for torch_cluster==1.6.3+pt24cpu

failed

CondaEnvException: Pip failed

I am afraid that there are some errors in the env.cpu.yml. Could you please check it?
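
For readers hitting the same pip failure: the log shows a pin with a local version label (`1.6.3+pt24cpu`), while every candidate pip lists has no label at all. Under PEP 440, an `==` pin that includes a local label only matches a candidate carrying that same label. The sketch below illustrates that rule with plain string comparison. It is a simplification that skips version normalization, and the helper names are made up for this example, not part of pip.

```python
def split_local(version):
    """Split 'X.Y.Z+label' into (public, local); local is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def exact_pin_matches(pin, candidate):
    """Simplified PEP 440 '==' check (no normalization, illustration only)."""
    pin_public, pin_local = split_local(pin)
    cand_public, cand_local = split_local(candidate)
    if pin_local is None:
        # A pin without a local label ignores the candidate's local label.
        return pin_public == cand_public
    # A pin with a local label requires the same label on the candidate.
    return (pin_public, pin_local) == (cand_public, cand_local)


# Pin taken from the pip error; the last versions pip reported as available.
pin = "1.6.3+pt24cpu"
available = ["1.6.1", "1.6.2", "1.6.3"]
print([v for v in available if exact_pin_matches(pin, v)])
```

The filtered list comes back empty, which is consistent with the "No matching distribution found" message in the log.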

@jinlhr542
Author

jinlhr542 commented Dec 20, 2024

Dear Luis

I have successfully installed version 1.3.0 using a Docker container. Since you mentioned that 1.3.0 should support the eqV2 DeNS model, I tried OCPCalculator(checkpoint_path="eqV2_dens_31M_mp.pt") and got this error:

2024-12-20 15:09:20 Traceback (most recent call last):
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/flask/app.py", line 1511, in wsgi_app
2024-12-20 15:09:20     response = self.full_dispatch_request()
2024-12-20 15:09:20                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/flask/app.py", line 919, in full_dispatch_request
2024-12-20 15:09:20     rv = self.handle_user_exception(e)
2024-12-20 15:09:20          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/flask/app.py", line 917, in full_dispatch_request
2024-12-20 15:09:20     rv = self.dispatch_request()
2024-12-20 15:09:20          ^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/flask/app.py", line 902, in dispatch_request
2024-12-20 15:09:20     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
2024-12-20 15:09:20            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/app/server.py", line 12, in predict_energy
2024-12-20 15:09:20     calc = OCPCalculator(checkpoint_path="eqV2_dens_31M_mp.pt")
2024-12-20 15:09:20            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/common/relaxation/ase_utils.py", line 227, in __init__
2024-12-20 15:09:20     self.trainer = registry.get_trainer_class(config["trainer"])(
2024-12-20 15:09:20                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/trainers/ocp_trainer.py", line 109, in __init__
2024-12-20 15:09:20     super().__init__(
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py", line 220, in __init__
2024-12-20 15:09:20     self.load(inference_only)
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py", line 243, in load
2024-12-20 15:09:20     self.load_model()
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/trainers/base_trainer.py", line 561, in load_model
2024-12-20 15:09:20     self.model = registry.get_model_class(model_name)(
2024-12-20 15:09:20                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20   File "/opt/conda/lib/python3.11/site-packages/fairchem/core/models/base.py", line 267, in __init__
2024-12-20 15:09:20     self.backbone: BackboneInterface = registry.get_model_class(
2024-12-20 15:09:20                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-20 15:09:20 TypeError: EqV2DeNSBackbone.__init__() got an unexpected keyword argument 'use_denoising_energy'

I checked the source code of the EqV2DeNSBackbone class, and its __init__ function indeed does not have a 'use_denoising_energy' argument.

Then I tried another checkpoint, eqV2_31M_omat_mp_salex.pt, and it loaded without error. I therefore suspect that some of the arguments stored in eqV2_dens_31M_mp.pt, eqV2_dens_86M_mp.pt, and eqV2_dens_153M_mp.pt are not compatible with EqV2DeNSBackbone.__init__().
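
The traceback shows the backbone being built with `**backbone`, so any key in the saved configuration that the constructor does not declare raises this `TypeError`. The sketch below reproduces that mechanism and lists the offending keys with `inspect.signature`. `ToyBackbone`, its parameters, and `unexpected_keys` are hypothetical stand-ins for illustration, not fairchem code; only the key name `use_denoising_energy` comes from the error message.

```python
import inspect


class ToyBackbone:
    """Hypothetical stand-in for a model backbone (not the fairchem class)."""

    def __init__(self, num_layers=8, use_denoising=True):
        self.num_layers = num_layers
        self.use_denoising = use_denoising


def unexpected_keys(cls, config):
    """Return the config keys that cls.__init__ would reject as keywords."""
    params = inspect.signature(cls.__init__).parameters
    # If the constructor takes **kwargs, every key is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(key for key in config if key not in params)


# A made-up saved config containing one key the constructor does not know.
saved_config = {
    "num_layers": 8,
    "use_denoising": True,
    "use_denoising_energy": True,
}

print(unexpected_keys(ToyBackbone, saved_config))

try:
    ToyBackbone(**saved_config)  # same pattern as the **backbone call
except TypeError as err:
    print(err)
```

Running the same kind of check against the real backbone class and the configuration stored in a checkpoint would show whether the file and the installed code agree, without going through the full calculator setup.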

@lbluque
Collaborator

lbluque commented Dec 20, 2024

Hi @jinlhr542, did you download the checkpoints recently? I uploaded compatible checkpoints earlier this week.

I double checked on my end and can load the DeNS checkpoints without issue.
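
One way to tell whether a locally cached checkpoint predates the re-upload is to look at the file's modification time and hash. This is a minimal sketch using only the standard library. The path below assumes the file sits directly under the `pretrained_models` cache directory from the original snippet, which may not match your setup, and `describe_file` is a helper name made up for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Return size, modification time (UTC) and SHA-256 of a file."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": digest.hexdigest(),
    }


# Assumed location of the cached checkpoint; adjust to your local_cache.
checkpoint = Path("pretrained_models") / "eqV2_dens_31M_mp.pt"
if checkpoint.exists():
    print(describe_file(checkpoint))
else:
    print(f"{checkpoint} not found; nothing is cached at this path")
```

If the modification time is earlier than the re-upload mentioned above, deleting the cached file and downloading it again is the simplest next step.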
