/conda_dir/envs/fairchem-ocp-cpu/lib/python3.12/site-packages/fairchem/core/common/relaxation/ase_utils.py:190: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
WARNING:root:Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[90], line 2
1 # Load the pre-trained checkpoint!
----> 2 calc = OCPCalculator(checkpoint_path=checkpoint_path, cpu=True)
3 slab.set_calculator(calc)
File /conda_dir/envs/fairchem-ocp-cpu/lib/python3.12/site-packages/fairchem/core/common/relaxation/ase_utils.py:212, in OCPCalculator.__init__(self, config_yml, checkpoint_path, model_name, local_cache, trainer, cpu, seed)
209 self.config["checkpoint"] = str(checkpoint_path)
210 del config["dataset"]["src"]
--> 212 self.trainer = registry.get_trainer_class(config["trainer"])(
213 task=config.get("task", {}),
214 model=config["model"],
215 dataset=[config["dataset"]],
216 outputs=config["outputs"],
217 loss_functions=config["loss_functions"],
218 evaluation_metrics=config["evaluation_metrics"],
219 optimizer=config["optim"],
220 identifier="",
221 slurm=config.get("slurm", {}),
222 local_rank=config.get("local_rank", 0),
223 is_debug=config.get("is_debug", True),
224 cpu=cpu,
225 amp=config.get("amp", False),
226 inference_only=True,
227 )
229 if checkpoint_path is not None:
230 self.load_checkpoint(checkpoint_path=checkpoint_path, checkpoint=checkpoint)
File /conda_dir/envs/fairchem-ocp-cpu/lib/python3.12/site-packages/fairchem/core/common/registry.py:302, in Registry.get_trainer_class(cls, name)
300 @classmethod
301 def get_trainer_class(cls, name: str):
--> 302 return cls.get_class(name, "trainer_name_mapping")
File /conda_dir/envs/fairchem-ocp-cpu/lib/python3.12/site-packages/fairchem/core/common/registry.py:273, in Registry.get_class(cls, name, mapping_name)
271 # mapping be class path of type `{module_name}.{class_name}` (e.g., `fairchem.core.trainers.ForcesTrainer`)
272 if name.count(".") < 1:
--> 273 raise cls.__import_error(name, mapping_name)
275 try:
276 return _get_absolute_mapping(name)
RuntimeError: Failed to find the trainer 'equiformerv2_forces'. You may either use a trainer from the registry (one of 'base', 'forces', 'energy' or 'ocp') or provide the full import path to the trainer (e.g., 'fairchem.core.trainers.ocp_trainer.OCPTrainer').
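The error path is visible in the traceback: the requested name is not in the trainer registry, and because it contains no dot it is not treated as a full import path either, so `get_class` raises immediately. A minimal sketch of that dispatch logic (a simplification for illustration, not fairchem's actual `Registry` class):

```python
# Simplified sketch of the lookup logic shown in the traceback above
# (illustrative only; fairchem's real Registry is more involved).
import importlib

# Placeholder mapping; the real registry maps names to trainer classes.
TRAINER_MAPPING = {"base": object, "forces": object, "energy": object, "ocp": object}

def get_trainer_class(name: str):
    if name in TRAINER_MAPPING:        # registered short name
        return TRAINER_MAPPING[name]
    if name.count(".") < 1:            # not a dotted import path either -> give up
        raise RuntimeError(
            f"Failed to find the trainer '{name}'. You may either use a trainer "
            f"from the registry (one of {sorted(TRAINER_MAPPING)}) or provide "
            "the full import path to the trainer."
        )
    # Dotted name: resolve it as {module_path}.{ClassName}
    module_name, class_name = name.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)
```

Because the checkpoint's config names its trainer `equiformerv2_forces`, which is neither registered nor dotted, the first two branches fail and the error is raised before the import-path fallback is ever reached.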
Expected Behavior
I expected EquiformerV2 models to be importable into the OCPCalculator and usable for inference (e.g., during BFGS optimization in ASE).
I am doing this on a compute cluster where I don't have control over the CUDA or glibc versions. The cluster has only CUDA 11.6 available, and the required version of PyTorch (2.4.0) isn't available for CUDA 11.6, so I installed the CPU version of the Conda environment.
The cluster's glibc version is only 2.28, and I was running into import errors saying that the system libm.so needed at least glibc 2.29, so I installed OpenLibm 0.8.1 from the conda-forge channel and symlinked $CONDA_PREFIX/lib/libm.so to $CONDA_PREFIX/lib/libopenlibm.so.4.0. This resolved the libm.so/glibc 2.29 errors, but now I am running into the above-mentioned error when trying to instantiate an OCPCalculator with one of the publicly available EquiformerV2 checkpoint files. I get the same error when trying to load any of the publicly available EquiformerV2 names.
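For reference, the interpreter's linked C-library version can be checked from Python with the standard library alone, which is a quick first step when debugging native-extension import errors like the glibc one above:

```python
import platform

# Report the C library Python was linked against, e.g. ('glibc', '2.28').
# On non-glibc platforms this may return empty strings.
lib, version = platform.libc_ver()
print(lib, version)
```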
Specifying trainer="equiformerv2_forces" in the OCPCalculator constructor leads to the same error above. Specifying trainer="forces" leads to this error:
RuntimeError: Failed to find the trainer 'equiformerv2_forces'. You may either use a trainer from the registry (one of 'base', 'forces', 'energy' or 'ocp') or provide the full import path to the trainer (e.g., 'fairchem.core.trainers.ocp_trainer.OCPTrainer').
If I specify trainer="equiformerv2_forces", model_name="equiformerv2", I get:
RuntimeError: model_name and checkpoint_path were both specified, please use only one at a time
I can verify in the local copy of fairchem/core/models/pretrained_models.yml that all the EquiformerV2* names appear. This is the latest 1.3.0 release from last week. Not sure what I'm doing wrong.
Relevant files to reproduce this bug
No response
I did add fairchem-data-oc to the Conda environment using pip. I'm now again seeing the previous errors about pyg-lib not finding a libm.so compiled with glibc 2.29, and I think the outdated glibc might be responsible for the problem. I recreated the same Conda environment on NERSC Perlmutter login nodes (SLES 15, glibc 2.31), and EquiformerV2 models seem to work fine there without needing fairchem-data-oc. Possibly, on my primary cluster, the old libm.so/glibc prevents pyg-lib and torch_sparse from loading, which ultimately makes the newer EquiformerV2 models unavailable in the model registry.
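That hypothesis is consistent with a common registration pattern: if model modules are imported inside a try/except (or fail with an OSError from the dynamic linker), the corresponding registry entries are simply never created, and the later lookup fails with exactly this kind of "Failed to find the trainer" error. A hypothetical sketch of the failure mode (the names and structure are illustrative, not fairchem's actual code):

```python
# Hypothetical sketch of how a swallowed native-import failure can leave a
# model/trainer unregistered. Not fairchem's actual code.
registry = {}

def register_trainer(name):
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator

def _import_native_dep():
    # Stand-in for importing pyg_lib / torch_sparse; on the cluster the
    # dynamic linker raises: OSError: libm.so.6: version `GLIBC_2.29' not found
    raise OSError("libm.so.6: version `GLIBC_2.29' not found")

def load_optional_models():
    try:
        _import_native_dep()

        @register_trainer("equiformerv2_forces")
        class EquiformerV2Trainer:
            pass
    except OSError:
        pass  # swallowed: the trainer class is never registered

load_optional_models()
print("equiformerv2_forces" in registry)  # False -> later lookups raise
```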
Thank you. I also encountered this before on my primary cluster, which has only glibc 2.27 and can't be updated. However, I deleted pyg-lib and got fairchem working well, and I haven't seen any problems so far. I worry it could be a fragile workaround, though. Is this approach OK? @misko
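One way to check whether the optional extensions are even discoverable in an environment, without actually loading their native libraries and tripping the glibc error, is `importlib.util.find_spec` (a generic diagnostic, not fairchem-specific):

```python
import importlib.util

# find_spec only locates the module on disk; it does not load the compiled
# extension, so it won't trigger the dynamic-linker (glibc) error by itself.
for mod in ("pyg_lib", "torch_sparse"):
    spec = importlib.util.find_spec(mod)
    print(mod, "found" if spec is not None else "not installed")
```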
Python version
Python 3.12.8
fairchem-core version
1.3.0
pytorch version
2.4.0
cuda version
False None
Operating system version
RHEL 8.8 (kernel 4.18.0-477.10.1.el8_8.x86_64)