Training the model on a single CPU or GPU #944
Comments
Hello @lbluque. I would also like to know how to run the model on a single CPU. Running 'python -u main.py --mode train --config-yml configs/oc22/is2re/gemnet-dT/gemnet-dT.yml --cpu' gives the same result as above. Thank you again for your reply.
@csu-wjc are you getting an error when running this? If so, can you paste the traceback or more information so we can understand the issue?
Hello @lbluque.
(fair-chem) PS D:\Desktop\fairchem> python main.py --mode train --config-yml configs/oc22/is2re/gemnet-dT/gemnet-dT.yml --cpu
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
That is all the output I get when the code runs. The problem is that I can't see how it is training, or whether it is training at all.
Training errors should be printed to the console during training. Can you try running it with the … You can also log to WandB to see training and validation curves by setting the following in the config file (you will need to set up and log in to WandB):
logger:
  name: wandb
Hello @lbluque.
(fair-chem) PS D:\Desktop\fairchem> python main.py --mode train --config-yml configs/oc22/is2re/gemnet-dT/gemnet-dT.yml --cpu --debug
2024-12-18 19:42:12 (INFO): Loading model: gemnet_t
Also, is there any way for me to see the training curve while training on a single CPU (without logging in to WandB)?
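One way to follow training without WandB is to save the console output to a file and extract the loss from it. The sketch below is only an illustration under stated assumptions: it assumes metric lines share the `(INFO):` prefix seen above and list comma-separated `name: value` pairs that include `loss` and `step`. That line format, the helper names `parse_metrics` and `loss_curve`, and the sample lines are all hypothetical and should be checked against the real console output.

```python
import re

# Matches "name: number" pairs such as "loss: 4.50e-01".
# The metric-line format is an assumption, not taken from the thread.
PAIR = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_metrics(line):
    """Return the numeric name/value pairs found after the '(INFO): ' prefix."""
    _, sep, payload = line.partition("(INFO): ")
    if not sep:
        return {}
    return {name: float(value) for name, value in PAIR.findall(payload)}


def loss_curve(lines):
    """Collect (step, loss) points from lines that report both values."""
    curve = []
    for line in lines:
        metrics = parse_metrics(line)
        if "step" in metrics and "loss" in metrics:
            curve.append((int(metrics["step"]), metrics["loss"]))
    return curve


# Made-up sample lines for illustration only; real output may differ.
sample = [
    "2024-12-18 19:42:12 (INFO): Loading model: gemnet_t",
    "2024-12-18 19:45:00 (INFO): loss: 4.50e-01, lr: 5.00e-04, step: 1.00e+00",
    "2024-12-18 19:45:30 (INFO): loss: 4.00e-01, lr: 5.00e-04, step: 2.00e+00",
]

for step, loss in loss_curve(sample):
    print(step, loss)
```

The resulting list of points can be passed to any plotting tool. Lines that carry no numeric metrics, such as the "Loading model" line, are skipped.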
What would you like to report?
When I run the model on a single CPU using the command 'python main.py --mode train --config-yml configs/oc22/is2re/painn/painn.yml', it gives the following error:
(WARNING): Could not find dataset metadata.npz files in '[WindowsPath('D:/Miniconda/envs/fair-chem/Lib/site-packages/fairchem/data/oc22/is2re-total/train')]'
(WARNING): Disabled BalancedBatchSampler because num_replicas=1.
[rank0]: Traceback (most recent call last):
[rank0]: File "D:\Desktop\fairchem\main.py", line 10, in <module>
[rank0]: main()
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core_cli.py", line 135, in main
[rank0]: runner_wrapper(config)
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core_cli.py", line 58, in runner_wrapper
[rank0]: Runner()(config)
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core_cli.py", line 37, in __call__
[rank0]: with new_trainer_context(config=config) as ctx:
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\contextlib.py", line 137, in __enter__
[rank0]: return next(self.gen)
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\common\utils.py", line 1102, in new_trainer_context
[rank0]: trainer = trainer_cls(**trainer_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\trainers\ocp_trainer.py", line 109, in __init__
[rank0]: super().__init__(
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\trainers\base_trainer.py", line 220, in __init__
[rank0]: self.load(inference_only)
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\trainers\base_trainer.py", line 246, in load
[rank0]: self.load_datasets()
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\trainers\base_trainer.py", line 365, in load_datasets
[rank0]: self.train_sampler = self.get_sampler(
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\trainers\base_trainer.py", line 313, in get_sampler
[rank0]: return BalancedBatchSampler(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\common\data_parallel.py", line 171, in __init__
[rank0]: raise error
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\common\data_parallel.py", line 168, in __init__
[rank0]: dataset = _ensure_supported(dataset)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "D:\Miniconda\envs\fair-chem\Lib\site-packages\fairchem\core\common\data_parallel.py", line 113, in _ensure_supported
[rank0]: raise UnsupportedDatasetError(
[rank0]: fairchem.core.datasets.base_dataset.UnsupportedDatasetError: BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.
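The first warning above says that no metadata.npz files were found in the dataset directory, and the final error says the sampler needs metadata with the number of atoms. A quick pre-flight check for that file can be sketched as follows. This is a minimal sketch, not part of fairchem: the helper name `find_metadata_files` is hypothetical, and searching the directory recursively is an assumption about where the file may live.

```python
import tempfile
from pathlib import Path


def find_metadata_files(dataset_dir):
    """Return every metadata.npz found under dataset_dir (searched recursively)."""
    root = Path(dataset_dir)
    if not root.is_dir():
        # A missing directory is treated the same as "no metadata found".
        return []
    return sorted(root.rglob("metadata.npz"))


if __name__ == "__main__":
    # Demonstration on a temporary directory; point it at the real dataset
    # path from the warning message when checking an actual setup.
    with tempfile.TemporaryDirectory() as tmp:
        print(find_metadata_files(tmp))       # empty directory
        (Path(tmp) / "metadata.npz").touch()  # placeholder file, not real metadata
        print([p.name for p in find_metadata_files(tmp)])
```

An empty result for the training directory would be consistent with the warning in the log. The check only confirms that the file exists, not that it contains the atom counts the sampler asks for.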