
Poetry updates torch from the CUDA build (2.0.1+cu118) to the CPU-only build (2.1.1) by default on Windows #1145

Open
6 of 9 tasks
coolermzb3 opened this issue May 11, 2024 · 9 comments

Comments

@coolermzb3
Contributor

coolermzb3 commented May 11, 2024

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

got:

1.0.0 0.28.1 2.1.1+cpu 1.24.4 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] win32

I recently noticed the major version update of Tianshou, so I did a git pull and used poetry install.

However, I found that on my Windows machine it replaced the GPU/CUDA-enabled torch (2.0.1+cu118) with the CPU-only build (2.1.1) by default.


I also created a new, empty conda environment and ran poetry install; it still installed the CPU version (2.1.1).

I'm new to poetry (as well as tianshou-1.0.0 ) and would like to ask if this is a feature or a bug. Is it possible to support the installation of CUDA-enabled torch on Windows by default?

I understand that the CUDA version may vary from machine to machine, which might pose some difficulties. In that case, could there be user-friendly prompts (or documentation) indicating that users should install the appropriate version of torch manually if necessary?

@MischaPanch
Collaborator

@opcode81 could you look into it?

@opcode81
Collaborator

opcode81 commented May 13, 2024

This is a well-known Poetry limitation. By default, installing torch via Poetry will use a torch build that was built against a default version of CUDA (it is not a CPU-only version); and the version of CUDA it uses depends on the torch version. For example, torch 2.0 might have used CUDA 11 and later versions might now use CUDA 12 by default.

So, to get CUDA support with later torch versions on your system, you have the following options:

  1. Upgrade your system to use CUDA 12 (if possible) OR
  2. Install a torch build (of the same version) that works with CUDA 11 (or whatever CUDA version you may have)

When using Poetry, the latter requires configuring an explicit package source that points at the matching PyTorch wheel index (see the pyproject.toml example later in this thread).

When using conda to manage your env (often a better choice!), a clean way to do the latter is to use the channels pytorch and nvidia and to depend on pytorch-cuda=11 in addition to pytorch itself.
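As an illustrative sketch of that conda route, an environment file could declare the two channels and the pytorch-cuda pin (the environment name, Python version, and the exact 11.8 minor version here are assumptions; pin to whatever CUDA major version your driver supports):

```yaml
# environment.yml -- illustrative sketch, not an official recommendation.
# pytorch-cuda=11.8 is an assumed pin; match it to your system's CUDA.
name: tianshou-cuda
channels:
  - pytorch
  - nvidia
dependencies:
  - python=3.11
  - pytorch
  - pytorch-cuda=11.8
```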

@MischaPanch
Collaborator

Let's add some instructions to the readme, then close this issue. I can do that, or leave it to you, if you want :)

@coolermzb3
Contributor Author

> This is a well-known Poetry limitation. By default, installing torch via Poetry will use a torch build that was built against a default version of CUDA (it is not a CPU-only version); and the version of CUDA it uses depends on the torch version. For example, torch 2.0 might have used CUDA 11 and later versions might now use CUDA 12 by default.

I'm a bit confused about this. According to your explanation, a "default" version of PyTorch that supports CUDA should be installed. However, the result on my Windows shows that it installed a version that doesn't support CUDA, not just a difference between CUDA 11 or 12. But it doesn't matter, I'd prefer to install it manually.


Perhaps some modifications can be made in pyproject.toml according to this. (but this could potentially cause some inconvenience for some users who are experiencing slow network speeds when accessing https://download.pytorch.org/).


Anyway, I think it'd be a good idea to add some instructions in the readme, explaining the limitations of the default torch installation and suggesting (or reminding users of) the recommended manual installation methods like above.

Thank you~

@opcode81
Collaborator

opcode81 commented May 16, 2024

> This is a well-known Poetry limitation. By default, installing torch via Poetry will use a torch build that was built against a default version of CUDA (it is not a CPU-only version); and the version of CUDA it uses depends on the torch version. For example, torch 2.0 might have used CUDA 11 and later versions might now use CUDA 12 by default.

> I'm a bit confused about this. According to your explanation, a "default" version of PyTorch that supports CUDA should be installed. However, the result on my Windows shows that it installed a version that doesn't support CUDA, not just a difference between CUDA 11 or 12.

No, the result on your Windows machine does not show this. The version that is designated as purely "2.1.1" does support CUDA, just not necessarily your version of CUDA. Like I said, every default build of torch installed by Poetry supports a particular version of CUDA, and the "2.1.1" version happens to support CUDA 12. Did you try upgrading to CUDA 12/your Nvidia drivers?
By contrast, builds installed from https://download.pytorch.org will specifically designate the CUDA version in a suffix (e.g. you might get "2.1.1+cu118"), but the default builds do not indicate the CUDA version they are compatible with in the version string.

@coolermzb3
Contributor Author

> No, the result on your Windows machine does not show this. The version that is designated as purely "2.1.1" does support CUDA, just not necessarily your version of CUDA. Like I said, every default build of torch installed by Poetry supports a particular version of CUDA, and the "2.1.1" version happens to support CUDA 12. Did you try upgrading to CUDA 12/your Nvidia drivers? By contrast, builds installed from https://download.pytorch.org will specifically designate the CUDA version in a suffix (e.g. you might get "2.1.1+cu118"), but the default builds do not indicate the CUDA version they are compatible with in the version string.

I didn't install CUDA through the official NVIDIA website, but CUDA 12 is indeed present on my computer. I suspect it might have been installed via PyTorch in some other conda environment.

However, in the Poetry test environment, where

> nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.33                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+

> pip list
torch   2.1.1   # default version from Poetry, just as we have been discussing

and in python,

>>> import torch
>>> torch.cuda.is_available()
False

Furthermore,

>>> torch.tensor([0]).to("cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\cmzb\miniconda3\envs\temp\Lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

So I'm still not quite sure whether the default build of torch installed by Poetry supports CUDA.

@phoinix-chen

@coolermzb3 Maybe you can add the following text to the corresponding block in pyproject.toml to FORCE Poetry to use the pytorch GPU source.

[tool.poetry.dependencies]
torch = {version = "^2.3.1", source = "pytorch"}

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
priority = "explicit"

Each [[tool.poetry.source]] entry contains three fields: name, url, and priority.

The custom name is referenced from [tool.poetry.dependencies], which lets Poetry identify which custom source a given dependency should come from.

url specifies the source location.

And the priority is configured as explicit, meaning the source will only be searched if a package's configuration explicitly states that it should be found on that source.

@coolermzb3
Contributor Author

@phoinix-chen Thank you! It works!

@opcode81
Collaborator

opcode81 commented Jun 22, 2024

@coolermzb3 I had suggested the same solution earlier in this thread.

Regarding CUDA support in the "default" torch installation: The policy has changed.

> By default, installing torch via Poetry will use a torch build that was built against a default version of CUDA (it is not a CPU-only version); and the version of CUDA it uses depends on the torch version. For example, torch 2.0 might have used CUDA 11 and later versions might now use CUDA 12 by default.

This statement is no longer true for Windows, but is still true for Linux. The version "2.1.0", for example, supports CUDA 12 on Linux but is CPU-only on Windows. It's an unfortunate and random decision, because Windows does, of course, still support CUDA.
