TPU v4 install guide #108
base: main
Conversation
Thanks for the contributions! It's true that optimum-tpu is focused mostly on v5 and future platforms, but if this helps you I would be happy to have it. I just moved most of the install script to the cli.py script; do you think you can do the same?
@@ -0,0 +1,24 @@
sudo apt remove unattended-upgrades
why do you remove unattended-upgrades?
They kicked in twice, each time after a sudo apt update, and kept the TPU VM stuck for more than 90 minutes before I decided to just kill them. I consider the lifetime of a TPU VM to be short, and the VM is not exposed to the outside world. Hence, getting a stuck (costly) VM because of some potentially non-critical updates seems worse than not having this service and instead running updates on your own schedule.
I understand the issue, but I think that depends on the distribution you are using (I haven't experienced it so far); it is not necessarily related to optimum-tpu, which should provide tools for machine learning on TPUs. Please remove this command, and consider running it when you are setting up your machine, before using optimum-tpu.
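For reference, a minimal sketch of handling this at VM setup time, outside the install script (assuming a stock Ubuntu TPU VM image, where unattended-upgrades runs as a systemd service):

# Disable the background updater on a fresh TPU VM so unattended apt runs
# cannot hold the dpkg lock while you set up the machine.
sudo systemctl stop unattended-upgrades
sudo systemctl disable unattended-upgrades
# Or remove the package entirely, as the original script did:
sudo apt remove -y unattended-upgrades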
python -m venv optimum_tpu_env
source optimum_tpu_env/bin/activate
why do you need a virtual environment?
The regular install of optimum-tpu always tried to do a system-wide installation, which would then fail. I had to choose between --install-option="--prefix=/SOME/DIR/" and a venv, and considered the venv my preferred way of handling this (and future) conflicts.
I wanted a pip install -e as I was actively developing against some of the files. YMMV for a package install.
I understand, but this is a user choice too. Some people might prefer venv, others virtualenv, conda, or even a Docker image. I think it would be better to take it out of the script, leaving users the freedom to choose their own environment.
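For illustration, a minimal sketch of what that user-side preparation could look like with venv (virtualenv, conda, or a Docker image would serve equally well):

# Prepare an isolated environment yourself, before running any install script;
# this stays a user choice rather than something the script imposes.
python -m venv optimum_tpu_env
source optimum_tpu_env/bin/activate
pip install --upgrade pip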
pip install -e .
huggingface-cli login
gsutil cp -r gs://entropix/huggingface_hub ~/.cache/huggingface/hub
what is this for?
This should be rejected: it is a local install for custom changes and experiments, and the bucket is one of our project buckets anyway.
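For readers who want the general pattern anyway: the command pre-seeds the local Hugging Face cache from a GCS bucket so models are not re-downloaded on every fresh VM. A sketch with a placeholder bucket (gs://entropix is project-internal; YOUR_BUCKET is hypothetical):

# Copy a previously populated hub cache from your own bucket into the
# default Hugging Face cache location on the new VM.
gsutil cp -r gs://YOUR_BUCKET/huggingface_hub ~/.cache/huggingface/hub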
@@ -61,10 +61,11 @@ tests = ["pytest", "safetensors"]
quality = ["black", "ruff", "isort"]
# Jetstream/Pytorch support is experimental for now, it needs to be installed manually.
# Pallas is pulled because it will install a compatible version of jax[tpu].
jetstream-pt = [
You do not need to comment this out: it will only be installed if you do pip install optimum-tpu[pallas]; otherwise it should not pull the dependency.
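To make the extras behavior concrete: pip only resolves an optional dependency group when it is requested explicitly, e.g.:

# Default install: the experimental jetstream-pt/pallas extras are skipped.
pip install optimum-tpu
# Opt-in install: the extra declared in pyproject.toml is pulled in.
pip install "optimum-tpu[pallas]"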
Ok
My bad, should have been generalized from the beginning.
Co-authored-by: Alvaro Moran <[email protected]>
Hola,
pip install --upgrade setuptools
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
This was on a newly created TPU v4. Please advise on how I should best test the cli.py installation feature, because in general I am sure this should work as you indicated.
I think creating a virtual environment (venv, virtualenv, Docker, or conda) is a good way to prepare your VM for the optimum-tpu installation. Once you activate/enter it, setuptools will be installed in the environment, pip will stop trying to install it system-wide, and you should be able to avoid other issues. I will make a release soon (probably by the end of the week), so hopefully some things will be clearer by then.
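Concretely, with the environment activated, the failing sequence should become (a sketch, assuming an editable install from a checkout of this repository):

# Inside the activated environment, setuptools is upgraded locally,
# so "python setup.py develop" no longer attempts a system-wide install.
pip install --upgrade setuptools
pip install -e .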
This PR validates that it is possible to use optimum-tpu on older GCP TPUs, especially the TPU v4 generation.
Originally, this repository targets TPU v5e. However, there is still a large existing installed base of older TPU generations, and, importantly, the TPU Research Cloud grants only cover the older generations v2, v3, and v4, but not v5 and newer. Therefore, I wanted to ascertain that optimum-tpu can also be used on TPU v4 to accelerate research via Hugging Face.
Essentially, I have provided a validated and tested install, plus the deactivation of Pallas and Jetstream, which only target TPU v5.
Note that the install script is still intended to be run by hand, although full automation should be straightforward.
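For completeness, the by-hand sequence on a fresh TPU v4 VM would look roughly as follows (a sketch assembled from this thread, not the final script):

# Manual install on a fresh TPU v4 VM, per the review feedback above.
sudo apt remove -y unattended-upgrades   # optional housekeeping, see discussion
python -m venv optimum_tpu_env
source optimum_tpu_env/bin/activate
pip install --upgrade pip setuptools
pip install -e .                         # from a checkout of optimum-tpu
huggingface-cli login                    # authenticate for model downloads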
Before submitting