TPU v4 install guide #108

Open
wants to merge 3 commits into
base: main

artus-LYTiQ

This PR validates that optimum-tpu can be used on older GCP TPUs, in particular the TPU v4 generation.

This repository originally targets TPU v5e. However, there is still a large installed base of older TPU generations and, importantly, the TPU Research Cloud grants only cover the older generations v2, v3, and v4, not v5 or newer. I therefore wanted to confirm that optimum-tpu can also be used on TPU v4 to accelerate research with Hugging Face.

In essence, I provide a validated and tested install script and deactivate Pallas and Jetstream, which only target TPU v5.

Note that the install script is still intended to be run by hand, although full automation should be straightforward.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Collaborator

@tengomucho tengomucho left a comment

Thanks for the contribution! It's true that optimum-tpu is focused mostly on v5 and future platforms, but if this helps you I would be happy to have it.
I just moved most of the install script into the cli.py script; do you think you can do the same?

install-on-TPU-v4.sh
@@ -0,0 +1,24 @@
sudo apt remove unattended-upgrades
Collaborator

why do you remove unattended-upgrades?

Author

The unattended upgrades kicked off twice, each time after a sudo apt update, and kept the TPU VM stuck for more than 90 minutes before I decided to just kill them. I consider the lifetime of a TPU VM to be short and the VM not to be exposed to the outside world. Hence, a stuck (and costly) VM caused by potentially non-critical updates seems worse to me than not having this service and doing updates on your own schedule.

Collaborator

I understand the issue, but I think it depends on the distribution you are using (I haven't experienced it so far) and is not necessarily related to optimum-tpu, which should provide tools for machine learning on TPUs. Please remove this command and consider running it when you set up your machine, before using optimum-tpu.

Comment on lines +16 to +17
python -m venv optimum_tpu_env
source optimum_tpu_env/bin/activate
Collaborator

why do you need a virtual environment?

Author

The regular install of optimum-tpu always attempted a system-wide installation, which would then fail. I had to choose between --install-option="--prefix=/SOME/DIR/" and a venv, and I considered the venv my preferred way of handling this (and future) conflicts.

I wanted an editable install (pip install -e) because I was actively developing against some of the files. YMMV for a regular package install.
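
A minimal diagnostic sketch of the failure described above, assuming it comes from the interpreter's default site-packages directory not being writable by the current user. The function name install_target_report is hypothetical and not part of optimum-tpu; it only uses the Python standard library. The directory pip actually selects can differ on distributions that patch the install scheme, so treat the result as a hint rather than a definitive answer.

```python
import os
import site
import sysconfig


def install_target_report() -> dict:
    """Describe where a plain install would likely write packages.

    Hypothetical helper for illustration; not part of optimum-tpu.
    """
    # Default pure-Python package directory for the running interpreter.
    purelib = sysconfig.get_paths()["purelib"]
    return {
        "site_packages": purelib,
        # False suggests that installing here needs elevated rights.
        "writable": os.access(purelib, os.W_OK),
        # site.ENABLE_USER_SITE may be True, False, or None.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in install_target_report().items():
        print(f"{key}: {value}")
```

If writable comes back False outside of any environment, an isolated environment or an explicit prefix is probably needed before an editable install can succeed.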

Collaborator

I understand, but this is a user choice too. Some people might prefer venv, others virtualenv, conda, or even a Docker image. I think it would be better to take it out of the script, leaving users the freedom to choose their environment.

install-on-TPU-v4.sh
pip install -e .

huggingface-cli login
gsutil cp -r gs://entropix/huggingface_hub ~/.cache/huggingface/hub
Collaborator

what is this for?

Author

@artus-LYTiQ artus-LYTiQ Oct 22, 2024

This line should be rejected. It is a local install step for custom changes and experiments, and the bucket is one of our project buckets anyway.

@@ -61,10 +61,11 @@ tests = ["pytest", "safetensors"]
quality = ["black", "ruff", "isort"]
# Jetstream/Pytorch support is experimental for now, it needs to be installed manually.
# Pallas is pulled because it will install a compatible version of jax[tpu].
jetstream-pt = [
Collaborator

You do not need to comment this out: it is only installed if you run pip install optimum-tpu[pallas]; otherwise the dependency should not be pulled.
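
The point about extras can be illustrated with a small model, assuming pip's usual behaviour: entries in an optional dependency group are only added when the matching extra is requested. The tests and quality lists are taken from the diff above; BASE_DEPENDENCIES and the single jetstream-pt entry are hypothetical placeholders, since the real lists are not shown here. This is a sketch of the idea, not pip's resolver.

```python
# Simplified model of optional dependency groups ("extras").
OPTIONAL_DEPENDENCIES = {
    "tests": ["pytest", "safetensors"],
    "quality": ["black", "ruff", "isort"],
    # Placeholder entry: the real contents of this group are not shown above.
    "jetstream-pt": ["hypothetical-jetstream-dependency"],
}

# Hypothetical stand-in for the project's always-installed dependencies.
BASE_DEPENDENCIES = ["hypothetical-base-dependency"]


def parse_extras(requirement: str) -> list:
    """Return the extras named in a string such as 'pkg[a,b]'."""
    _, bracket, rest = requirement.partition("[")
    if not bracket:
        return []
    inner = rest.split("]", 1)[0]
    return [name.strip() for name in inner.split(",") if name.strip()]


def resolve(requirement: str) -> list:
    """List what would be requested for the given requirement string."""
    selected = list(BASE_DEPENDENCIES)
    for extra in parse_extras(requirement):
        # Unknown extras add nothing in this model.
        selected.extend(OPTIONAL_DEPENDENCIES.get(extra, []))
    return selected


if __name__ == "__main__":
    print(resolve("optimum-tpu"))
    print(resolve("optimum-tpu[jetstream-pt]"))
```

In this model, a plain optimum-tpu request never touches the jetstream-pt group, which is why leaving the group in place should be harmless on TPU v4.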

Author

Ok

My bad, should have been generalized from the beginning.

Co-authored-by: Alvaro Moran <[email protected]>
@artus-LYTiQ
Author

Hola,
when I use pip install optimum-tpu to keep using it as a CLI install tool, I find that your cli changes are not yet included in the latest PyPI release. When I then go back to installing via pip install -e without being allowed to use a venv, I still end up in this spot:

pip install --upgrade setuptools
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: setuptools in /home/artuskg/.local/lib/python3.10/site-packages (75.2.0)
artuskg@t1v-n-be3833de-w-0:~/optimum-tpu$ pip install -e .
Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/artuskg/optimum-tpu
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Installing collected packages: UNKNOWN
Running setup.py develop for UNKNOWN
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [44 lines of output]
    /tmp/pip-build-env-odocea89/overlay/local/lib/python3.10/dist-packages/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning:
    ERROR: setuptools==59.6.0 is used in combination with setuptools_scm>=8.x
    
    Your build configuration is incomplete and previously worked by accident!
    setuptools_scm requires setuptools>=61
    
    Suggested workaround if applicable:
     - migrating from the deprecated setup_requires mechanism to pep517/518
       and using a pyproject.toml to declare build dependencies
       which are reliably pre-installed before running the build tools
    
      warnings.warn(
    running develop
    /usr/lib/python3/dist-packages/setuptools/command/easy_install.py:158: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    WARNING: The user site-packages directory is disabled.
    /usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    error: can't create or remove files in install directory
    
    The following error occurred while trying to add or remove files in the
    installation directory:
    
        [Errno 13] Permission denied: '/usr/local/lib/python3.10/dist-packages/test-easy-install-11654.write-test'
    
    The installation directory you specified (via --install-dir, --prefix, or
    the distutils default setting) was:
    
        /usr/local/lib/python3.10/dist-packages/
    
    Perhaps your account does not have write access to this directory?  If the
    installation directory is a system-owned directory, you may need to sign in
    as the administrator or "root" account.  If you do not have administrative
    access to this machine, you may wish to choose a different installation
    directory, preferably one that is listed in your PYTHONPATH environment
    variable.
    
    For information on other options, you may wish to consult the
    documentation at:
    
      https://setuptools.pypa.io/en/latest/deprecated/easy_install.html
    
    Please make the appropriate changes for your system and try again.
    
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

artuskg@t1v-n-be3833de-w-0:~/optimum-tpu$ pip install --user --upgrade 'setuptools>=61'
Requirement already satisfied: setuptools>=61 in /home/artuskg/.local/lib/python3.10/site-packages (75.2.0)
artuskg@t1v-n-be3833de-w-0:~/optimum-tpu$ pip install --user -e .
Obtaining file:///home/artuskg/optimum-tpu
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Installing collected packages: UNKNOWN
Running setup.py develop for UNKNOWN
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [44 lines of output]
    /tmp/pip-build-env-_j6dh9z3/overlay/local/lib/python3.10/dist-packages/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning:
    ERROR: setuptools==59.6.0 is used in combination with setuptools_scm>=8.x
    
    Your build configuration is incomplete and previously worked by accident!
    setuptools_scm requires setuptools>=61
    
    Suggested workaround if applicable:
     - migrating from the deprecated setup_requires mechanism to pep517/518
       and using a pyproject.toml to declare build dependencies
       which are reliably pre-installed before running the build tools
    
      warnings.warn(
    running develop
    /usr/lib/python3/dist-packages/setuptools/command/easy_install.py:158: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    WARNING: The user site-packages directory is disabled.
    /usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    error: can't create or remove files in install directory
    
    The following error occurred while trying to add or remove files in the
    installation directory:
    
        [Errno 13] Permission denied: '/usr/local/lib/python3.10/dist-packages/test-easy-install-11798.write-test'
    
    The installation directory you specified (via --install-dir, --prefix, or
    the distutils default setting) was:
    
        /usr/local/lib/python3.10/dist-packages/
    
    Perhaps your account does not have write access to this directory?  If the
    installation directory is a system-owned directory, you may need to sign in
    as the administrator or "root" account.  If you do not have administrative
    access to this machine, you may wish to choose a different installation
    directory, preferably one that is listed in your PYTHONPATH environment
    variable.
    
    For information on other options, you may wish to consult the
    documentation at:
    
      https://setuptools.pypa.io/en/latest/deprecated/easy_install.html
    
    Please make the appropriate changes for your system and try again.
    
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

This was on a newly created TPU v4. Please advise on how best to test the cli.py installation feature, because in general I am sure this should work as you indicated.

@tengomucho
Collaborator

I think creating a virtual environment (venv, virtualenv, Docker, or conda) is a good way to prepare your VM for the optimum-tpu installation. Once you activate or enter it, setuptools should be installed inside the environment, pip should stop trying to install system-wide, and you should be able to avoid the other issues. I will make a release soon (probably by the end of the week), so hopefully some things will be clearer by then.
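
A small pre-install check along these lines, as a sketch only: it reports whether the interpreter runs inside a venv-style environment and whether the setuptools it can see meets the setuptools>=61 requirement quoted in the log above. The function names are hypothetical, the version parsing is deliberately simplistic, and the prefix comparison covers venv and recent virtualenv releases but not conda environments.

```python
import sys
from importlib import metadata

# Taken from the log line "setuptools_scm requires setuptools>=61".
MINIMUM_SETUPTOOLS = (61,)


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def version_tuple(version: str) -> tuple:
    """Turn '59.6.0' into (59, 6, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def setuptools_ok(version: str) -> bool:
    """Check a setuptools version string against the minimum from the log."""
    return version_tuple(version) >= MINIMUM_SETUPTOOLS


def preinstall_report() -> str:
    """Summarise the environment before attempting an editable install."""
    try:
        found = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        found = None
    lines = [f"virtual environment active: {in_virtualenv()}"]
    if found is None:
        lines.append("setuptools: not installed")
    else:
        lines.append(f"setuptools {found}: ok={setuptools_ok(found)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(preinstall_report())
```

The check only inspects the interpreter it runs under, so it is best run after activating the environment that will be used for the install.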
