ROCm support #252

Open · wants to merge 4 commits into base: main
31 changes: 31 additions & 0 deletions docker-compose.rocm.yml
@@ -0,0 +1,31 @@
version: "3.9"
services:
  refact_self_hosted:
    # TODO: figure out how to pass the GPU to docker builds, so there is no need to install deepspeed at runtime
    command: >
      /bin/bash -c 'pip install deepspeed --no-cache-dir
      && python -m self_hosting_machinery.watchdog.docker_watchdog'
    image: refact_self_hosting_rocm
    build:
      dockerfile: rocm.Dockerfile
@takov751 commented on Jan 1, 2024

    build:
+      context: .
      dockerfile: rocm.Dockerfile

This was the only issue I found with this build so far :D I am testing it right now, just waiting for the models to download.

After some building and testing I encountered a bigger issue:

refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL STATUS loading model
refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL loading model local_files_only=1
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
refact_self_hosted_1  | -- 11 -- 1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 2. You are using pytorch without CUDA support.
refact_self_hosted_1  | -- 11 -- 3. CUDA and nvcc are not installed in your device.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL lm_head not been quantized, will be ignored when make_quant.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA extension not installed.

After some testing today I can say that, sadly, we need to wait a bit longer to make this happen. For example, flash_attention will probably only work from ROCm 5.7 once it gets a stable release. I saw that you tried some workarounds, but I believe they did not work due to ROCm library differences (a quick way to confirm which PyTorch build ended up in the container is the diagnostic sketch after this file's diff).

So far, even when it built and started, most of the time I just got a timeout error and the model was not loaded properly.

    shm_size: "32gb"
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    group_add:
      - "video"
    security_opt:
      - seccomp:unconfined
    volumes:
      - perm_storage:/perm_storage
    ports:
      - 8008:8008
  nginx:
    image: nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro

volumes:
  perm_storage:
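
The auto_gptq warnings quoted in the comment above ("CUDA extension not installed", "You are using pytorch without CUDA support") typically mean the container ended up with a CUDA-only or CPU-only PyTorch wheel rather than a ROCm build. A minimal diagnostic sketch, not part of this PR and using only stock torch attributes, to check which build is installed inside the container:

import torch

print("torch version:", torch.__version__)
# On ROCm wheels torch.version.hip is a version string; on CUDA-only wheels it is None.
print("hip runtime:", torch.version.hip)
# ROCm builds still expose GPUs through the torch.cuda API.
print("device visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))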
68 changes: 68 additions & 0 deletions rocm.Dockerfile
@@ -0,0 +1,68 @@
FROM ocelot88/rocm-pytorch-slim:rocm-5.7.1-dev-torch-2.3
RUN apt-get update
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y \
curl \
git \
htop \
tmux \
file \
vim \
expect \
mpich \
libmpich-dev \
python3 python3-pip \
&& rm -rf /var/lib/{apt,dpkg,cache,log}


RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1

# linguist requisites
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y \
expect \
ruby-full \
ruby-bundler \
build-essential \
cmake \
pkg-config \
libicu-dev \
zlib1g-dev \
libcurl4-openssl-dev \
libssl-dev
RUN git clone https://github.com/smallcloudai/linguist.git /tmp/linguist \
&& cd /tmp/linguist \
&& bundle install \
&& rake build_gem

ENV PATH="${PATH}:/tmp/linguist/bin"

RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y python3-packaging

ENV INSTALL_OPTIONAL=TRUE
ENV BUILD_CUDA_EXT=1
ENV USE_ROCM=1
ENV GITHUB_ACTIONS=true
ENV AMDGPU_TARGETS="gfx1030"
ENV FLASH_ATTENTION_FORCE_BUILD=TRUE
ENV MAX_JOBS=8
COPY . /tmp/app
RUN pip install --upgrade pip ninja packaging
RUN DEBIAN_FRONTEND=noninteractive apt-get install python3-mpi4py -y
ENV PYTORCH_ROCM_ARCH="gfx1030"
ENV ROCM_TARGET="gfx1030"
ENV ROCM_HOME=/opt/rocm-5.7.1
# TODO: remove this layer when https://github.com/TimDettmers/bitsandbytes/pull/756 is merged
RUN git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6 && \
cd bitsandbytes-rocm-5.6 && \
make hip && pip install . && \
cd .. && rm -rf bitsandbytes-rocm-5.6
RUN pip install /tmp/app -v --no-build-isolation && rm -rf /tmp/app
RUN ln -s ${ROCM_HOME} /opt/rocm
ENV REFACT_PERM_DIR "/perm_storage"
ENV REFACT_TMP_DIR "/tmp"
ENV RDMAV_FORK_SAFE 0
ENV RDMAV_HUGEPAGES_SAFE 0

EXPOSE 8008

CMD ["python", "-m", "self_hosting_machinery.watchdog.docker_watchdog"]
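
The image above pins gfx1030 (RDNA2) through AMDGPU_TARGETS, PYTORCH_ROCM_ARCH and ROCM_TARGET, so other AMD GPUs need those values changed before building. A hypothetical helper, assuming rocminfo is on PATH and prints agent/ISA names containing a gfx identifier, to find the right value for the local card:

import re
import subprocess

def detect_gfx_targets():
    # rocminfo is expected to list GPU agents with arch strings such as "gfx1030"
    out = subprocess.check_output(["rocminfo"], text=True)
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", out)))

if __name__ == "__main__":
    targets = detect_gfx_targets()
    print("detected gfx targets:", targets or "none found (is this a ROCm host?)")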
37 changes: 36 additions & 1 deletion self_hosting_machinery/scripts/enum_gpus.py
@@ -9,6 +9,38 @@

from self_hosting_machinery import env

def query_rocm_smi():
    rocm_smi_output = "- no output -"
    descriptions = []
    try:
        rocm_smi_output = subprocess.check_output([
            "/opt/rocm/bin/rocm-smi",
            "--showbus",
            "--showproductname",
            "--showtemp",
            "--showmeminfo", "vram",
            "--json"])
        logging.info(rocm_smi_output)
        smi_output_dict = json.loads(rocm_smi_output)
        for gpu_id, props in smi_output_dict.items():
            descriptions.append({
                "id": props.get("PCI Bus"),
                "name": props.get("Card model", "AMD GPU"),
                "mem_used_mb": bytes_to_mb(int(props.get("VRAM Total Used Memory (B)", 0))),
                "mem_total_mb": bytes_to_mb(int(props.get("VRAM Total Memory (B)", 0))),
                "temp_celsius": props.get("Temperature (Sensor junction) (C)", -1),
            })
    except Exception:
        logging.warning("rocm-smi does not work, that's especially bad for initial setup.")
        logging.warning(traceback.format_exc())
        # log the raw output, which is defined even when check_output or json parsing fails
        logging.warning(f"output was:\n{rocm_smi_output}")

    return {"gpus": descriptions}

def bytes_to_mb(bytes_size):
    return bytes_size / (1024 ** 2)


def query_nvidia_smi():
    nvidia_smi_output = "- no output -"
@@ -42,7 +74,10 @@ def query_nvidia_smi():


def enum_gpus():
-    result = query_nvidia_smi()
+    if os.environ.get('USE_ROCM'):
+        result = query_rocm_smi()
+    else:
+        result = query_nvidia_smi()
    with open(env.CONFIG_ENUM_GPUS + ".tmp", 'w') as f:
        json.dump(result, f, indent=4)
    os.rename(env.CONFIG_ENUM_GPUS + ".tmp", env.CONFIG_ENUM_GPUS)
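query_rocm_smi() above assumes `rocm-smi --json` returns one object per card (card0, card1, …) with exactly the field labels it reads via props.get(...). An illustration of that assumed shape and the resulting GPU description — all values below are hypothetical, and labels can differ between ROCm versions:

# Illustration only: the rocm-smi --json shape that query_rocm_smi() expects.
sample = {
    "card0": {
        "PCI Bus": "0000:0B:00.0",
        "Card model": "AMD Radeon GPU",
        "VRAM Total Memory (B)": "17179869184",
        "VRAM Total Used Memory (B)": "289406976",
        "Temperature (Sensor junction) (C)": "41.0",
    }
}

descriptions = [
    {
        "id": props.get("PCI Bus"),
        "name": props.get("Card model", "AMD GPU"),
        "mem_used_mb": int(props.get("VRAM Total Used Memory (B)", 0)) / (1024 ** 2),
        "mem_total_mb": int(props.get("VRAM Total Memory (B)", 0)) / (1024 ** 2),
        "temp_celsius": props.get("Temperature (Sensor junction) (C)", -1),
    }
    for props in sample.values()
]
print(descriptions)  # one dict per card, memory converted to MB (276.0 used / 16384.0 total here)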
36 changes: 31 additions & 5 deletions setup.py
@@ -9,6 +9,7 @@

setup_package = os.environ.get("SETUP_PACKAGE", None)
install_optional = os.environ.get("INSTALL_OPTIONAL", "FALSE")
use_rocm = os.environ.get("USE_ROCM", "FALSE").upper() not in ("FALSE", "0", "")  # treat unset/FALSE/0 as "no ROCm"


@dataclass
@@ -44,12 +45,24 @@ class PyPackage:
    "self_hosting_machinery": PyPackage(
        requires=["aiohttp", "aiofiles", "cryptography", "fastapi==0.100.0", "giturlparse", "pydantic==1.10.13",
                  "starlette==0.27.0", "uvicorn", "uvloop", "python-multipart", "auto-gptq==0.4.2", "accelerate",
-                 "termcolor", "torch", "transformers==4.34.0", "bitsandbytes", "safetensors", "peft", "triton",
-                 "torchinfo", "mpi4py", "deepspeed==0.11.1"],
-       optional=["ninja", "flash_attn @ git+https://github.com/smallcloudai/flash-attention@feat/alibi"],
+                 "termcolor", "torch", "transformers==4.34.0", "bitsandbytes", "safetensors", "peft",
+                 "torchinfo"],
+       optional=["ninja"],
        requires_packages=["refact_scratchpads", "refact_scratchpads_no_gpu",
                           "known_models_db", "refact_data_pipeline"],
        data=["webgui/static/*", "webgui/static/js/*", "webgui/static/components/modals/*", "watchdog/watchdog.d/*"]),
    "rocm": PyPackage(
        requires=[
            # "bitsandbytes",  # TODO: bitsandbytes does not support ROCm yet, so we build it from source, see: https://github.com/TimDettmers/bitsandbytes/pull/756
            # "deepspeed",     # TODO: figure out how to install deepspeed at build time, see: docker-compose.rocm.yml
            # "flash_attn",    # TODO: flash_attn has limited GPU support on ROCm, see: https://github.com/ROCmSoftwarePlatform/flash-attention/tree/flash_attention_for_rocm2
            "pytorch-triton-rocm",
        ]
    ),
    "cuda": PyPackage(
        requires=["mpi4py", "deepspeed==0.11.1", "triton"],
        optional=["flash_attn @ git+https://github.com/smallcloudai/flash-attention@feat/alibi"],
    ),
}


@@ -66,17 +79,30 @@ def find_required_packages(packages: Set[str]) -> Set[str]:
def get_install_requires(packages):
    install_requires = list({
        required_package
-       for py_package in packages.values()
+       for key, py_package in packages.items()
        for required_package in py_package.requires
+       if key not in ("rocm", "cuda")
    })
    if install_optional.upper() == "TRUE":
        install_requires.extend(list({
            required_package
-           for py_package in packages.values()
+           for key, py_package in packages.items()
            for required_package in py_package.optional
+           if key not in ("rocm", "cuda")
        }))
    install_requires.extend(get_runtime_dependent_dependencies(packages))
    return install_requires


def get_runtime_dependent_dependencies(packages):
    required = []
    runtime_key = "rocm" if use_rocm else "cuda"
    if use_rocm:
        required.extend(package for package in packages.get(runtime_key).requires)
    if install_optional.upper() == "TRUE":
        required.extend(package for package in packages.get(runtime_key).optional)
    return required



if setup_package is not None:
    if setup_package not in all_refact_packages:
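A toy walk-through of the dependency selection added above, mirroring get_runtime_dependent_dependencies() with made-up package lists: the ROCm requirements are only pulled in when USE_ROCM is set, while the CUDA-only extras flow through the INSTALL_OPTIONAL branch.

# Toy sketch of the runtime-dependent selection (not the real package lists).
toy_packages = {
    "rocm": {"requires": ["pytorch-triton-rocm"], "optional": []},
    "cuda": {"requires": ["mpi4py", "deepspeed==0.11.1", "triton"], "optional": ["flash_attn"]},
}

def runtime_deps(use_rocm, install_optional):
    key = "rocm" if use_rocm else "cuda"
    deps = []
    if use_rocm:              # mirrors the guard in get_runtime_dependent_dependencies()
        deps += toy_packages[key]["requires"]
    if install_optional:
        deps += toy_packages[key]["optional"]
    return deps

print(runtime_deps(use_rocm=True, install_optional=False))   # ['pytorch-triton-rocm']
print(runtime_deps(use_rocm=False, install_optional=True))   # ['flash_attn']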