
Shark V1 Nov 2024 Release Testing Bash #512

Open · pdhirajkumarprasad opened this issue Nov 14, 2024 · 6 comments

pdhirajkumarprasad commented Nov 14, 2024

Please log in to an MI300X machine. For the AMD Shark team, see the internal Slack channel for available machines.

  • Create a Python virtual environment of your liking and activate it. If you are new to Python, the simplest way is to run the following the first time:
python3.12 -m venv .venv
source .venv/bin/activate

and the following on subsequent logins:

source .venv/bin/activate

Feel free to test it however you like, but here are some guidelines you could follow.

Testing guidelines

  • Run multiple servers on the same machine with different port numbers (see the sketch after this list)
  • Log out, log back in, and try multiple times
  • Try using different options (flags) for the servers and clients (see the tables below)
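As a rough illustration of the first guideline, two servers could be brought up on one machine along the lines of the sketch below. The `python -m shortfin_apps.sd.server` invocation is inferred from the module path, and the specific flag values (ports, device, target, build preference) are examples rather than a prescribed configuration.

```bash
# Sketch: two servers on one machine, each listening on its own port.
# Flag values below are examples only; see the flag table for the full set of options.
source .venv/bin/activate

# First server on port 8000
python -m shortfin_apps.sd.server --device hip --target gfx942 --port 8000 &

# Second server on port 8001, trying a different build preference
python -m shortfin_apps.sd.server --device hip --target gfx942 --port 8001 \
  --build_preference precompiled &
```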

Multiple people may try the same feature, so whoever is trying a particular feature, please put your name in the "Testers" column in the tables below.

shortfin_apps.sd.server with different options:

| Flags | Options | Testers | Issues |
| --- | --- | --- | --- |
| --host HOST | | | |
| --port PORT | | | |
| --root-path ROOT_PATH | | | |
| --timeout-keep-alive | | | |
| --device | local-task, hip, amdgpu | | |
| --target | gfx942, gfx1100 | | #515 |
| --device_ids | | | |
| --tokenizers | | | |
| --model_config | | | |
| --workers_per_device | | | |
| --fibers_per_device | | | |
| --isolation | per_fiber, per_call, none | | |
| --show_progress | | | |
| --trace_execution | | | |
| --amdgpu_async_allocations | | | |
| --splat | | | |
| --build_preference | compile, precompiled | | |
| --compile_flags | | | |
| --flagfile FLAGFILE | | | #515 |
| --artifacts_dir ARTIFACTS_DIR | | | |

shortfin_apps.sd.simple_client with different options:

| Flags | Testers | Issues |
| --- | --- | --- |
| --file | | |
| --reps | | |
| --save | | |
| --outputdir | | |
| --steps | | |
| --interactive | | |
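Similarly, a possible client invocation against a running server, using only flags from the table above (the values are arbitrary examples, and --save is assumed here to be a boolean switch):

```bash
# Sketch: drive a running server with the simple client.
# --reps, --steps, --save and --outputdir come from the flag table above;
# the values shown are examples only.
python -m shortfin_apps.sd.simple_client --reps 4 --steps 20 --save --outputdir ./outputs
```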

Other issues

| Issue description | Issue no. |
| --- | --- |

kumardeepakamd changed the title from "Shark release V1-Nov, 2024, Testing Bash" to "Shark V1 Nov 2024 Release Testing Bash" on Nov 14, 2024

dan-garvey commented Nov 14, 2024

Not a critique, just something I noticed: server startup takes about 12 min on a Cirrascale 8x MI300 machine.

IanNod commented Nov 14, 2024

I had the same server startup time. I attributed it to downloading models/weights on setup.

Minor critique: it does not look like we are changing the random latents generated. I'm not sure where that is controlled, but I was seeing the same image generated given the same prompt.

dan-garvey commented Nov 14, 2024

Yeah, as Ian said, the seed appears fixed; I think when reps > 1 it should be changed.

maybe this works?

```
async for i in async_range(args.reps):
    data["seed"] = [i]
    pending.append(
        asyncio.create_task(send_request(session, i, args, data))
    )
    await asyncio.sleep(
        1
    )  # Wait for 1 second before sending the next request
```

dan-garvey commented

well at least in the args.reps>1 case

archana-ramalingam commented

At cold start, an incomplete model download caused the following issue. Deleting the cached models and re-downloading them fixed it.

INFO:root:Loading parameter fiber 'model' from: /home/aramalin/.cache/shark/genfiles/sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 388, in <module>
    main(
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 376, in main
    sysman = configure(args)
             ^^^^^^^^^^^^^^^
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 115, in configure
    sm.load_inference_parameters(*datasets, parameter_scope="model", component=key)
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/components/service.py", line 116, in load_inference_parameters
    p.load(path, format=format)
ValueError: shortfin_iree-src/runtime/src/iree/io/formats/irpa/irpa_parser.c:16: OUT_OF_RANGE; file segment out of range (1766080 to 2614665369 for 2612899290, file_size=726679552); verifying storage segment
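A minimal sketch of that cleanup, assuming the default cache location shown in the log and that nothing else under it needs to be preserved:

```bash
# Remove the cached SDXL artifacts so the server re-downloads them on the next start.
# Path taken from the log above; adjust if your cache lives elsewhere.
rm -rf ~/.cache/shark/genfiles/sdxl
```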

pdhirajkumarprasad commented

I have tried almost all of the flags and various configurations for the client/server, and added my observations here: https://github.com/pdhirajkumarprasad/for_sharing_logs/blob/main/Shark-V1(Nov,%202024)-Bash.md
