
Shark V1 Nov 2024 Release Testing Bash #512

Open · pdhirajkumarprasad opened this issue Nov 14, 2024 · 6 comments

pdhirajkumarprasad commented Nov 14, 2024

Please log in to an MI300X machine. For the AMD Shark team, see the internal Slack channel for available machines.

  • Create a Python virtual environment of your liking and activate it. If you are new to Python, the simplest way is to run the following the first time:
python3.12 -m venv .venv
source .venv/bin/activate

and the following on subsequent logins:

source .venv/bin/activate

Feel free to test it however you like, but here are some guidelines you could follow.

Testing guidelines

  • Run multiple servers on the same machine with different port numbers (see the sketch after this list)
  • Log out, log back in, and try multiple times
  • Try using different options (flags) for the servers and clients (see the tables below)
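As a rough illustration of the first guideline, two servers could be brought up on one machine along the lines of the sketch below. The `python -m shortfin_apps.sd.server` invocation is inferred from the module path, and the specific flag values (ports, device, target, build preference) are examples rather than a prescribed configuration.

```bash
# Sketch: two servers on one machine, each listening on its own port.
# Flag values below are examples only; see the flag table for the full set of options.
source .venv/bin/activate

# First server on port 8000
python -m shortfin_apps.sd.server --device hip --target gfx942 --port 8000 &

# Second server on port 8001, trying a different build preference
python -m shortfin_apps.sd.server --device hip --target gfx942 --port 8001 \
  --build_preference precompiled &
```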

Multiple people may try the same feature, so whoever is trying a particular feature, please put your name in the "Testers" column in the tables below.

shortfin_apps.sd.server with different options:

| Flags | Options | Testers | Issues |
| --- | --- | --- | --- |
| --host HOST | | | |
| --port PORT | | | |
| --root-path ROOT_PATH | | | |
| --timeout-keep-alive | | | |
| --device | local-task, hip, amdgpu | | |
| --target | gfx942, gfx1100 | | #515 |
| --device_ids | | | |
| --tokenizers | | | |
| --model_config | | | |
| --workers_per_device | | | |
| --fibers_per_device | | | |
| --isolation | per_fiber, per_call, none | | |
| --show_progress | | | |
| --trace_execution | | | |
| --amdgpu_async_allocations | | | |
| --splat | | | |
| --build_preference | compile, precompiled | | |
| --compile_flags | | | |
| --flagfile FLAGFILE | | | #515 |
| --artifacts_dir ARTIFACTS_DIR | | | |

shortfin_apps.sd.simple_client with different options:

| Flags | Testers | Issues |
| --- | --- | --- |
| --file | | |
| --reps | | |
| --save | | |
| --outputdir | | |
| --steps | | |
| --interactive | | |
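Similarly, a possible client invocation against a running server, using only flags from the table above (the values are arbitrary examples, and --save is assumed here to be a boolean switch):

```bash
# Sketch: drive a running server with the simple client.
# --reps, --steps, --save and --outputdir come from the flag table above;
# the values shown are examples only.
python -m shortfin_apps.sd.simple_client --reps 4 --steps 20 --save --outputdir ./outputs
```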

Other issues

| Issue description | Issue no. |
| --- | --- |

kumardeepakamd changed the title from "Shark release V1-Nov, 2024, Testing Bash" to "Shark V1 Nov 2024 Release Testing Bash" on Nov 14, 2024

dan-garvey commented Nov 14, 2024

Not a critique, just something I noticed: server startup takes about 12 min on a Cirrascale 8x MI300 machine.

IanNod commented Nov 14, 2024

I had the same server startup time. I attributed it to downloading models/weights on setup.

Minor critique: it does not look like we are changing the random latents generated. I'm not sure where that is controlled, but I was seeing the same image generated given the same prompt.

dan-garvey commented Nov 14, 2024

Yeah, as Ian said, the seed appears fixed; I think when reps > 1 it should be changed.

maybe this works?

```
async for i in async_range(args.reps):
    data["seed"] = [i]
    pending.append(
        asyncio.create_task(send_request(session, i, args, data))
    )
    await asyncio.sleep(
        1
    )  # Wait for 1 second before sending the next request
```

dan-garvey commented

well at least in the args.reps>1 case

archana-ramalingam commented

At cold start, an incomplete model download caused the following issue. Deleting the cached models and re-downloading them fixed it.

INFO:root:Loading parameter fiber 'model' from: /home/aramalin/.cache/shark/genfiles/sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 388, in <module>
    main(
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 376, in main
    sysman = configure(args)
             ^^^^^^^^^^^^^^^
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 115, in configure
    sm.load_inference_parameters(*datasets, parameter_scope="model", component=key)
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/components/service.py", line 116, in load_inference_parameters
    p.load(path, format=format)
ValueError: shortfin_iree-src/runtime/src/iree/io/formats/irpa/irpa_parser.c:16: OUT_OF_RANGE; file segment out of range (1766080 to 2614665369 for 2612899290, file_size=726679552); verifying storage segment
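A minimal sketch of that cleanup, assuming the default cache location shown in the log and that nothing else under it needs to be preserved:

```bash
# Remove the cached SDXL artifacts so the server re-downloads them on the next start.
# Path taken from the log above; adjust if your cache lives elsewhere.
rm -rf ~/.cache/shark/genfiles/sdxl
```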

pdhirajkumarprasad commented

I have tried almost all of the flags and various configurations for the client/server, and added my observations here: https://github.com/pdhirajkumarprasad/for_sharing_logs/blob/main/Shark-V1(Nov,%202024)-Bash.md
