Add int4/int8 vicuna #1598
Conversation
Force-pushed from 3fadf17 to 01195a8
Force-pushed from 7fc74a3 to 66bbf06
Force-pushed from 439d38e to 9e6cbeb
Please update your patch as per the comment - precision in StableDiffusion's API won't be needed because it's already there.
Can you please resolve the failures we see in CI?
We can add Brevitas to the requirements.txt with a git install.
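A minimal sketch of such an entry, assuming the upstream Xilinx/brevitas repository and no particular commit pin (both assumptions, not taken from this PR):

# hypothetical requirements.txt entry installing Brevitas from git
brevitas @ git+https://github.com/Xilinx/brevitas.git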
Please address one comment - rest looks good.
# brevitas custom op lib
apps/language_models/scripts/vicuna.py
This doesn't seem correct. Can you confirm?
It does ignore this file from the format check. How does it look incorrect to you?
If you add it here it won't be detected by git. You want to add it to the black format command line with a --exclude or something.
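For illustration, that could look something like the command below (black's --exclude flag takes a regex matched against file paths; the exact invocation used in this repo's CI is an assumption here):

# hypothetical format-check command excluding the file from black
black . --exclude "apps/language_models/scripts/vicuna.py"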
Please address the following sub-comments and also make changes to the following file in order to integrate with the WebUI as well:
Replace the set of lines at: https://github.com/nod-ai/SHARK/blob/91ab594744ffbe982ababe340fcd208923ecae48/apps/stable_diffusion/web/ui/stablelm_ui.py#L45-L47
with:
from apps.language_models.scripts.vicuna import (
    UnshardedVicuna,
)
Replace the following: https://github.com/nod-ai/SHARK/blob/91ab594744ffbe982ababe340fcd208923ecae48/apps/stable_diffusion/web/ui/stablelm_ui.py#L61
with:
vicuna_model = UnshardedVicuna(
self.shark_model = self.compile()

def get_model_path(self, model_number="first", suffix="mlir"):
    safe_device = "_".join(self.device.split("-"))
Replace this with safe_device = self.device.split("-")[0]
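For illustration, on an example device string such as "cpu-task" (a value assumed here, not taken from the PR), the two expressions differ as follows:

device = "cpu-task"  # example device string
"_".join(device.split("-"))  # -> "cpu_task": keeps the variant, with "-" mapped to "_"
device.split("-")[0]         # -> "cpu": keeps only the base device name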
def compile(self):
    # Cannot load both the models in the memory at once
    # due to memory constraints, hence on demand compilation
    # is being used until the space is enough for both models

    # Testing : DO NOT Download Vmfbs if not found. Modify later
    # download vmfbs for A100
    if (
        not self.first_vicuna_vmfb_path.exists()
        and self.device in ["cuda", "cpu"]
        and self.precision in ["fp32", "fp16"]
    ):
        # combinations that are still in the works
        if not (self.device == "cuda" and self.precision == "fp16"):
            # Will generate vmfb on device
            pass
        else:
            download_public_file(
                f"gs://shark_tank/vicuna/unsharded/vmfb/{self.first_vicuna_vmfb_path.name}",
                self.first_vicuna_vmfb_path.absolute(),
                single_file=True,
            )
    else:
        # get first vic
        # TODO: Remove after testing to avoid memory overload
        # fvic_shark_model = self.compile_first_vicuna()
        pass
    if (
        not self.second_vicuna_vmfb_path.exists()
        and self.device in ["cuda", "cpu"]
        and self.precision in ["fp32", "fp16"]
    ):
        # combinations that are still in the works
        if not (self.device == "cuda" and self.precision == "fp16"):
            # Will generate vmfb on device
            pass
        else:
            download_public_file(
                f"gs://shark_tank/vicuna/unsharded/vmfb/{self.second_vicuna_vmfb_path.name}",
                self.second_vicuna_vmfb_path.absolute(),
                single_file=True,
            )
    else:
        # get second vic
        # TODO: Remove after testing to avoid memory overload
        # svic_shark_model = self.compile_second_vicuna()
        pass

    return None
    # return tuple of shark_modules once mem is supported
    # return fvic_shark_model, svic_shark_model
Replace this with:
def compile(self):
    # Cannot load both the models in the memory at once
    # due to memory constraints, hence on demand compilation
    # is being used until the space is enough for both models
    # Testing : DO NOT Download Vmfbs if not found. Modify later
    # download vmfbs for A100
    supported_devices = ["cuda", "cpu-sync", "cpu-task", "cpu"]
    if (
        not self.first_vicuna_vmfb_path.exists()
        and self.device in supported_devices
        and self.precision in ["fp32", "fp16", "int8"]
    ):
        if (self.device == "cuda" and self.precision == "fp16") or (
            self.device in ["cpu-sync", "cpu-task"]
            and self.precision == "int8"
        ):
            download_public_file(
                f"gs://shark_tank/vicuna/unsharded/vmfb/{self.first_vicuna_vmfb_path.name}",
                self.first_vicuna_vmfb_path.absolute(),
                single_file=True,
            )
        else:
            pass
    else:
        # get first vic
        # TODO: Remove after testing to avoid memory overload
        # fvic_shark_model = self.compile_first_vicuna()
        pass
    if (
        not self.second_vicuna_vmfb_path.exists()
        and self.device in supported_devices
        and self.precision in ["fp32", "fp16", "int8"]
    ):
        if (self.device == "cuda" and self.precision == "fp16") or (
            self.device in ["cpu-sync", "cpu-task"]
            and self.precision == "int8"
        ):
            download_public_file(
                f"gs://shark_tank/vicuna/unsharded/vmfb/{self.second_vicuna_vmfb_path.name}",
                self.second_vicuna_vmfb_path.absolute(),
                single_file=True,
            )
        else:
            pass
    else:
        # get second vic
        # TODO: Remove after testing to avoid memory overload
        # svic_shark_model = self.compile_second_vicuna()
        pass
    return None
    # return tuple of shark_modules once mem is supported
    # return fvic_shark_model, svic_shark_model
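The replacement downloads prebuilt vmfbs only for the device/precision combinations published in the shark_tank bucket and falls back to on-device compilation otherwise. A minimal sketch of that gating condition as a standalone predicate (the helper name is hypothetical, not part of the PR):

def should_download_vmfb(device, precision):
    # Prebuilt vmfbs exist only for cuda/fp16 and cpu-sync|cpu-task/int8;
    # all other supported combinations compile the vmfb on device instead.
    return (device == "cuda" and precision == "fp16") or (
        device in ["cpu-sync", "cpu-task"] and precision == "int8"
    )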
- Add brevitas_matmul_rhs_group_quant_library and the vicuna pipeline to the main file to avoid a decompose error
- Add brevitas〇matmul_rhs_group_quant to avoid a signature mismatch error
- TODO: