
Add int4/int8 vicuna #1598

Merged 1 commit into nod-ai:main from qt_vicuna on Jul 5, 2023

Conversation

@jinchen62 (Contributor) commented Jun 26, 2023

  • move brevitas_matmul_rhs_group_quant_library and the vicuna pipeline into the main file to avoid a decompose error

error: "<eval_with_key>.4":78:23: found an op that was marked as backend illegal
note: "<eval_with_key>.4":78:23: see current operation: %128 = "torch.aten._softmax"(%127, %40, %42) : (!torch.vtensor<[1,32,137,137],unk>, !torch.int, !torch.bool) -> !torch.vtensor<[1,32,137,137],unk>
note: "<eval_with_key>.4":78:23: this is likely due to DecomposeComplexOps being unable to decompose this op

  • keep the format of brevitas〇matmul_rhs_group_quant to avoid a signature mismatch error (a sketch of the expected shape-function signature follows the TODO list below)

raise ValueError(f"Signature mismatch for {f.name!r}: expected {expected_signature!r}, got {signature!r}")
ValueError: Signature mismatch for 'brevitas〇matmul_rhs_group_quant〡shape': expected 'def brevitas〇matmul_rhs_group_quant〡shape(lhs: List[int], rhs: List[int], rhs_scale: List[int], rhs_zero_point: List[int], rhs_bit_width: int, rhs_group_size: int) -> List[int]:', got 'def brevitas〇matmul_rhs_group_quant〡shape('

TODO:

  • check the canonicalization pass to make sure the tensors are processed properly
  • apply canonicalization for int4 to unpack the tensors
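
For context, a minimal sketch of how the shape function and the custom-op library are expected to line up. Only the signature is taken verbatim from the error message above; the function body, the library contents, and the torch_mlir.compile keyword arguments are assumptions about the wiring, not code from this patch:

# Sketch only: the signature matches the error message above; the body is an
# assumed matmul-style result shape with rhs stored as [out_features, in_features].
from typing import List

def brevitas〇matmul_rhs_group_quant〡shape(lhs: List[int], rhs: List[int], rhs_scale: List[int], rhs_zero_point: List[int], rhs_bit_width: int, rhs_group_size: int) -> List[int]:
    if len(lhs) == 3 and len(rhs) == 2:
        return [lhs[0], lhs[1], rhs[0]]
    elif len(lhs) == 2 and len(rhs) == 2:
        return [lhs[0], rhs[0]]
    else:
        raise ValueError("Input shapes not supported.")

# brevitas_matmul_rhs_group_quant_library (mentioned in the first bullet) is assumed
# to collect this shape function plus the matching dtype/value-semantics functions,
# and to be handed to torch-mlir while the op is kept backend legal, roughly:
#   module = torch_mlir.compile(
#       ts_graph, inputs,
#       output_type="torch",
#       backend_legal_ops=["brevitas.matmul_rhs_group_quant"],
#       extra_library=brevitas_matmul_rhs_group_quant_library,
#   )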

@jinchen62 force-pushed the qt_vicuna branch 3 times, most recently from 3fadf17 to 01195a8 on June 30, 2023 at 21:26
@jinchen62 changed the title from "[WIP] Add int4/int8 vicuna" to "Add int4/int8 vicuna" on Jun 30, 2023
@jinchen62 force-pushed the qt_vicuna branch 2 times, most recently from 7fc74a3 to 66bbf06 on June 30, 2023 at 21:37
(Resolved review threads on shark/shark_importer.py and setup_venv.sh.)
@jinchen62 force-pushed the qt_vicuna branch 5 times, most recently from 439d38e to 9e6cbeb on July 4, 2023 at 05:20
@Abhishek-Varma (Contributor) left a comment:

Please update your patch as per the comment - adding precision to StableDiffusion's API isn't needed because it's already there.

(Resolved review thread on apps/stable_diffusion/src/utils/utils.py.)
@Abhishek-Varma (Contributor):

ModuleNotFoundError: No module named 'brevitas_examples' - @jinchen62

Can you please resolve these failures we're seeing in CI?

@powderluv (Contributor):

We can add Brevitas to the requirements.txt with a git install
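
A sketch of what that could look like in requirements.txt - the repository URL and the pinned ref are placeholders/assumptions, not the exact line this PR ended up using:

# requirements.txt (sketch): install Brevitas straight from git
brevitas @ git+https://github.com/Xilinx/brevitas.git@<commit-or-tag>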

@Abhishek-Varma (Contributor) left a comment:

Please address one comment - the rest looks good.

Comment on lines +193 to +194
# brevitas custom op lib
apps/language_models/scripts/vicuna.py
Contributor:

This doesn't seem correct. Can you confirm?

@jinchen62 (Contributor, Author):

It does exclude this file from the format check. What looks incorrect to you?

Contributor:

If you add it here, it won't be detected by git. You want to add it to the black format command line with --exclude or something similar.
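
For illustration, roughly what that could look like - Black's --exclude flag does take a regex, but the exact invocation used by this repo's format check is an assumption:

# Sketch: skip the file whose formatting must stay untouched (see the
# signature-mismatch note in the PR description) on Black's command line
# instead of listing it in a file that git/CI won't consult.
black . --exclude "apps/language_models/scripts/vicuna.py"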

@Abhishek-Varma (Contributor) left a comment:

Please address the following sub-comments, and also make changes to the following file in order to integrate with the WebUI as well:

Replace the lines at https://github.com/nod-ai/SHARK/blob/91ab594744ffbe982ababe340fcd208923ecae48/apps/stable_diffusion/web/ui/stablelm_ui.py#L45-L47

with:

from apps.language_models.scripts.vicuna import (
    UnshardedVicuna,
)

Replace the following: https://github.com/nod-ai/SHARK/blob/91ab594744ffbe982ababe340fcd208923ecae48/apps/stable_diffusion/web/ui/stablelm_ui.py#L61

with:

vicuna_model = UnshardedVicuna(

        self.shark_model = self.compile()

    def get_model_path(self, model_number="first", suffix="mlir"):
        safe_device = "_".join(self.device.split("-"))
Contributor:

Replace this with safe_device = self.device.split("-")[0]
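
A small illustration (not from the PR) of the difference for a hyphenated device name such as "cpu-task", which appears in the device lists below:

# "_".join("cpu-task".split("-"))  ->  "cpu_task"   (current code)
# "cpu-task".split("-")[0]         ->  "cpu"        (suggested replacement)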

Comment on lines +1273 to +1323
    def compile(self):
        # Cannot load both the models in the memory at once
        # due to memory constraints, hence on demand compilation
        # is being used until the space is enough for both models

        # Testing : DO NOT Download Vmfbs if not found. Modify later
        # download vmfbs for A100
        if (
            not self.first_vicuna_vmfb_path.exists()
            and self.device in ["cuda", "cpu"]
            and self.precision in ["fp32", "fp16"]
        ):
            # combinations that are still in the works
            if not (self.device == "cuda" and self.precision == "fp16"):
                # Will generate vmfb on device
                pass
            else:
                download_public_file(
                    f"gs://shark_tank/vicuna/unsharded/vmfb/{self.first_vicuna_vmfb_path.name}",
                    self.first_vicuna_vmfb_path.absolute(),
                    single_file=True,
                )
        else:
            # get first vic
            # TODO: Remove after testing to avoid memory overload
            # fvic_shark_model = self.compile_first_vicuna()
            pass
        if (
            not self.second_vicuna_vmfb_path.exists()
            and self.device in ["cuda", "cpu"]
            and self.precision in ["fp32", "fp16"]
        ):
            # combinations that are still in the works
            if not (self.device == "cuda" and self.precision == "fp16"):
                # Will generate vmfb on device
                pass
            else:
                download_public_file(
                    f"gs://shark_tank/vicuna/unsharded/vmfb/{self.second_vicuna_vmfb_path.name}",
                    self.second_vicuna_vmfb_path.absolute(),
                    single_file=True,
                )
        else:
            # get second vic
            # TODO: Remove after testing to avoid memory overload
            # svic_shark_model = self.compile_second_vicuna()
            pass

        return None
        # return tuple of shark_modules once mem is supported
        # return fvic_shark_model, svic_shark_model
Contributor:

Replace this with:

    def compile(self):
        # Cannot load both the models in the memory at once
        # due to memory constraints, hence on demand compilation
        # is being used until the space is enough for both models

        # Testing : DO NOT Download Vmfbs if not found. Modify later
        # download vmfbs for A100
        supported_devices = ["cuda", "cpu-sync", "cpu-task", "cpu"]
        if (
            not self.first_vicuna_vmfb_path.exists()
            and self.device in supported_devices
            and self.precision in ["fp32", "fp16", "int8"]
        ):
            if (self.device == "cuda" and self.precision == "fp16") or (
                self.device in ["cpu-sync", "cpu-task"]
                and self.precision == "int8"
            ):
                download_public_file(
                    f"gs://shark_tank/vicuna/unsharded/vmfb/{self.first_vicuna_vmfb_path.name}",
                    self.first_vicuna_vmfb_path.absolute(),
                    single_file=True,
                )
            else:
                pass

        else:
            # get first vic
            # TODO: Remove after testing to avoid memory overload
            # fvic_shark_model = self.compile_first_vicuna()
            pass
        if (
            not self.second_vicuna_vmfb_path.exists()
            and self.device in supported_devices
            and self.precision in ["fp32", "fp16", "int8"]
        ):
            if (self.device == "cuda" and self.precision == "fp16") or (
                self.device in ["cpu-sync", "cpu-task"]
                and self.precision == "int8"
            ):
                download_public_file(
                    f"gs://shark_tank/vicuna/unsharded/vmfb/{self.second_vicuna_vmfb_path.name}",
                    self.second_vicuna_vmfb_path.absolute(),
                    single_file=True,
                )
            else:
                pass
        else:
            # get second vic
            # TODO: Remove after testing to avoid memory overload
            # svic_shark_model = self.compile_second_vicuna()
            pass

        return None
        # return tuple of shark_modules once mem is supported
        # return fvic_shark_model, svic_shark_model
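
To summarize the combinations the suggested compile() will try to download (an illustration of the logic above, not additional code from the PR):

# cuda + fp16                      -> download the prebuilt vmfb from gs://shark_tank/vicuna/unsharded/vmfb/
# cpu-sync or cpu-task + int8      -> download the prebuilt vmfb
# any other supported combination  -> fall through; the vmfb is generated on device
# vmfb already on disk             -> nothing is downloaded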

@powderluv merged commit bc6fee1 into nod-ai:main on Jul 5, 2023
@jinchen62 deleted the qt_vicuna branch on July 5, 2023 at 23:40