
[🌟 New Model] ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation #8414

Bai-YT opened this issue Jun 5, 2024 · 6 comments · May be fixed by #8739

Comments

@Bai-YT

Bai-YT commented Jun 5, 2024

Model/Pipeline/Scheduler description

ConsistencyTTA, introduced in the paper Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation, is an efficient text-to-audio generation model. Compared to an equivalent diffusion-based TTA model, ConsistencyTTA achieves a 400x generation speed-up while retaining generation quality and diversity.

Due to its high generation quality and fast inference, we believe integrating this model into diffusers will make diffusers more appealing to text-to-audio generation researchers and users! Thank you very much.

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

The open-source code implementation can be found at https://github.com/Bai-YT/ConsistencyTTA.

There is also a simplified implementation for inference only: https://github.com/Bai-YT/ConsistencyTTA/tree/main/easy_inference.

The model checkpoints can be found at https://huggingface.co/Bai-YT/ConsistencyTTA.
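
For reference, here is a minimal sketch of fetching the released checkpoints from the Hub with huggingface_hub (this assumes the repo follows the standard Hub layout; adjust paths to however you organize local files):

```python
# Minimal sketch: download the ConsistencyTTA checkpoints from the Hugging Face Hub.
# Assumes the standard Hub repo layout; the local directory contains whatever files the repo hosts.
from huggingface_hub import snapshot_download

checkpoint_dir = snapshot_download(repo_id="Bai-YT/ConsistencyTTA")
print(f"Checkpoints downloaded to: {checkpoint_dir}")
```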

I am the main author of the code and am more than happy to assist with the integration.

@Bai-YT changed the title ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation [🌟 New Model] ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation Jun 5, 2024
@sayakpaul
Member

@sanchit-gandhi @Vaibhavs10 FYI.

@a-r-r-o-w
Member

@Bai-YT Thank you for your awesome work! I just finished reading through the paper and think I have a good grasp of the modeling and inference code to convert to diffusers.

@sayakpaul Could I pick this up if no one's working on it?

@sayakpaul
Member

Yeah for sure.

@yiyixuxu
Collaborator

@a-r-r-o-w cool! But let's put it in the community folder to start with.
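
For context, diffusers community pipelines are loaded by passing `custom_pipeline` to `DiffusionPipeline.from_pretrained`. A minimal sketch follows; the pipeline name and the diffusers-format checkpoint repo are assumptions until the linked PR lands:

```python
# Minimal sketch of loading a community pipeline in diffusers.
# The pipeline name "consistency_tta" is hypothetical until the ConsistencyTTA PR is merged,
# and the checkpoint repo may first need conversion to the diffusers format.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Bai-YT/ConsistencyTTA",            # assumed diffusers-format checkpoint repo
    custom_pipeline="consistency_tta",  # hypothetical community pipeline name
    torch_dtype=torch.float16,
)
```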

@a-r-r-o-w
Member

Sure, sounds good.

@Bai-YT
Author

Bai-YT commented Jun 27, 2024

> @Bai-YT Thank you for your awesome work! I just finished reading through the paper and think I have a good grasp of the modeling and inference code to convert to diffusers.
>
> @sayakpaul Could I pick this up if no one's working on it?

I appreciate everyone taking the time to help. Massive thanks!

@a-r-r-o-w linked a pull request Jun 29, 2024 that will close this issue