
Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") #8793

Open · wants to merge 2 commits into main
Conversation

clarkkent0618

@clarkkent0618 clarkkent0618 commented Jul 5, 2024

What does this PR do?

First of all, thanks for your great work. Here is my personal understanding. If there are any mistakes, feel free to correct me!

Both the official documentation's description of the padding_mask_crop parameter and the actual behavior of the AUTOMATIC1111 WebUI when the inpainting area is set to "Only masked" indicate that the original input image size should be preserved, eliminating the need for an additional super-resolution step.

The description of the docs about padding_mask_crop :

Both the image and mask are upscaled to a higher resolution for inpainting, and then overlaid on the original image. This is a quick and easy way to improve image quality without using a separate pipeline like StableDiffusionUpscalePipeline.

However, in practice, when this feature is enabled in diffusers, the apply_overlay method first resizes init_image to the size of the actually inpainted region (512×512 by default). If the input image is not resized before generation, the overlaid result is incorrect. On the other hand, resizing the original image at the input stage fails to preserve the original image size: it significantly degrades image quality and necessitates super-resolution to restore it.

I don't think this logic aligns with the original intent of the feature, and it differs from the AUTOMATIC1111 implementation. Therefore, I have modified the apply_overlay function so that the output image retains the same size as the original image.
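For context, the "Only masked" approach works by cropping a region around the mask (expanded by the given padding), inpainting only that crop at model resolution, and pasting the result back. A minimal sketch of the crop-region computation, using plain NumPy/PIL; `crop_region_from_mask` is an illustrative helper, not the diffusers implementation:

```python
import numpy as np
from PIL import Image

def crop_region_from_mask(mask: Image.Image, pad: int):
    """Bounding box of the white (masked) area, expanded by `pad` pixels
    on each side and clamped to the image bounds."""
    arr = np.array(mask.convert("L"))
    ys, xs = np.nonzero(arr)
    x1 = max(int(xs.min()) - pad, 0)
    y1 = max(int(ys.min()) - pad, 0)
    x2 = min(int(xs.max()) + 1 + pad, mask.width)
    y2 = min(int(ys.max()) + 1 + pad, mask.height)
    return x1, y1, x2, y2
```

With padding_mask_crop=40, the pipeline crops roughly this region, inpaints it, and the question this PR addresses is how that crop is composited back.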

Here is the comparison.

  1. Original Image and mask
    dog_cat
    dog_cat_mask

  2. If I do not resize the original image before the pipeline (the existing code): the overlay result is incorrect and the image is resized at the same time.
    old_version

  3. If I resize the original image first: the overlay result is correct, but image quality degrades since the init image has been resized to 512×512.
    resized_old_version

  4. Modified version: the output image has the same size as the original input and the overlaid result is correct.
    new_version

  5. Using the AUTOMATIC1111 WebUI: just select the checkbox shown below. The output image is the same size as the original image, without resizing.
    (screenshots of the WebUI "Only masked" inpaint area setting)

Test Code

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# base_model_path: path to a Stable Diffusion inpainting checkpoint
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    base_model_path, torch_dtype=torch.float16, variant="fp16",
    low_cpu_mem_usage=False,
    safety_checker=None,
    requires_safety_checker=False,
)
pipeline.enable_model_cpu_offload()

# load base and mask image
image_path="dog_cat.jpg"
mask_path="dog_cat_mask.png"

init_image = cv2.imread(image_path)[:,:,::-1]
init_image = Image.fromarray(init_image.astype(np.uint8)).convert("RGB").resize((512,512))
mask_image = 1.*(cv2.imread(mask_path).sum(-1)>255)[:,:,np.newaxis]
mask_image = Image.fromarray(mask_image.astype(np.uint8).repeat(3,-1)*255).convert("RGB").resize((512,512))
mask_image = pipeline.mask_processor.blur(mask_image, blur_factor=4)

generator = torch.Generator("cuda").manual_seed(6188)
caption = "black cat"
image = pipeline(prompt=caption, 
                 image=init_image, 
                 mask_image=mask_image, 
                 generator=generator, 
                 num_inference_steps=25, 
                 strength=1,
                 padding_mask_crop=40,
                 ).images[0]
image.save("output_inpainting.png")
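The fix boils down to resizing the inpainted crop back to its original crop size and pasting it into a copy of the full-resolution input, instead of resizing the input to the model's output size. A standalone sketch of that paste-back step (`paste_back` is an illustrative helper, not the diffusers API):

```python
from PIL import Image

def paste_back(init_image: Image.Image, inpainted: Image.Image,
               crop_coords) -> Image.Image:
    """Resize the model output to the crop size and paste it into a copy
    of the original image, so the result keeps the original resolution."""
    x1, y1, x2, y2 = crop_coords
    out = init_image.copy()
    # Downscale the (e.g. 512x512) model output to the crop's true size.
    patch = inpainted.resize((x2 - x1, y2 - y1))
    out.paste(patch, (x1, y1))
    return out
```

This is the behavior the WebUI's "Only masked" mode produces: only the cropped region is touched, and the output resolution matches the input.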

Modified Code: I modified apply_overlay as below.

  def apply_overlay(
      self,
      mask: PIL.Image.Image,
      init_image: PIL.Image.Image,
      image: PIL.Image.Image,
      crop_coords: Optional[Tuple[int, int, int, int]] = None,
  ) -> PIL.Image.Image:
      """
      overlay the inpaint output to the original image
      """

      width, height = init_image.width, init_image.height

      init_image_masked = PIL.Image.new("RGBa", (width, height))
      init_image_masked.paste(init_image.convert("RGBA").convert("RGBa"), mask=ImageOps.invert(mask.convert("L")))
      
      init_image_masked = init_image_masked.convert("RGBA")

      if crop_coords is not None:
          x, y, x2, y2 = crop_coords
          w = x2 - x
          h = y2 - y
          base_image = PIL.Image.new("RGBA", (width, height))
          image = self.resize(image, height=h, width=w, resize_mode="crop")
          base_image.paste(image, (x, y))
          image = base_image.convert("RGB")
          
      image = image.convert("RGBA") 
      image.alpha_composite(init_image_masked)
      image = image.convert("RGB")

      return image

Original code: apply_overlay in src/diffusers/image_processor.py, line 651:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/image_processor.py#L651

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@clarkkent0618 clarkkent0618 changed the title Modify apply_overlay for inpainting to align with the intended purpose and logic of padding_mask_crop (Inpainting area: "Only Masked") Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") Jul 6, 2024