Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") #8793
What does this PR do?
First of all, thanks for your great work. Here is my personal understanding. If there are any mistakes, feel free to correct me!
Both the official documentation's description of the `padding_mask_crop` parameter and the actual behavior of the AUTOMATIC1111 WebUI when the inpainting area is set to "Only masked" indicate that the original input image size should be maintained, which eliminates the need for an additional super-resolution step.
The docs' description of `padding_mask_crop`:
However, in practice, when this feature is enabled in diffusers, the `apply_overlay` method first resizes init_image to the size of the actually inpainted part (512 * 512 by default). If the input image is not resized before it is passed to the pipeline, the overlaid result is incorrect. On the other hand, resizing the original image at the input stage fails to preserve the original image size, which significantly degrades image quality and necessitates super-resolution to restore it.
I don't think this logic aligns with the original intent of the feature, and it differs from the implementation in AUTOMATIC1111. Therefore, I have modified the `apply_overlay` function so that the output image retains the same size as the original image.
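To make the intended behavior concrete, here is a minimal standalone sketch using Pillow. It is not the actual diff in this PR and not the diffusers implementation: the helper name `overlay_keep_original_size` and its argument layout are assumptions chosen for illustration. The idea is to scale the generated crop back to the crop box and paste it into the full-size original, rather than shrinking the original to the generated size.

```python
from PIL import Image


def overlay_keep_original_size(mask, init_image, image, crop_coords=None):
    """Hypothetical helper: blend `image` into `init_image` at the original size.

    mask        -- PIL image, white where content should be replaced
    init_image  -- the original, full-resolution input image
    image       -- the generated image (e.g. 512 x 512)
    crop_coords -- (x1, y1, x2, y2) box in original-image coordinates, or None
    """
    original = init_image.convert("RGB")
    width, height = original.size

    # Bring the mask to the original resolution instead of the generated one.
    mask = mask.convert("L").resize((width, height), Image.LANCZOS)

    if crop_coords is not None:
        x1, y1, x2, y2 = crop_coords
        # Scale the generated patch to the crop box and paste it in place.
        patch = image.convert("RGB").resize((x2 - x1, y2 - y1), Image.LANCZOS)
        generated = original.copy()
        generated.paste(patch, (x1, y1))
    else:
        # No crop: the generated image covers the whole frame.
        generated = image.convert("RGB").resize((width, height), Image.LANCZOS)

    # White mask pixels take the generated content, black ones keep the original.
    return Image.composite(generated, original, mask)
```

With this approach the output always has the size of `init_image`, and pixels outside the mask are taken from the original unchanged, so no resizing of the input or later super-resolution is needed.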
Here is the comparison.
Original Image and mask
![dog_cat](https://private-user-images.githubusercontent.com/33905626/346020643-ebc37a14-7ac2-48c0-9c25-06c3f2eef439.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDYwMjA2NDMtZWJjMzdhMTQtN2FjMi00OGMwLTljMjUtMDZjM2YyZWVmNDM5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIyNmY1ZDYzMzlhYzRmMGJkZjdiYjM5NDhmNTJkOTAxZGRkNTg0NWU0MDMyMTczOWIwNTg3OTVhNzk4YWRiMDQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Rz5s87lBeXeyOohep87gtn73PGmu2v9MvQ0k50hGD0c)
![dog_cat_mask](https://private-user-images.githubusercontent.com/33905626/346028051-e90a1cc8-9fbe-47bf-a1b3-315f257eda66.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDYwMjgwNTEtZTkwYTFjYzgtOWZiZS00N2JmLWExYjMtMzE1ZjI1N2VkYTY2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY3NjhhOWE4MjQ2MDg4NDVkM2Y1N2UxOTNhNzg5MDEwMWFiNjhlYzExNWZkMWVhMDE4OGZiOTdiODc3NjhmMzYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.iSvCPLoCrYyPH-oB2X7j_iHPr-kqVe4ASjCFw5Zg42I)
If I do not resize the original image before passing it to the pipeline (the existing code): the overlay result is incorrect, and the output image is also resized.
![old_version](https://private-user-images.githubusercontent.com/33905626/345997285-7207f7e4-056b-4d90-a99f-d1297972e8b5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDU5OTcyODUtNzIwN2Y3ZTQtMDU2Yi00ZDkwLWE5OWYtZDEyOTc5NzJlOGI1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM3ZDViNGI1Mzk1ZDM5YjFkNTM4MDQxMmFmY2EzZjEzMTQxOWNkOTdjYmMxMTAwMjQwNzgzMDI4NDJkYzYwMGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.iv285jwj8pBb_fWk7SOLgEuQAq3AENebECmA4wK-pEI)
If I resize the original image first: the overlay result is correct, but image quality is degraded because the init image has been resized to 512.
![resized_old_version](https://private-user-images.githubusercontent.com/33905626/345998079-2a283717-d291-4dfa-a4b7-ee84b2f8c23e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDU5OTgwNzktMmEyODM3MTctZDI5MS00ZGZhLWE0YjctZWU4NGIyZjhjMjNlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWEzYTczZDc3NGQ3NjdkMDczOTZhZWM0YTMxMzRhOWE5Y2JmZDU1MmVkM2JkYjRkNGRhNDUyNWEzYmRlMjU0OGUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.4GUF8WCGNufejcNlkdBy-NeTiz-aPWBDGpVbe_Nb75I)
Modified Version: The output image is of the same size as the original input and the overlaid result is correct.
![new_version](https://private-user-images.githubusercontent.com/33905626/345997393-c77b5c1e-1c6b-4a45-b2e7-b110c3dce677.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDU5OTczOTMtYzc3YjVjMWUtMWM2Yi00YTQ1LWIyZTctYjExMGMzZGNlNjc3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU4ZmYwNWU5NTZjNDk0NjhhY2I4YjM2MjVkMTU1ZmJlNzdiNzRkYTYyMmI2NjNjMjg1ZDg5NjhlOTEyY2VhNzEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.vOMW86iiV5B2EgNalEhur_81yLO-UWklpmXLZr0B0So)
Using the AUTOMATIC1111 WebUI: just select the option shown below. The output image has the same size as the original image, without any resizing.
![image](https://private-user-images.githubusercontent.com/33905626/346016404-100d6840-6057-4a37-b809-d6966a53870f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDYwMTY0MDQtMTAwZDY4NDAtNjA1Ny00YTM3LWI4MDktZDY5NjZhNTM4NzBmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ3ZDljZWM1OTBjNGIzNzE3OGNmOTFlNzk3MmQxOGM5ODE4MGFhYjAyZjJlOTU5YzhjOWQ5YzkyNzNkZDc3MGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Vea1IOr_jcQhLFXcG7FdTmt3_m1NVYSVTnMKHfNcQco)
![image](https://private-user-images.githubusercontent.com/33905626/346017170-f7bd842e-d6a3-4a8f-a149-ef5f04ce4118.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA1ODAxMjIsIm5iZiI6MTcyMDU3OTgyMiwicGF0aCI6Ii8zMzkwNTYyNi8zNDYwMTcxNzAtZjdiZDg0MmUtZDZhMy00YThmLWExNDktZWY1ZjA0Y2U0MTE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzEwVDAyNTAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThhYmNmY2EzYjgwYjY5Zjk5MDBkNzY5NzIzYTQ2MWJiZGViMTliNWNkZjY3NWZlZGU0ZjYyMDFmYWQ1YjhlNjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.rSP1IVa-5XcEoHa-_coN6GG3D5ajQAQi0j-iQnkWo-A)
Test Code
Modified Code: I modified the code as below.
Original code: `apply_overlay` in src/diffusers/image_processor.py, line 651: https://github.com/huggingface/diffusers/blob/main/src/diffusers/image_processor.py#L651
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.