Refactor: Modularize Inpaint Pipeline #6120
Draft
Summary
Keeping this as Draft until tested with more models and settings.
This is a rewrite of the inpaint mask code in the diffusers pipeline. It centralizes all of the mask logic in a single object that is invoked at several points in the denoise process, with each call keyed to its location in that process, as a proof of concept for a more elaborate modular extension system for denoising. Baby steps first.
Our previous process had mask-based decisions and code in five locations, spread across multiple object levels, and the mask was inverted partway through. Pulling it all together in one place was necessary for debugging.
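As a rough illustration of the shape of this refactor (class and method names here are hypothetical, not the ones in the diff), the centralized object owns the mask state and exposes one hook per location in the denoise process where mask logic used to live:

```python
import torch


class InpaintMaskGuidance:
    """Sketch of a single object owning all mask state, invoked at named
    points in the denoise loop instead of scattering mask logic across
    five call sites. Names and hook points are illustrative only."""

    def __init__(self, mask: torch.Tensor, masked_latents: torch.Tensor = None):
        # Single source of truth for orientation: 1.0 = region to inpaint,
        # 0.0 = region to preserve (no mid-process inversion).
        self.mask = mask
        self.masked_latents = masked_latents

    def before_denoise(self, latents: torch.Tensor) -> torch.Tensor:
        # Hook: seed or adjust latents before the loop starts.
        return latents

    def on_step(
        self, latents: torch.Tensor, orig_latents: torch.Tensor, t
    ) -> torch.Tensor:
        # Hook: blend the (noised) original latents back into the
        # preserved region after each scheduler step.
        return latents * self.mask + orig_latents * (1.0 - self.mask)

    def after_denoise(
        self, latents: torch.Tensor, orig_latents: torch.Tensor
    ) -> torch.Tensor:
        # Hook: final composite so preserved pixels are untouched.
        return latents * self.mask + orig_latents * (1.0 - self.mask)
```

The denoise loop then calls these hooks at fixed points rather than carrying its own mask branches.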
Related Issues / Discussions
This (mostly) fixes the issues with inpaint models that were introduced with the gradient mask changes. When an inpaint model is used and the provided denoise mask lacks a masked_latents field (normally generated by supplying the original image and VAE to the Create Denoise Mask node), the pipeline now synthesizes one by filling the masked area with predetermined values chosen per model architecture. This avoids a second VAE encode in all inpainting/canvas processes.
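The synthetic fill amounts to something like the sketch below. The function name and the per-architecture constants are placeholders, not the tuned values from this PR; the point is only that the masked region is filled with a fixed per-architecture value instead of being VAE-encoded a second time:

```python
import torch

# Illustrative per-architecture fill values (placeholders, not the
# tuned constants used by the actual pipeline).
SYNTHETIC_FILL = {
    "sd-1": 0.5,
    "sdxl": 0.3,
}


def synthesize_masked_latents(
    latents: torch.Tensor, mask: torch.Tensor, arch: str
) -> torch.Tensor:
    """Build a stand-in masked_latents tensor by filling the masked
    region with a predetermined constant, skipping the second VAE encode.

    mask convention: 1.0 = region to inpaint, 0.0 = region to preserve.
    """
    fill = torch.full_like(latents, SYNTHETIC_FILL[arch])
    return latents * (1.0 - mask) + fill * mask
```

A constant fill is cheap but only approximates what the VAE would produce for a masked image, which is why the edge artifacts described below needed separate tuning.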
This is not one-to-one with the standard setup. The new method has been tuned to reduce artifacts as much as possible, but there can still be some unusual highlighting, and occasionally the edges of inpainted objects get chopped off.
The dog on the left was inpainted with a VAE-encoded masked_latents, and the dog on the right was from a synthetic fill. The mask edge of the synthetic dog has lost part of the hindquarters and added an errant sparkle between the bench slats.
Oddly, using pure synthetic fill produced a significantly higher success rate for inpainting objects from the prompt, nearing 100% before adjusting for artifacts. With the fixes in place to avoid edge artifacts, the success rate for synthetic fill is much closer to VAE, but still slightly higher: at the current settings, VAE ignores the prompt and just fills in the bench a bit more often than synthetic does.
Code is included for synthetic fill of SD-2 models, but it does not work at this time. SD-2 appears to expect masked_latents in a different form, and the difference has not yet been identified.
QA Instructions
Since this affects all of the inpainting pipelines, we need to test a variety of settings on all model architectures to make sure there is nothing strange or unexpected happening.
Checklist