
inpainting added, and works via api #13

Merged: 5 commits into main, Aug 30, 2024

Conversation

@zsxkib (Contributor) commented Aug 23, 2024

🚀 This description was created by Ellipsis for commit ad819de

Summary:

Added inpainting functionality to predict.py using FluxInpaintPipeline, with new inputs for image and mask, and updated cog.yaml for dependencies.

Key points:

  • Added inpainting functionality using FluxInpaintPipeline in predict.py (see the sketch below).
  • Introduced image and mask inputs for inpainting mode in Predictor.predict.
  • Implemented setup_inpaint_pipeline to initialize inpainting pipelines.
  • Updated cog.yaml to install diffusers with inpainting support.
  • Disabled safety checker temporarily, marked for future implementation.
  • Added resize_image_dimensions to adjust image size for inpainting.

Generated with ❤️ by ellipsis.dev
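
As a rough illustration of the inpainting path described above, here is a minimal sketch, assuming the fork exposes the same interface as the FluxInpaintPipeline that later landed upstream in diffusers; the file names and parameter values are illustrative:

```python
import torch
from diffusers import FluxInpaintPipeline
from PIL import Image

# Load the Flux inpainting pipeline in bfloat16 and move it to the GPU.
pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("input.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white = repaint, black = keep

result = pipe(
    prompt="small cute cat sat on a park bench",
    image=image,
    mask_image=mask,
    strength=0.85,           # how strongly the masked region is repainted
    num_inference_steps=28,
).images[0]
result.save("output.png")
```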

@ellipsis-dev bot left a comment:

👍 Looks good to me! Reviewed everything up to a890dd8 in 11 seconds

More details
  • Looked at 240 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. predict.py:230
  • Draft comment:
    Consider using resize_image_dimensions instead of make_multiple_of_16 for consistency in resizing images to dimensions that are multiples of 32.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The resize_image_dimensions function in predict.py is not part of the diff, but it is relevant to the changes made in the PR. The function ensures that the new dimensions are multiples of 32, which is a good practice for image processing in neural networks. However, the function is not used in the predict method when aspect_ratio is 'custom'. Instead, make_multiple_of_16 is used, which might lead to inconsistencies. The resize_image_dimensions function should be used consistently for resizing to ensure dimensions are multiples of 32.
2. predict.py:326
  • Draft comment:
    Consider using resize_image_dimensions instead of make_multiple_of_16 for consistency in resizing images to dimensions that are multiples of 32.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The load_image function is a utility function that loads an image and converts it to RGB. This is a good practice to ensure consistency in image processing. However, it is not part of the diff, so I will not comment on it.

Workflow ID: wflow_IUd4UaalHPSVsn1G



@zsxkib (Contributor, Author) commented Aug 23, 2024

Could someone please have a look? (I'm also very happy to pair and iron out the lazy loading edge cases.)

I had a lot of trouble getting this to work, especially because we don't have much lazy loading implemented within the model, whereas in the cog-flux repo (https://github.com/replicate/cog-flux/tree/dev-img2img) there's some amount of lazy loading to allow for inference on A40s. So I ended up repeatedly pushing changes to Replicate so I could use the A100s (without setting up another dev machine on CoreWeave, lol).

I've added lazy loading as a point on the Notion doc (https://www.notion.so/replicate/Flux-fine-tuning-ee5a253453b94549bb39cf3701917b25#0006ea53192d4afd869b7e8ff923e238)—it really depends if we want to exclusively use H100s, or if we want to use A100s too.

A couple of issues though:

  1. I had to install an experimental version of diffusers: git+https://github.com/Gothos/diffusers.git@flux-inpaint (see the snippet after this list).

    • I've purposefully put it in the run section of the cog.yaml instead of python_packages, since it caused issues with diffusers==0.30.0's dependencies. Essentially, Gothos' fork requires slightly different pinned dependencies, and I didn't want to break anything (it did break when I tried naively replacing it).
  2. I have all of the safety checker code commented out so that I could run it on an A100—we can't merge in the current state since we wouldn't have a safety checker anymore.

  3. I couldn't keep the inpainting pipeline online at the same time as the other models. That makes sense: we have FluxPipe, InpaintingPipe, and the safety checkers all competing for GPU memory.
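
For reference, a hypothetical excerpt of what that looks like in cog.yaml (the surrounding pins are illustrative, not the repo's actual ones):

```yaml
build:
  gpu: true
  python_packages:
    # ...other pinned dependencies stay here, with no diffusers pin...
    - torch==2.3.1
  run:
    # Installed in a separate build step so pip doesn't resolve the fork
    # against the python_packages pins (naively swapping the diffusers pin
    # in python_packages broke the build).
    - pip install git+https://github.com/Gothos/diffusers.git@flux-inpaint
```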

So, I took a bunch of time to add lazy loading for the inpainting pipeline:

  • If we're in inpainting mode (image and mask), we create the pipe if not already done, load it onto the GPU if not already done, then inpaint. Consecutive runs will be fast because we won't reinitialize the pipe. But if we switch modes (not inpaint), we unload it (this fixed the out-of-memory errors on the A100s). See the sketch below.
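
A minimal sketch of that logic (the method and attribute names here are illustrative, not the exact identifiers in predict.py):

```python
import torch

def configure_inpainting(self, want_inpaint: bool) -> None:
    if want_inpaint:
        if self.inpaint_pipe is None:
            # Create the pipeline once; consecutive inpaint runs reuse it.
            self.inpaint_pipe = self.setup_inpaint_pipeline()
        if self.inpaint_pipe.device.type != "cuda":
            self.inpaint_pipe.to("cuda")  # load onto the GPU if not resident
    elif self.inpaint_pipe is not None:
        # Leaving inpaint mode: move the pipe off the GPU to avoid OOMs.
        self.inpaint_pipe.to("cpu")
        torch.cuda.empty_cache()
```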

I've tested it a bunch with the API under zsxkib/prototype-model:2a65f914, but I'd really appreciate it if someone could play around with it too.

I think it'd be good to wait until I'm back on Tuesday. I feel like the lazy loading code makes the predict method a bit overly complex (maybe that's just me though)—any feedback is appreciated, I really want to improve my coding skills.

@ellipsis-dev bot left a comment:

👍 Looks good to me! Incremental review on fbccf68 in 25 seconds

More details
  • Looked at 128 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. predict.py:208
  • Draft comment:
    Consider adding a check to ensure the prompt is not empty to avoid unexpected behavior.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The prompt parameter is not being validated for emptiness, which could lead to unexpected behavior if an empty prompt is passed.
2. predict.py:237
  • Draft comment:
    Ensure consistent use of prompt_strength instead of strength throughout the code to avoid confusion or errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about ensuring consistency in the use of prompt_strength, which was a change made in the diff. However, the change seems to have been applied consistently throughout the modified code. The comment might not be necessary if the change is already consistent.
    I might be overlooking other parts of the code where strength could still be used, but the diff only shows changes where prompt_strength is used consistently.
    The diff indicates that the change was applied consistently, so the comment might not be necessary.
    The comment is not necessary as the change to prompt_strength appears to be consistently applied in the modified code.
3. predict.py:312
  • Draft comment:
    Ensure that the safety checker is properly implemented or document the risks of having it disabled.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is speculative and asks for verification, which violates the rules. The code already has a TODO comment indicating the safety checker should be re-enabled, making this comment redundant. The comment does not point out a definite issue that requires a code change.
    I might be overlooking the importance of documenting risks when a safety feature is disabled, but the rules clearly state not to make speculative comments or ask for verification.
    The rules are clear about not making speculative comments or asking for verification, so the comment should be removed.
    Remove the comment as it is speculative and redundant, and does not point out a definite issue requiring a code change.

Workflow ID: wflow_R6zIGwSIXuvv5ucf



@ellipsis-dev bot left a comment:

👍 Looks good to me! Incremental review on 43be9d7 in 21 seconds

More details
  • Looked at 256 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. predict.py:285
  • Draft comment:
    Image.LANCZOS is deprecated. Use Image.Resampling.LANCZOS instead. This applies to line 287 as well.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses Image.LANCZOS for resizing images, which is deprecated. The correct constant to use is Image.Resampling.LANCZOS. This should be updated to avoid future issues.
2. predict.py:118
  • Draft comment:
    The check if pipe.device.type != "cuda": is redundant since the model is already moved to CUDA. Consider removing this check and the subsequent pipe.to("cuda") call.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The configure_active_model method has redundant calls to pipe.to("cuda"). The check for pipe.device.type is unnecessary since the model is already moved to CUDA in the previous steps. This can be optimized by removing the redundant check and call.
3. predict.py:430
  • Draft comment:
    Consider using torch.bfloat16 for clip_input to maintain consistency with the rest of the code, which uses torch.bfloat16.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The run_safety_checker method uses torch.float16 for the clip_input. However, the rest of the code uses torch.bfloat16. For consistency and to avoid potential precision issues, it might be better to use torch.bfloat16 here as well.

Workflow ID: wflow_HewGSYOHCI2PxYc2



@andreasjansson (Member) commented:

I worry that we're thrashing in and out of GPU memory. Could we not keep both schnell and dev in memory, and if inpainting is used, unload schnell and keep the inpainting pipeline in memory?

Also, could we add some print logging for how long it takes to swap weights in and out of CPU?
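
For instance, a minimal sketch of that kind of timing log (here `pipe` and `name` are assumed to be in scope; the `[!] Moved ... to CPU` lines in the logs further down suggest something similar was added):

```python
import time

t0 = time.time()
pipe.to("cpu")  # offload the idle pipeline's weights to CPU memory
print(f"[!] Moved {name} to CPU in {time.time() - t0:.2f}s")
```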

zsxkib added 2 commits on August 30, 2024 at 14:37:

(Keep dev, safety checker on GPU at all times)

At the start we have ['safety_checker', 'dev', 'schnell'] on the GPU.

The 'schnell' seat is the one that gets unloaded to free space; the others stay on.

E.g., if we want to do inpainting with dev, we'll unload schnell and we'll have:
['safety_checker', 'dev', 'dev_inpaint']
@zsxkib (Contributor, Author) commented Aug 30, 2024

I've updated the code to implement better lazy loading, ensuring that we keep the dev and safety_checker models on the GPU at all times (and schnell is always loaded at setup).

Initial GPU State

  • At the start, we have [safety_checker, dev, schnell] loaded on the GPU.
  • schnell occupies the only "seat" that gets swapped out.

Model Swapping Logic

  • The schnell model is the one that gets unloaded to free up space when needed. The dev and safety_checker models remain loaded at all times.

Example Scenarios

  • Inpainting with dev:

    • We unload schnell and load dev_inpaint, resulting in [safety_checker, dev, dev_inpaint] on the GPU.
  • Switching to schnell inpainting:

    • We unload dev_inpaint and load schnell_inpaint, resulting in [safety_checker, dev, schnell_inpaint] on the GPU.
  • Returning to non-inpainting mode:

    • We reload the original state by unloading schnell_inpaint and reloading schnell, resulting in [safety_checker, dev, schnell] on the GPU.

Lazy Loading Strategy Table

| Step | Inputs | Action | GPU Models |
| --- | --- | --- | --- |
| Initial Setup | N/A | Setup models | [safety_checker, dev, schnell] |
| First Command | mask: ..., image: ..., model: "dev" | Unload schnell, load dev_inpaint | [safety_checker, dev, dev_inpaint] |
| Second Command | mask: ..., image: ..., model: "schnell" | Unload dev_inpaint, load schnell_inpaint | [safety_checker, dev, schnell_inpaint] |
| Third Command | mask: ..., image: ..., model: "dev" | Unload schnell_inpaint, load dev_inpaint | [safety_checker, dev, dev_inpaint] |
| Fourth Command | prompt: ..., model: "dev" | Text-to-image: unload dev_inpaint, reload schnell | [safety_checker, dev, schnell] |
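
A minimal sketch of that swap policy, assuming a `self.pipes` dict of pipelines and a `self.active_seat` attribute; `create_pipeline` is a hypothetical helper, and the actual predict.py code may differ:

```python
import time
import torch

# safety_checker and dev never move; only the third "seat" is swapped.
def configure_active_model(self, model: str, inpaint: bool) -> None:
    want = f"{model}_inpaint" if inpaint else "schnell"
    if self.active_seat == want:
        return  # same mode as the last run: nothing to swap, stays fast

    t0 = time.time()
    self.pipes[self.active_seat].to("cpu")  # evict the current occupant
    print(f"[!] Moved {self.active_seat} to CPU in {time.time() - t0:.2f}s")

    if want not in self.pipes:
        self.pipes[want] = self.create_pipeline(want)  # lazy first creation
    t0 = time.time()
    self.pipes[want].to("cuda")
    print(f"[!] Moved {want} to CUDA in {time.time() - t0:.2f}s")

    self.active_seat = want
    torch.cuda.empty_cache()
```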


I've tested it a lot with sudo cog run -p 5000 python -m cog.server.http in one terminal and a bunch of API calls in another terminal, and we do not get OOMs (on an A100 80GB) 🎉

@zsxkib (Contributor, Author) commented Aug 30, 2024

Logs to see timings⏳:

Loading safety checker...
Loading Flux dev pipeline
Loading Flux schnell pipeline
...
[!] Loaded models: ['safety_checker', 'dev', 'schnell']
setup took:  87.8359432220459
Using seed: 61448
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: True
[~] Moving schnell to CPU...
[!] Moved schnell to CPU in 16.86s
[~] Loading dev_inpaint pipeline...
[~] Creating new dev inpaint pipelin
...
[~] Created dev inpaint pipeline in 5.77s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'dev_inpaint']
[!] Total time for configure_active_model: 22.64s
inpaint mode
Using dev model for inpainting
Using dev model
...
Using seed: 39676
Prompt: small cute cat sat on a park bench
[~] Configuring active model: schnell, inpaint: True
[~] Moving dev_inpaint to CPU...
[!] Moved dev_inpaint to CPU in 16.50s
[~] Loading schnell_inpaint pipeline...
[~] Creating new schnell inpaint pipelin
...
[~] Created schnell inpaint pipeline in 5.58s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'schnell_inpaint']
[!] Total time for configure_active_model: 22.09s
inpaint mode
Using schnell model for inpainting
Using schnell model
...
Using seed: 29409
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: True
[~] Moving schnell_inpaint to CPU...
[!] Moved schnell_inpaint to CPU in 16.67s
[~] Loading dev_inpaint pipeline...
[!] Moved dev inpaint to CUDA in 4.21s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'dev_inpaint']
[!] Total time for configure_active_model: 20.90s
inpaint mode
Using dev model for inpainting
Using dev model
...
Using seed: 61274
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: False
[~] Moving dev_inpaint to CPU...
[!] Moved dev_inpaint to CPU in 14.15s
[~] Moving schnell model to CUDA...
[!] Moved schnell to CUDA in 4.04s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'schnell']
[!] Total time for configure_active_model: 18.20s
txt2img mode
Using dev model
...

@zsxkib merged commit 290e81a into main on Aug 30, 2024. 2 checks passed.