
inpainting added, and works via api #13

Merged: 5 commits into main, Aug 30, 2024

Conversation

@zsxkib (Contributor) commented Aug 23, 2024

🚀 This description was created by Ellipsis for commit ad819de

Summary:

Added inpainting functionality to predict.py using FluxInpaintPipeline, with new inputs for image and mask, and updated cog.yaml for dependencies.

Key points:

  • Added inpainting functionality using FluxInpaintPipeline in predict.py (see the sketch below).
  • Introduced image and mask inputs for inpainting mode in Predictor.predict.
  • Implemented setup_inpaint_pipeline to initialize inpainting pipelines.
  • Updated cog.yaml to install diffusers with inpainting support.
  • Disabled safety checker temporarily, marked for future implementation.
  • Added resize_image_dimensions to adjust image size for inpainting.

Generated with ❤️ by ellipsis.dev
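
As a rough illustration of the inpainting path described above, here is a minimal sketch, assuming the fork exposes the same interface as the FluxInpaintPipeline that later landed upstream in diffusers; the file names and parameter values are illustrative:

```python
import torch
from diffusers import FluxInpaintPipeline
from PIL import Image

# Load the Flux inpainting pipeline in bfloat16 and move it to the GPU.
pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("input.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white = repaint, black = keep

result = pipe(
    prompt="small cute cat sat on a park bench",
    image=image,
    mask_image=mask,
    strength=0.85,           # how strongly the masked region is repainted
    num_inference_steps=28,
).images[0]
result.save("output.png")
```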

@ellipsis-dev bot left a comment:

👍 Looks good to me! Reviewed everything up to a890dd8 in 11 seconds

More details
  • Looked at 240 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. predict.py:230
  • Draft comment:
    Consider using resize_image_dimensions instead of make_multiple_of_16 for consistency in resizing images to dimensions that are multiples of 32.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The resize_image_dimensions function in predict.py is not part of the diff, but it is relevant to the changes made in the PR. The function ensures that the new dimensions are multiples of 32, which is a good practice for image processing in neural networks. However, the function is not used in the predict method when aspect_ratio is 'custom'. Instead, make_multiple_of_16 is used, which might lead to inconsistencies. The resize_image_dimensions function should be used consistently for resizing to ensure dimensions are multiples of 32.
2. predict.py:326
  • Draft comment:
    Consider using resize_image_dimensions instead of make_multiple_of_16 for consistency in resizing images to dimensions that are multiples of 32.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The load_image function is a utility function that loads an image and converts it to RGB. This is a good practice to ensure consistency in image processing. However, it is not part of the diff, so I will not comment on it.

Workflow ID: wflow_IUd4UaalHPSVsn1G



@zsxkib (Contributor, Author) commented Aug 23, 2024

Could someone please have a look? (I'm also very happy to pair and iron out the lazy loading edge cases.)

I had a lot of trouble getting this to work, especially because we don't have much lazy loading implemented within the model, whereas in the cog-flux repo (https://github.com/replicate/cog-flux/tree/dev-img2img) there's some amount of lazy loading to allow for inference on A40s. So I ended up repeatedly pushing changes to Replicate so I could use the A100s (without setting up another dev machine on CoreWeave, lol).

I've added lazy loading as a point on the Notion doc (https://www.notion.so/replicate/Flux-fine-tuning-ee5a253453b94549bb39cf3701917b25#0006ea53192d4afd869b7e8ff923e238)—it really depends if we want to exclusively use H100s, or if we want to use A100s too.

A couple of issues though:

  1. I had to install an experimental version of diffusers: git+https://github.com/Gothos/diffusers.git@flux-inpaint (see the snippet after this list).

    • I've purposefully put it in the run section of the cog.yaml instead of python_packages, since it caused issues with diffusers==0.30.0's dependencies. Essentially, Gothos' fork requires slightly different pinned dependencies, and I didn't want to break anything (it did break when I tried naively replacing it).
  2. I have all of the safety checker code commented out so that I could run it on an A100—we can't merge in the current state since we wouldn't have a safety checker anymore.

  3. I couldn't keep the inpainting pipeline online at the same time as the other models. That makes sense: we have FluxPipe, InpaintingPipe, and the safety checkers all competing for GPU memory.
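
For reference, a hypothetical excerpt of what that looks like in cog.yaml (the surrounding pins are illustrative, not the repo's actual ones):

```yaml
build:
  gpu: true
  python_packages:
    # ...other pinned dependencies stay here, with no diffusers pin...
    - torch==2.3.1
  run:
    # Installed in a separate build step so pip doesn't resolve the fork
    # against the python_packages pins (naively swapping the diffusers pin
    # in python_packages broke the build).
    - pip install git+https://github.com/Gothos/diffusers.git@flux-inpaint
```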

So, I took a bunch of time to add lazy loading for the inpainting pipeline:

  • If we're in inpainting mode (image and mask), we create the pipe if not already done, load it onto the GPU if not already done, then inpaint. Consecutive runs will be fast because we won't reinitialize the pipe. But if we switch modes (not inpaint), we unload it (this fixed the out-of-memory errors on the A100s). See the sketch below.
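
A minimal sketch of that logic (the method and attribute names here are illustrative, not the exact identifiers in predict.py):

```python
import torch

def configure_inpainting(self, want_inpaint: bool) -> None:
    if want_inpaint:
        if self.inpaint_pipe is None:
            # Create the pipeline once; consecutive inpaint runs reuse it.
            self.inpaint_pipe = self.setup_inpaint_pipeline()
        if self.inpaint_pipe.device.type != "cuda":
            self.inpaint_pipe.to("cuda")  # load onto the GPU if not resident
    elif self.inpaint_pipe is not None:
        # Leaving inpaint mode: move the pipe off the GPU to avoid OOMs.
        self.inpaint_pipe.to("cpu")
        torch.cuda.empty_cache()
```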

I've tested it a bunch with the API under zsxkib/prototype-model:2a65f914, but I'd really appreciate it if someone could play around with it too.

I think it'd be good to wait until I'm back on Tuesday. I feel like the lazy loading code makes the predict method a bit overly complex (maybe that's just me though)—any feedback is appreciated, I really want to improve my coding skills.

@ellipsis-dev bot left a comment:

👍 Looks good to me! Incremental review on fbccf68 in 25 seconds

More details
  • Looked at 128 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. predict.py:208
  • Draft comment:
    Consider adding a check to ensure the prompt is not empty to avoid unexpected behavior.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The prompt parameter is not being validated for emptiness, which could lead to unexpected behavior if an empty prompt is passed.
2. predict.py:237
  • Draft comment:
    Ensure consistent use of prompt_strength instead of strength throughout the code to avoid confusion or errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about ensuring consistency in the use of prompt_strength, which was a change made in the diff. However, the change seems to have been applied consistently throughout the modified code. The comment might not be necessary if the change is already consistent.
    I might be overlooking other parts of the code where strength could still be used, but the diff only shows changes where prompt_strength is used consistently.
    The diff indicates that the change was applied consistently, so the comment might not be necessary.
    The comment is not necessary as the change to prompt_strength appears to be consistently applied in the modified code.
3. predict.py:312
  • Draft comment:
    Ensure that the safety checker is properly implemented or document the risks of having it disabled.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is speculative and asks for verification, which violates the rules. The code already has a TODO comment indicating the safety checker should be re-enabled, making this comment redundant. The comment does not point out a definite issue that requires a code change.
    I might be overlooking the importance of documenting risks when a safety feature is disabled, but the rules clearly state not to make speculative comments or ask for verification.
    The rules are clear about not making speculative comments or asking for verification, so the comment should be removed.
    Remove the comment as it is speculative and redundant, and does not point out a definite issue requiring a code change.

Workflow ID: wflow_R6zIGwSIXuvv5ucf



@ellipsis-dev bot left a comment:

👍 Looks good to me! Incremental review on 43be9d7 in 21 seconds

More details
  • Looked at 256 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. predict.py:285
  • Draft comment:
    Image.LANCZOS is deprecated. Use Image.Resampling.LANCZOS instead. This applies to line 287 as well.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses Image.LANCZOS for resizing images, which is deprecated. The correct constant to use is Image.Resampling.LANCZOS. This should be updated to avoid future issues.
2. predict.py:118
  • Draft comment:
    The check if pipe.device.type != "cuda": is redundant since the model is already moved to CUDA. Consider removing this check and the subsequent pipe.to("cuda") call.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The configure_active_model method has redundant calls to pipe.to("cuda"). The check for pipe.device.type is unnecessary since the model is already moved to CUDA in the previous steps. This can be optimized by removing the redundant check and call.
3. predict.py:430
  • Draft comment:
    Consider using torch.bfloat16 for clip_input to maintain consistency with the rest of the code, which uses torch.bfloat16.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The run_safety_checker method uses torch.float16 for the clip_input. However, the rest of the code uses torch.bfloat16. For consistency and to avoid potential precision issues, it might be better to use torch.bfloat16 here as well.

Workflow ID: wflow_HewGSYOHCI2PxYc2



@andreasjansson (Member) commented:

I worry that we're thrashing in and out of GPU memory. Could we not keep both schnell and dev in memory, and if inpainting is used, unload schnell and keep the inpainting pipeline in memory?

Also, could we add some print logging for how long it takes to swap weights in and out of CPU?
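
For instance, a minimal sketch of that kind of timing log (here `pipe` and `name` are assumed to be in scope; the `[!] Moved ... to CPU` lines in the logs further down suggest something similar was added):

```python
import time

t0 = time.time()
pipe.to("cpu")  # offload the idle pipeline's weights to CPU memory
print(f"[!] Moved {name} to CPU in {time.time() - t0:.2f}s")
```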

zsxkib added 2 commits on August 30, 2024 at 14:37:

(Keep dev, safety checker on GPU at all times)

At the start we have ['safety_checker', 'dev', 'schnell'] on the GPU.

The 'schnell' seat is the one that gets unloaded to free space; the others stay on.

E.g., if we want to do inpainting with dev, we'll unload schnell and we'll have:
['safety_checker', 'dev', 'dev_inpaint']
@zsxkib (Contributor, Author) commented Aug 30, 2024

I've updated the code to implement better lazy loading, ensuring that we keep the dev and safety_checker models on the GPU at all times (and schnell is always loaded at setup).

Initial GPU State

  • At the start, we have [safety_checker, dev, schnell] loaded on the GPU.
  • schnell occupies the only "seat" that gets swapped out.

Model Swapping Logic

  • The schnell model is the one that gets unloaded to free up space when needed. The dev and safety_checker models remain loaded at all times.

Example Scenarios

  • Inpainting with dev:

    • We unload schnell and load dev_inpaint, resulting in [safety_checker, dev, dev_inpaint] on the GPU.
  • Switching to schnell inpainting:

    • We unload dev_inpaint and load schnell_inpaint, resulting in [safety_checker, dev, schnell_inpaint] on the GPU.
  • Returning to non-inpainting mode:

    • We reload the original state by unloading schnell_inpaint and reloading schnell, resulting in [safety_checker, dev, schnell] on the GPU.

Lazy Loading Strategy Table

| Step | Inputs | Action | GPU Models |
| --- | --- | --- | --- |
| Initial Setup | N/A | Setup models | [safety_checker, dev, schnell] |
| First Command | mask: ..., image: ..., model: "dev" | Unload schnell, load dev_inpaint | [safety_checker, dev, dev_inpaint] |
| Second Command | mask: ..., image: ..., model: "schnell" | Unload dev_inpaint, load schnell_inpaint | [safety_checker, dev, schnell_inpaint] |
| Third Command | mask: ..., image: ..., model: "dev" | Unload schnell_inpaint, load dev_inpaint | [safety_checker, dev, dev_inpaint] |
| Fourth Command | prompt: ..., model: "dev" | Text-to-image: unload dev_inpaint, reload schnell | [safety_checker, dev, schnell] |
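
A minimal sketch of that swap policy, assuming a `self.pipes` dict of pipelines and a `self.active_seat` attribute; `create_pipeline` is a hypothetical helper, and the actual predict.py code may differ:

```python
import time
import torch

# safety_checker and dev never move; only the third "seat" is swapped.
def configure_active_model(self, model: str, inpaint: bool) -> None:
    want = f"{model}_inpaint" if inpaint else "schnell"
    if self.active_seat == want:
        return  # same mode as the last run: nothing to swap, stays fast

    t0 = time.time()
    self.pipes[self.active_seat].to("cpu")  # evict the current occupant
    print(f"[!] Moved {self.active_seat} to CPU in {time.time() - t0:.2f}s")

    if want not in self.pipes:
        self.pipes[want] = self.create_pipeline(want)  # lazy first creation
    t0 = time.time()
    self.pipes[want].to("cuda")
    print(f"[!] Moved {want} to CUDA in {time.time() - t0:.2f}s")

    self.active_seat = want
    torch.cuda.empty_cache()
```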


I've tested it a lot with sudo cog run -p 5000 python -m cog.server.http in one terminal and a bunch of API calls in another terminal, and we do not get OOMs (on an A100 80GB) 🎉

@zsxkib (Contributor, Author) commented Aug 30, 2024

Logs to see timings⏳:

Loading safety checker...
Loading Flux dev pipeline
Loading Flux schnell pipeline
...
[!] Loaded models: ['safety_checker', 'dev', 'schnell']
setup took:  87.8359432220459
Using seed: 61448
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: True
[~] Moving schnell to CPU...
[!] Moved schnell to CPU in 16.86s
[~] Loading dev_inpaint pipeline...
[~] Creating new dev inpaint pipelin
...
[~] Created dev inpaint pipeline in 5.77s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'dev_inpaint']
[!] Total time for configure_active_model: 22.64s
inpaint mode
Using dev model for inpainting
Using dev model
...
Using seed: 39676
Prompt: small cute cat sat on a park bench
[~] Configuring active model: schnell, inpaint: True
[~] Moving dev_inpaint to CPU...
[!] Moved dev_inpaint to CPU in 16.50s
[~] Loading schnell_inpaint pipeline...
[~] Creating new schnell inpaint pipelin
...
[~] Created schnell inpaint pipeline in 5.58s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'schnell_inpaint']
[!] Total time for configure_active_model: 22.09s
inpaint mode
Using schnell model for inpainting
Using schnell model
...
Using seed: 29409
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: True
[~] Moving schnell_inpaint to CPU...
[!] Moved schnell_inpaint to CPU in 16.67s
[~] Loading dev_inpaint pipeline...
[!] Moved dev inpaint to CUDA in 4.21s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'dev_inpaint']
[!] Total time for configure_active_model: 20.90s
inpaint mode
Using dev model for inpainting
Using dev model
...
Using seed: 61274
Prompt: small cute cat sat on a park bench
[~] Configuring active model: dev, inpaint: False
[~] Moving dev_inpaint to CPU...
[!] Moved dev_inpaint to CPU in 14.15s
[~] Moving schnell model to CUDA...
[!] Moved schnell to CUDA in 4.04s
[!] Moved model components to CUDA in 0.01s
[!] Loaded models: ['safety_checker', 'dev', 'schnell']
[!] Total time for configure_active_model: 18.20s
txt2img mode
Using dev model
...

@zsxkib merged commit 290e81a into main on Aug 30, 2024. 2 checks passed.