Replicate Init 2.0

zsxkib · zsxkib · commit 4c64b28a988e · 2024-01-23T16:31:15.000Z
Replicate READMEs
diff --git a/.gitignore b/.gitignore
@@ -158,3 +158,6 @@ cython_debug/
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+
+# Cog
+.cog
diff --git a/README.md b/README.md
@@ -3,6 +3,7 @@
 <a href='https://arxiv.org/abs/2401.07519'><img src='https://img.shields.io/badge/Technique-Report-red'></a> 
 <a href='https://huggingface.co/papers/2401.07519'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a> 
 <a href='https://huggingface.co/spaces/InstantX/InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> 
+[![Replicate](https://replicate.com/zsxkib/instant-id/badge)](https://replicate.com/zsxkib/instant-id)
 
 **InstantID : Zero-shot Identity-Preserving Generation in Seconds**
 
diff --git a/cog.yaml b/cog.yaml
@@ -0,0 +1,31 @@
+# Configuration for Cog ⚙️
+# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md
+
+build:
+  # set to true if your model requires a GPU
+  gpu: true
+  # cuda: "12.1"
+
+  # a list of ubuntu apt packages to install
+  system_packages:
+    - "libgl1-mesa-glx"
+    - "libglib2.0-0"
+
+  # python version in the form '3.11' or '3.11.4'
+  python_version: "3.11"
+
+  # a list of packages in the format <package-name>==<version>
+  python_packages:
+    - "opencv-python==4.9.0.80"
+    - "transformers==4.37.0"
+    - "accelerate==0.26.1"
+    - "insightface==0.7.3"
+    - "diffusers==0.25.1"
+    - "onnxruntime==1.16.3"
+
+  # commands run after the environment is setup
+  run:
+    - curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.6.0/pget_linux_x86_64" && chmod +x /usr/local/bin/pget
+
+# predict.py defines how predictions are run on your model
+predict: "cog/predict.py:Predictor"
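Review note: the `run` step above installs `pget` to `/usr/local/bin`, and `download_weights` in `cog/predict.py` shells out to it with `-x` to download and extract each tar archive. A minimal sketch of that invocation follows. The helpers `pget_command` and `pget_available` are hypothetical names used only for illustration and are not part of this commit.

```python
import shutil


def pget_command(url: str, dest: str, binary: str = "pget") -> list:
    """Build the argv that predict.py passes to subprocess.check_call.

    Hypothetical helper: the commit inlines this list in download_weights.
    """
    return [binary, "-x", url, dest]


def pget_available(binary: str = "pget") -> bool:
    """Return True if the pget binary can be found on PATH."""
    return shutil.which(binary) is not None


cmd = pget_command(
    "https://weights.replicate.delivery/default/InstantID/models.tar",
    "./models",
)
print(cmd)
```

Checking `pget_available()` before calling `subprocess.check_call` would turn a missing binary into a clear error message when the predictor is run outside the Cog-built image.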
diff --git a/cog/README.md b/cog/README.md
@@ -0,0 +1,60 @@
+# InstantID Cog Model
+
+[![Replicate](https://replicate.com/zsxkib/instant-id/badge)](https://replicate.com/zsxkib/instant-id)
+
+## Overview
+This repository contains the implementation of [InstantID](https://github.com/InstantID/InstantID) as a [Cog](https://github.com/replicate/cog) model. 
+
+[Cog](https://github.com/replicate/cog) lets anyone with a GPU run the model locally without manually downloading weights, installing libraries, or managing CUDA versions.
+
+## Development
+To push your own fork of InstantID to [Replicate](https://replicate.com), follow the [Model Pushing Guide](https://replicate.com/docs/guides/push-a-model).
+
+## Basic Usage
+To make predictions using the model, execute the following command from the root of this project:
+
+```bash
+cog predict \
+-i image=@examples/sam_resize.png \
+-i prompt="analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality" \
+-i negative_prompt="nsfw" \
+-i width=680 \
+-i height=680 \
+-i ip_adapter_scale=0.8 \
+-i controlnet_conditioning_scale=0.8 \
+-i num_inference_steps=30 \
+-i guidance_scale=5
+```
+
+<table>
+  <tr>
+    <td>
+      <p align="center">Input</p>
+      <img src="https://replicate.delivery/pbxt/KGy0R72cMwriR9EnCLu6hgVkQNd60mY01mDZAQqcUic9rVw4/musk_resize.jpeg" alt="Sample Input Image" width="90%"/>
+    </td>
+    <td>
+      <p align="center">Output</p>
+      <img src="https://replicate.delivery/pbxt/oGOxXELcLcpaMBeIeffwdxKZAkuzwOzzoxKadjhV8YgQWk8IB/result.jpg" alt="Sample Output Image" width="100%"/>
+    </td>
+  </tr>
+</table>
+
+## Input Parameters
+
+The following table provides details about each input parameter for the `predict` function:
+
+| Parameter                       | Description                        | Default Value                                                                                                  | Range       |
+| ------------------------------- | ---------------------------------- | -------------------------------------------------------------------------------------------------------------- | ----------- |
+| `image`                         | Input image                        | (none; this input is required)                                                                                 | Path string |
+| `prompt`                        | Input prompt                       | "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, ... " | String      |
+| `negative_prompt`               | Input Negative Prompt              | (empty string)                                                                                                 | String      |
+| `width`                         | Width of output image              | 640                                                                                                            | 512 - 2048  |
+| `height`                        | Height of output image             | 640                                                                                                            | 512 - 2048  |
+| `ip_adapter_scale`              | Scale for IP adapter               | 0.8                                                                                                            | 0.0 - 1.0   |
+| `controlnet_conditioning_scale` | Scale for ControlNet conditioning  | 0.8                                                                                                            | 0.0 - 1.0   |
+| `num_inference_steps`           | Number of denoising steps          | 30                                                                                                             | 1 - 500     |
+| `guidance_scale`                | Scale for classifier-free guidance | 5                                                                                                              | 1 - 50      |
+
+This table provides a quick reference to understand and modify the inputs for generating predictions using the model.
+
+
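Review note: the ranges in the README table above mirror the `ge`/`le` bounds declared on the `Input(...)` fields in `cog/predict.py`. A minimal sketch of checking inputs against those ranges before invoking `cog predict` follows. `RANGES` and `validate_inputs` are hypothetical names for illustration and are not part of this commit; Cog performs its own validation at prediction time.

```python
# Hypothetical helper, not part of this commit: mirrors the numeric ranges
# listed in the README table so inputs can be checked up front.
RANGES = {
    "width": (512, 2048),
    "height": (512, 2048),
    "ip_adapter_scale": (0.0, 1.0),
    "controlnet_conditioning_scale": (0.0, 1.0),
    "num_inference_steps": (1, 500),
    "guidance_scale": (1, 50),
}


def validate_inputs(inputs: dict) -> list:
    """Return a list of range violations; the list is empty if all pass."""
    errors = []
    for name, (low, high) in RANGES.items():
        # Parameters that are absent fall back to their defaults, so skip them.
        if name in inputs and not (low <= inputs[name] <= high):
            errors.append(f"{name}={inputs[name]} is outside [{low}, {high}]")
    return errors


# The values from the Basic Usage command are all within range.
print(validate_inputs({"width": 680, "height": 680, "guidance_scale": 5}))
# A width below the documented minimum is reported.
print(validate_inputs({"width": 256}))
```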
diff --git a/cog/predict.py b/cog/predict.py
@@ -0,0 +1,200 @@
+# Prediction interface for Cog ⚙️
+# https://github.com/replicate/cog/blob/main/docs/python.md
+
+import os
+import sys
+
+import time
+import subprocess
+from cog import BasePredictor, Input, Path
+
+import cv2
+import torch
+import numpy as np
+from PIL import Image
+
+from diffusers.utils import load_image
+from diffusers.models import ControlNetModel
+
+from insightface.app import FaceAnalysis
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
+from pipeline_stable_diffusion_xl_instantid import (
+    StableDiffusionXLInstantIDPipeline,
+    draw_kps,
+)
+
+# for `ip-adapter`, `ControlNetModel`, and `stable-diffusion-xl-base-1.0`
+CHECKPOINTS_CACHE = "./checkpoints"
+CHECKPOINTS_URL = (
+    "https://weights.replicate.delivery/default/InstantID/checkpoints.tar"
+)
+
+# for `models/antelopev2`
+MODELS_CACHE = "./models"
+MODELS_URL = "https://weights.replicate.delivery/default/InstantID/models.tar"
+
+
+def resize_img(
+    input_image,
+    max_side=1280,
+    min_side=1024,
+    size=None,
+    pad_to_max_side=False,
+    mode=Image.BILINEAR,
+    base_pixel_number=64,
+):
+    w, h = input_image.size
+    if size is not None:
+        w_resize_new, h_resize_new = size
+    else:
+        ratio = min_side / min(h, w)
+        w, h = round(ratio * w), round(ratio * h)
+        ratio = max_side / max(h, w)
+        input_image = input_image.resize([round(ratio * w), round(ratio * h)], mode)
+        w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
+        h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
+    input_image = input_image.resize([w_resize_new, h_resize_new], mode)
+
+    if pad_to_max_side:
+        res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
+        offset_x = (max_side - w_resize_new) // 2
+        offset_y = (max_side - h_resize_new) // 2
+        res[
+            offset_y : offset_y + h_resize_new, offset_x : offset_x + w_resize_new
+        ] = np.array(input_image)
+        input_image = Image.fromarray(res)
+    return input_image
+
+
+def download_weights(url, dest):
+    start = time.time()
+    print("downloading url: ", url)
+    print("downloading to: ", dest)
+    subprocess.check_call(["pget", "-x", url, dest], close_fds=False)
+    print("downloading took: ", time.time() - start)
+
+
+class Predictor(BasePredictor):
+    def setup(self) -> None:
+        """Load the model into memory to make running multiple predictions efficient"""
+        if not os.path.exists(CHECKPOINTS_CACHE):
+            download_weights(CHECKPOINTS_URL, CHECKPOINTS_CACHE)
+
+        if not os.path.exists(MODELS_CACHE):
+            download_weights(MODELS_URL, MODELS_CACHE)
+
+        self.width, self.height = 640, 640
+        self.app = FaceAnalysis(
+            name="antelopev2",
+            root="./",
+            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
+        )
+        self.app.prepare(ctx_id=0, det_size=(self.width, self.height))
+
+        # Path to InstantID models
+        face_adapter = "./checkpoints/ip-adapter.bin"
+        controlnet_path = "./checkpoints/ControlNetModel"
+
+        # Load pipeline
+        self.controlnet = ControlNetModel.from_pretrained(
+            controlnet_path,
+            torch_dtype=torch.float16,
+            cache_dir=CHECKPOINTS_CACHE,
+            local_files_only=True,
+        )
+
+        base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
+        self.pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
+            base_model_path,
+            controlnet=self.controlnet,
+            torch_dtype=torch.float16,
+            cache_dir=CHECKPOINTS_CACHE,
+            local_files_only=True,
+        )
+        self.pipe.cuda()
+        self.pipe.load_ip_adapter_instantid(face_adapter)
+
+    def predict(
+        self,
+        image: Path = Input(description="Input image"),
+        prompt: str = Input(
+            description="Input prompt",
+            default="analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
+        ),
+        negative_prompt: str = Input(
+            description="Input Negative Prompt",
+            default="",
+        ),
+        width: int = Input(
+            description="Width of output image",
+            default=640,
+            ge=512,
+            le=2048,
+        ),
+        height: int = Input(
+            description="Height of output image",
+            default=640,
+            ge=512,
+            le=2048,
+        ),
+        ip_adapter_scale: float = Input(
+            description="Scale for IP adapter",
+            default=0.8,
+            ge=0,
+            le=1,
+        ),
+        controlnet_conditioning_scale: float = Input(
+            description="Scale for ControlNet conditioning",
+            default=0.8,
+            ge=0,
+            le=1,
+        ),
+        num_inference_steps: int = Input(
+            description="Number of denoising steps",
+            default=30,
+            ge=1,
+            le=500,
+        ),
+        guidance_scale: float = Input(
+            description="Scale for classifier-free guidance",
+            default=5,
+            ge=1,
+            le=50,
+        ),
+    ) -> Path:
+        """Run a single prediction on the model"""
+        if self.width != width or self.height != height:
+            print(f"[!] Updating face detector input size to {width}x{height}")
+            self.width = width
+            self.height = height
+            self.app.prepare(ctx_id=0, det_size=(self.width, self.height))
+
+        face_image = load_image(str(image))
+        face_image = resize_img(face_image)
+
+        face_info = self.app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
+        if not face_info:
+            raise ValueError("No face detected in the input image")
+        # only use the largest face (by bounding-box area)
+        face_info = max(
+            face_info,
+            key=lambda x: (x["bbox"][2] - x["bbox"][0]) * (x["bbox"][3] - x["bbox"][1]),
+        )
+        face_emb = face_info["embedding"]
+        face_kps = draw_kps(face_image, face_info["kps"])
+
+        self.pipe.set_ip_adapter_scale(ip_adapter_scale)
+        image = self.pipe(
+            prompt=prompt,
+            negative_prompt=negative_prompt,
+            image_embeds=face_emb,
+            image=face_kps,
+            controlnet_conditioning_scale=controlnet_conditioning_scale,
+            num_inference_steps=num_inference_steps,
+            guidance_scale=guidance_scale,
+        ).images[0]
+
+        output_path = "result.jpg"
+        image.save(output_path)
+        return Path(output_path)
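Review note: `predict` calls `resize_img(face_image)` with its defaults (`max_side=1280`, `min_side=1024`, `size=None`, `base_pixel_number=64`), so it can help to trace the dimension arithmetic on its own. The sketch below reproduces only the size computation from the `size is None` branch, without PIL. `resized_dims` is a hypothetical name used for illustration and is not part of this commit.

```python
def resized_dims(w, h, max_side=1280, min_side=1024, base_pixel_number=64):
    """Replicate the width/height arithmetic of resize_img's default branch.

    Hypothetical helper for illustration; it performs no image resizing.
    """
    # Step 1: scale so the shorter side equals min_side.
    ratio = min_side / min(h, w)
    w, h = round(ratio * w), round(ratio * h)
    # Step 2: rescale so the longer side equals max_side.
    ratio = max_side / max(h, w)
    # Step 3: snap each side down to a multiple of base_pixel_number.
    w_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
    h_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
    return w_new, h_new


print(resized_dims(680, 680))
print(resized_dims(1920, 1080))
```

With these defaults the second rescale always brings the longer side to `max_side`, so the face image handed to the detector and to `draw_kps` has a longer side of 1280 pixels for any input size.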