
MohammadB88/k8s-image-generation


k8s-image-generation

End-to-end guide to deploying, scaling, and testing image generation models (e.g., Stable Diffusion, ControlNet) on Kubernetes using modern inference engines and serving frameworks like KServe and vLLM.

This repo shows how to deploy a text-to-image model with the SDXL custom runtime provided by Red Hat, together with a client UI for generating images:

  • An S3-compatible storage (MinIO) with at least 100 GB of space is required.

  • Install odh-tec (deployed as a custom workbench) using the Open Data Hub Tools & Extensions Companion image.

  • Add your Hugging Face token in the settings to authenticate to the Hugging Face model registry.

  • Download the required components of the SDXL model from Hugging Face.

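  • Install the MinIO client mc (https://min.io/docs/minio/linux/reference/minio-mc.html) with the commands below. The binary is saved as 'minio' rather than 'mc', so the later commands call 'minio':

      # Download the MinIO client and save it as $HOME/minio-binaries/minio
      curl https://dl.min.io/client/mc/release/linux-amd64/mc \
      --create-dirs \
      -o $HOME/minio-binaries/minio
    
      chmod +x $HOME/minio-binaries/minio
      
      export PATH=$PATH:$HOME/minio-binaries/
    
      # HOSTNAME is the full S3 API URL of your MinIO instance, e.g. https://<minio-host>
      minio alias set myminio HOSTNAME ACCESS_KEY SECRET_KEY
    
      # Check that the alias works
      minio admin info myminio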
  • Copy only the model files ('sd_xl_base_1.0.safetensors' and 'sd_xl_refiner_1.0.safetensors') from their download directories to the 'models/' path in the S3 bucket, so that the custom runtime can load them:

      minio cp myminio/sdxl-models/stabilityai/stable-diffusion-xl-base-1.0/sd_xl_base_1.0.safetensors myminio/sdxl-models/models/sd_xl_base_1.0.safetensors
    
      minio cp myminio/sdxl-models/stabilityai/stable-diffusion-xl-refiner-1.0/sd_xl_refiner_1.0.safetensors myminio/sdxl-models/models/sd_xl_refiner_1.0.safetensors
  • Add the custom runtime kserve-sdxl to the Serving Runtimes in OpenShift AI.

  • Deploy the model using the custom runtime and the InferenceService resources.

  • Deploy the SDXL client UI with the following 3 environment variables:

    • SDXL_ENDPOINT_URL=
    • GUARD_ENABLED=false
    • SAFETY_CHECK_ENABLED=false
  • Test the model endpoint:

    • from the predictor pod itself:
    curl -X POST \
    -H "Content-Type: application/json" \
    -d '{
      "instances": [
        {
          "prompt": "grey tabby cat with green eyes hiding inside a cardboard box, photograph, nikon, detailed, 8k",
          "guidance_scale": 8.0,
          "num_inference_steps": 50,
          "crops_coords_top_left": [256, 0],
          "width": 1024,
          "height": 1024,
          "denoising_limit": 0.8
        }
      ]
    }' \
    http://localhost:8080/v1/models/model:predict
    
    • from the client pod:
    curl -X POST \
    -H "Content-Type: application/json" \
    -d '{
    "instances": [
      {
        "prompt": "grey tabby cat with green eyes hiding inside a cardboard box, photograph, nikon, detailed, 8k",
        "guidance_scale": 8.0,
        "num_inference_steps": 50,
        "crops_coords_top_left": [256, 0],
        "width": 1024,
        "height": 1024,
        "denoising_limit": 0.8
      }
    ]
    }' \
    http://sdxl-predictor.sdxl-studio.svc.cluster.local:8080/v1/models/model:predict
    curl -X POST \
    -H "Content-Type: application/json" \
    -d '{
    "instances": [
      {
        "prompt": "grey tabby cat with green eyes hiding inside a cardboard box, photograph, nikon, detailed, 8k",
        "guidance_scale": 8.0,
        "num_inference_steps": 50,
        "crops_coords_top_left": [256, 0],
        "width": 1024,
        "height": 1024,
        "denoising_limit": 0.8
      }
    ]
    }' \
    http://localhost:8888/
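If you prefer the command line to odh-tec for the download step, the sketch below fetches only the two checkpoint files and copies them straight to the 'models/' path, skipping the intermediate copy. It is an alternative route, not the one this guide uses, and it assumes: the Hugging Face CLI ('huggingface-cli' from the huggingface_hub package) is installed and reads your token from HF_TOKEN, the MinIO client is on PATH as 'minio' with the alias 'myminio', and the bucket is 'sdxl-models'. WORKDIR, DRY_RUN and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumptions): huggingface-cli installed, HF_TOKEN exported,
# MinIO client on PATH as "minio" with alias "myminio", bucket "sdxl-models".
# Set DRY_RUN=1 to print the commands instead of executing them.

BUCKET="${BUCKET:-myminio/sdxl-models}"          # hypothetical default target
WORKDIR="${WORKDIR:-$HOME/sdxl-download}"        # hypothetical local staging dir

# Run a command, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# stage_checkpoint REPO_ID FILE: download one file, then copy it to BUCKET/models/FILE
stage_checkpoint() {
  local repo="$1" file="$2"
  run huggingface-cli download "$repo" "$file" --local-dir "$WORKDIR/$repo"
  run minio cp "$WORKDIR/$repo/$file" "$BUCKET/models/$file"
}

# stage_all: stage the base and refiner checkpoints, then list the models/ path
stage_all() {
  stage_checkpoint stabilityai/stable-diffusion-xl-base-1.0 sd_xl_base_1.0.safetensors
  stage_checkpoint stabilityai/stable-diffusion-xl-refiner-1.0 sd_xl_refiner_1.0.safetensors
  run minio ls "$BUCKET/models/"
}
```

Usage: source the script, run 'DRY_RUN=1 stage_all' to preview, then 'stage_all'. Each checkpoint is several GB, so the local disk must hold both files temporarily.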
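For the deployment step, a minimal InferenceService for the custom runtime might look like the manifest printed by the function below. The name 'sdxl' and namespace 'sdxl-studio' are inferred from the predictor URL used in the tests (KServe names the predictor service '<name>-predictor'). Everything else is an assumption: MODEL_FORMAT must match a supportedModelFormats entry of your kserve-sdxl ServingRuntime, STORAGE_KEY stands for the name of your S3 data connection secret, and one NVIDIA GPU is requested.

```shell
#!/usr/bin/env bash
# render_isvc [NAME] [NAMESPACE]: print a KServe InferenceService manifest.
# Assumptions: MODEL_FORMAT (placeholder "pytorch") matches the runtime's
# supportedModelFormats; STORAGE_KEY (placeholder) names the S3 data connection;
# the checkpoints sit under models/ in that bucket.
render_isvc() {
  local name="${1:-sdxl}" namespace="${2:-sdxl-studio}"
  cat <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  predictor:
    model:
      modelFormat:
        name: ${MODEL_FORMAT:-pytorch}
      runtime: kserve-sdxl
      storage:
        key: ${STORAGE_KEY:-my-data-connection}
        path: models/
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
}
```

Usage: 'render_isvc | oc apply -f -', or 'render_isvc my-sdxl my-namespace' to override the name and namespace.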
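The client UI step can also be done from the command line with 'oc set env'. This is a sketch: it assumes the client already runs as a Deployment (its name, e.g. the hypothetical 'sdxl-client', is yours to supply) in the namespace 'sdxl-studio' taken from the test URL. The guide leaves the SDXL_ENDPOINT_URL value open, so it is passed in as an argument.

```shell
#!/usr/bin/env bash
# configure_client DEPLOYMENT ENDPOINT_URL: set the three client UI variables.
# Assumptions: "oc" is logged in; namespace defaults to sdxl-studio (override via NAMESPACE).
# Set DRY_RUN=1 to print the command instead of running it.
configure_client() {
  local deployment="$1" endpoint="$2"
  local -a cmd
  cmd=(oc set env "deployment/${deployment}" -n "${NAMESPACE:-sdxl-studio}"
       "SDXL_ENDPOINT_URL=${endpoint}" GUARD_ENABLED=false SAFETY_CHECK_ENABLED=false)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Changing the environment of a Deployment triggers a new rollout, so the client pods restart with the new values.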
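The test requests above all send the same body. A small helper like the following keeps the payload in one place and posts it to BASE_URL/v1/models/model:predict, as in the first two requests, writing the raw JSON response to a file. It is a sketch: the function names and the default output file 'response.json' are hypothetical, the prompt is inserted into the JSON verbatim (so it must not contain double quotes or backslashes), and the runtime's response format is not described in this guide, so the response is saved without decoding.

```shell
#!/usr/bin/env bash
# sdxl_payload PROMPT: print the request body used in the tests above
sdxl_payload() {
  local prompt="$1"
  cat <<EOF
{
  "instances": [
    {
      "prompt": "${prompt}",
      "guidance_scale": 8.0,
      "num_inference_steps": 50,
      "crops_coords_top_left": [256, 0],
      "width": 1024,
      "height": 1024,
      "denoising_limit": 0.8
    }
  ]
}
EOF
}

# sdxl_predict BASE_URL PROMPT [OUTFILE]: POST the payload (read by curl from stdin)
# and save the raw response to OUTFILE
sdxl_predict() {
  local base_url="$1" prompt="$2" out="${3:-response.json}"
  sdxl_payload "$prompt" | curl -sS -X POST \
    -H "Content-Type: application/json" \
    --data-binary @- \
    -o "$out" \
    "${base_url%/}/v1/models/model:predict"
}
```

Usage, from the client pod: 'sdxl_predict http://sdxl-predictor.sdxl-studio.svc.cluster.local:8080 "grey tabby cat with green eyes hiding inside a cardboard box" cat.json'.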

Ideal for MLOps practitioners building scalable, GPU-accelerated image generation services on Kubernetes.
