feat: Replaced NIM architecture diagram with self-made NIM on EKS arch diagram #612

Merged (1 commit, Aug 18, 2024)
6 changes: 2 additions & 4 deletions website/docs/gen-ai/inference/nvidia-nim-llama3.md
@@ -28,10 +28,6 @@ NIM abstracts away model inference internals such as execution engine and runtim

NIMs are packaged as container images on a per-model/model-family basis. Each NIM container is bundled with a model, such as `meta/llama3-8b-instruct`. These containers include a runtime that runs on any NVIDIA GPU with sufficient GPU memory, and some model/GPU combinations are further optimized. NIM automatically downloads the model from the NVIDIA NGC Catalog, leveraging a local filesystem cache if available.
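As an illustrative sketch (not part of this PR's diff), pulling and running such a container locally follows the pattern NVIDIA documents for NIM; the exact image tag and cache path below are assumptions, so consult the NGC Catalog for the image you intend to use:

```shell
# Sketch: run the Llama 3 8B Instruct NIM locally.
# NGC_API_KEY authenticates the model download from the NGC Catalog;
# LOCAL_NIM_CACHE is the local filesystem cache the text above refers to,
# so the model is not re-downloaded on subsequent starts.
export NGC_API_KEY=<your NGC API key>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# --gpus all exposes the host GPUs; port 8000 is NIM's default HTTP port.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest
```

On EKS, the same container image is instead scheduled onto GPU nodes by Kubernetes, as the deployment pattern below describes.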

![NIM Architecture](img/nim-architecture.png)

Source: https://docs.nvidia.com/nim/large-language-models/latest/introduction.html#architecture

## Overview of this deployment pattern on Amazon EKS

This pattern combines the capabilities of NVIDIA NIM, Amazon Elastic Kubernetes Service (EKS), and various AWS services to deliver a high-performance and cost-optimized model serving infrastructure.
@@ -48,6 +44,8 @@ This pattern combines the capabilities of NVIDIA NIM, Amazon Elastic Kubernetes

By combining these components, our proposed solution delivers a powerful and cost-effective model serving infrastructure tailored for large language models. With NVIDIA NIM's seamless integration and Amazon EKS's scalability with Karpenter, customers can achieve high performance while minimizing infrastructure costs.
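NIM serves requests over an OpenAI-compatible HTTP API. As a hedged illustration of what querying the deployed model might look like from inside the cluster, the service hostname `nim-llama3-service` below is a hypothetical Kubernetes Service name, and port 8000 is NIM's default HTTP port:

```shell
# Sketch: query a deployed NIM via its OpenAI-compatible chat endpoint.
# "nim-llama3-service" is a hypothetical in-cluster Service name; replace
# it with the Service created by your deployment.
curl -s http://nim-llama3-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "What is Amazon EKS?"}],
        "max_tokens": 128
      }'
```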

![NIM on EKS Architecture](img/nim-on-eks-arch.png)

## Deploying the Solution

### Prerequisites