From ea2079d99deb42309e60935d565e71d5fcaa2e92 Mon Sep 17 00:00:00 2001
From: rohith
Date: Tue, 7 Jan 2025 18:28:19 -0800
Subject: [PATCH] fix issues in Llama-3.2 MM tutorial

---
 .../tutorials/llama3.2-multimodal-tutorial.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst b/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
index aee11469..523ffffc 100644
--- a/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
+++ b/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
@@ -28,7 +28,7 @@ Step 1: Set up Development Environment
 
 ::
 
-   source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate
+   source /opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate
 
 3. Install the fork of vLLM (v0.6.x-neuron) that supports NxD Inference following :ref:`nxdi-vllm-user-guide`.
 
@@ -338,6 +338,10 @@ You should receive outputs shown in the client terminal shortly:
    "usage":{"prompt_tokens":42,"total_tokens":50,"completion_tokens":8},"prompt_logprobs":null}
 
+
+If the request fails, try setting the ``VLLM_RPC_TIMEOUT`` environment variable, for example ``export VLLM_RPC_TIMEOUT=180000``.
+The appropriate timeout value depends on the model and deployment configuration used.
+
 To send a request with both text and image prompts:
 
 ::
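
Note on the timeout addition: ``VLLM_RPC_TIMEOUT`` is read by the vLLM server process, so it needs to be exported in the shell that launches the server, before the serving command is run. A minimal sketch of that flow is shown below; the launch command, model name, and port are placeholders for illustration and are not part of this patch -- the tutorial's actual serving command should be used instead.

::

   # Illustrative only: raise the RPC timeout (in milliseconds) before starting vLLM.
   # The appropriate value depends on the model and deployment configuration.
   export VLLM_RPC_TIMEOUT=180000

   # Placeholder launch command; substitute the serving command from the tutorial.
   python -m vllm.entrypoints.openai.api_server \
       --model meta-llama/Llama-3.2-11B-Vision-Instruct \
       --port 8000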