diff --git a/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst b/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
index aee11469..523ffffc 100644
--- a/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
+++ b/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.rst
@@ -28,7 +28,7 @@ Step 1: Set up Development Environment
 
 ::
 
-    source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate
+    source /opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate
 
 3. Install the fork of vLLM (v0.6.x-neuron) that supports NxD Inference following :ref:`nxdi-vllm-user-guide`.
 
@@ -338,6 +338,10 @@ You should receive outputs shown in the client terminal shortly:
     "usage":{"prompt_tokens":42,"total_tokens":50,"completion_tokens":8},"prompt_logprobs":null}
 
 
+
+If the request fails, try setting ``export VLLM_RPC_TIMEOUT=180000`` environment variable. The timeout value depends on the
+model and deployment configuration used.
+
 To send a request with both text and image prompts:
 
 ::