Add musa_simple Dockerfile for supporting Moore Threads GPU #1842

yeahdongcn · 2024-11-25T11:17:11Z

Testing Done

Build the musa_simple Docker image locally -> pass

Run the musa_simple container to serve llama3.2_1b_q8_0.gguf -> pass

❯ docker run --net=host --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/models/llama3.2_1b_q8_0.gguf -e N_GPU_LAYERS=999 -v $HOME/models:/models -it musa_simple
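
For scripted test runs, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of this PR: `docker_run_argv` is a hypothetical helper, the flags and environment variables are copied from the command above, and `/home/user/models` is a placeholder for `$HOME/models`.

```python
import shlex


def docker_run_argv(image, env, models_dir):
    """Build the argv for the `docker run` call shown above (hypothetical helper)."""
    argv = ["docker", "run", "--net=host", "--cap-add", "SYS_RESOURCE"]
    # Each setting is passed to the container as an environment variable.
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    # Mount the host model directory at /models inside the container.
    argv += ["-v", f"{models_dir}:/models", "-it", image]
    return argv


argv = docker_run_argv(
    "musa_simple",
    {"USE_MLOCK": 0, "MODEL": "/models/llama3.2_1b_q8_0.gguf", "N_GPU_LAYERS": 999},
    "/home/user/models",  # placeholder for $HOME/models
)
command_line = shlex.join(argv)
```

`command_line` can be printed for inspection, or `argv` can be handed to `subprocess.run` on a host where the image has been built.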

Access the API server with curl -> pass

❯ curl -X POST 'http://localhost:8000/v1/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
  "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ]
}'
{"id":"cmpl-cf14c7fb-17d6-494a-97ef-8353492625ed","object":"text_completion","created":1732532988,"model":"/models/llama3.2_1b_q8_0.gguf","choices":[{"text":"Paris","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":2,"total_tokens":17}}
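
The same check can be scripted with the Python standard library. This is a rough sketch rather than anything shipped in the PR: the URL, headers, and payload fields mirror the curl call above, the response fields mirror the JSON it returned, and the function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the curl call above; assumes the container is running locally.
API_URL = "http://localhost:8000/v1/completions"


def build_completion_request(prompt, stop, url=API_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"prompt": prompt, "stop": stop}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def extract_completion(response_body):
    """Return the first choice's text and the total token count."""
    payload = json.loads(response_body)
    return payload["choices"][0]["text"], payload["usage"]["total_tokens"]


def send(request):
    """Send the request and parse the reply; needs the server to be reachable."""
    with urllib.request.urlopen(request) as response:
        return extract_completion(response.read())
```

With the container up, `send(build_completion_request(prompt, ["\n", "###"]))` should return the completion text and token total; for the response captured above, that pair is `"Paris"` and `17`.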

Please refer to the attached screenshot for additional details.

Signed-off-by: Xiaodong Ye <[email protected]>

yeahdongcn added 4 commits November 25, 2024 18:47

Fix typo

22d51d4

Signed-off-by: Xiaodong Ye <[email protected]>

Add musa_simple Dockerfile for supporting Moore Threads GPU

d455817

Signed-off-by: Xiaodong Ye <[email protected]>

README.md: Add MUSA as supported backend

0a8b764

Signed-off-by: Xiaodong Ye <[email protected]>

Set MUSA_DOCKER_ARCH=default

b69902d

Signed-off-by: Xiaodong Ye <[email protected]>
