Qwen3 model #14

jlonge4 · 2025-04-30T18:17:06Z

Issue #, if available:
N/A
Description of changes:
Add Qwen3 model file and inference notebook. Tested with Qwen/Qwen3-8B

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jlonge4 · 2025-05-14T14:26:04Z

Logit Validation Benchmark Code:

!inference_demo \
    --model-type qwen3 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen/ \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --enable-bucketing \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark
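
For readers unfamiliar with logit matching, here is a conceptual sketch of the kind of check that `--check-accuracy-mode logit-matching` suggests: compare the greedy tokens and the per-position logits of a reference run against the compiled model. This is not the `inference_demo` implementation; the helper names and the `atol` tolerance are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

Logits = List[List[float]]  # [generation step][vocabulary entry]


def greedy_tokens(logits: Logits) -> List[int]:
    """Return the argmax token id for every generation step."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def max_abs_diff(expected: Logits, actual: Logits) -> float:
    """Largest element-wise absolute difference between two logit sets."""
    return max(
        abs(e - a)
        for e_row, a_row in zip(expected, actual)
        for e, a in zip(e_row, a_row)
    )


def logits_match(expected: Logits, actual: Logits, atol: float = 1e-2) -> bool:
    """Pass only if shapes agree, greedy tokens agree, and logits are close.

    `atol` is an illustrative tolerance, not a value taken from the tool.
    """
    if len(expected) != len(actual):
        return False
    if any(len(e) != len(a) for e, a in zip(expected, actual)):
        return False
    if greedy_tokens(expected) != greedy_tokens(actual):
        return False
    return max_abs_diff(expected, actual) <= atol


# Toy data: two generation steps over a three-token vocabulary.
reference = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]
close = [[0.101, 1.998, -1.0], [1.502, 0.2, 0.299]]
far = [[2.5, 2.0, -1.0], [1.5, 0.2, 0.3]]

if __name__ == "__main__":
    print("close run passes:", logits_match(reference, close))
    print("far run passes:", logits_match(reference, far))
```

Checking both the token ids and the raw logits catches cases where the generated text happens to match while the underlying distribution has drifted.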

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 151936])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 151936])
Passed logits validation!

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune

Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 156.56781196594238,
        "latency_ms_p90": 158.08086395263672,
        "latency_ms_p95": 158.1140637397766,
        "latency_ms_p99": 158.28602075576782,
        "latency_ms_p100": 158.32901000976562,
        "latency_ms_avg": 156.99772834777832,
        "throughput": 203.82460521412273
    },
    "context_encoding_model": {
        "latency_ms_p50": 10.202646255493164,
        "latency_ms_p90": 10.224390029907227,
        "latency_ms_p95": 10.22493839263916,
        "latency_ms_p99": 10.226750373840332,
        "latency_ms_p100": 10.227203369140625,
        "latency_ms_avg": 10.201811790466309,
        "throughput": 1568.348870634151
    },
    "token_generation_model": {
        "latency_ms_p50": 8.858323097229004,
        "latency_ms_p90": 8.903312683105469,
        "latency_ms_p95": 9.238588809967041,
        "latency_ms_p99": 9.264287948608398,
        "latency_ms_p100": 9.28950309753418,
        "latency_ms_avg": 8.88296922047933,
        "throughput": 120.07996877975322
    }
}
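
As a sanity check on the numbers above, the reported `e2e_model` and `context_encoding_model` throughputs are consistent with dividing a token count by the average latency, using `--seq-len 32` and `--max-context-length 16` from the command. That relationship is inferred from the posted figures, not taken from the tool's documentation, so treat this as a sketch. The `token_generation_model` figure is left out because its token count cannot be read off the command in the same way.

```python
# Average latencies (ms) and reported throughputs copied from the benchmark output.
reported = {
    "e2e_model": {
        "latency_ms_avg": 156.99772834777832,
        "throughput": 203.82460521412273,
    },
    "context_encoding_model": {
        "latency_ms_avg": 10.201811790466309,
        "throughput": 1568.348870634151,
    },
}

# Token counts assumed from the CLI flags: --seq-len 32, --max-context-length 16,
# with --batch-size 1.
assumed_tokens = {"e2e_model": 32, "context_encoding_model": 16}


def tokens_per_second(tokens: int, latency_ms: float) -> float:
    """Convert a token count and a latency in milliseconds to tokens/second."""
    return tokens / (latency_ms / 1000.0)


derived = {
    name: tokens_per_second(assumed_tokens[name], stats["latency_ms_avg"])
    for name, stats in reported.items()
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}: derived {value:.3f} vs reported {reported[name]['throughput']:.3f}")
```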

jlonge4 · 2025-05-19T19:57:19Z

contributed/models/qwen3/qwen-3-test.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Thinking example"


@EmilyWebber that should do it : )

ValkyriaLenneth · 2025-05-27T00:53:54Z

@jlonge4
Thanks for your great work.
But I'm confused about the transformers version.
neuronx-distributed needs transformers==4.48, but Qwen3 needs transformers>=4.51.
How did you resolve this conflict?
Thanks
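
To make the version question concrete, here is a small sketch showing that no single `transformers` release can satisfy both constraints as stated in the comment (`==4.48` and `>=4.51`). The constraint values come from the comment, not from checking either package's metadata, and the helper names are hypothetical. The parser only handles plain numeric versions such as `4.51.3`.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '4.51' into (4, 51, 0). Only plain numeric releases are supported."""
    parts = [int(p) for p in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


PINNED = parse("4.48")   # transformers==4.48, as stated for neuronx-distributed
MINIMUM = parse("4.51")  # transformers>=4.51, as stated for Qwen3


def satisfies_both(version: str) -> bool:
    """True only if the version equals the pin and meets the minimum."""
    v = parse(version)
    return v == PINNED and v >= MINIMUM


def installed_transformers() -> Optional[str]:
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for candidate in ("4.48.0", "4.51.0", "4.51.3"):
        print(candidate, "satisfies both:", satisfies_both(candidate))
    print("installed transformers:", installed_transformers())
```

Because the pin is lower than the minimum, the two requirements cannot hold at once in one environment, which is why the question needs an answer from the author about how the environment was set up.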

jlonge4 added 7 commits April 9, 2025 16:38

add qwen2 support

7b3ae19

update qwen file and add test nb

c6b43cf

add qwen3

0eaad5c

lint

e1611a1

add inference nb

176ced2

Remove .DS_Store files and add to gitignore

4d683ea

logit val / cleanup

beabb8c

update with thinking example

4e16bca

jlonge4 mentioned this pull request May 20, 2025

Support Qwen3 huggingface/optimum-neuron#847

Open

3 tasks
