Your current environment
Is there a way to control the default thinking behaviour for models deployed through vllm?
As per https://docs.vllm.ai/en/stable/features/reasoning_outputs.html, IBM Granite 3.2 has reasoning disabled by default, while Qwen3, GLM 4.6, and DeepSeek V3.1 all have reasoning enabled by default.
It would be great if there were a way to control this from vllm.
--override-generation-config lets the user override temperature and other sampling parameters at deployment, but it does not work for reasoning.
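For comparison, a sampling-parameter override does take effect at deployment time. A minimal sketch using the same image, flags, and Qwen3 model as my command below:

```bash
# Overriding a sampling parameter (temperature) at deployment works as documented.
docker run -d --runtime nvidia -p 8000:8000 --ipc=host \
  vllm/vllm-openai:v0.11.0 \
  --reasoning-parser qwen3 \
  --model Qwen/Qwen3-4B \
  --override-generation-config '{"temperature": 0.0}'
```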
I have tried:

```bash
docker run -d --runtime nvidia \
  -e TRANSFORMERS_OFFLINE=1 -e DEBUG="true" \
  -p 8000:8000 --ipc=host \
  vllm/vllm-openai:v0.11.0 \
  --reasoning-parser qwen3 \
  --model Qwen/Qwen3-4B \
  --override-generation-config '{"chat_template_kwargs": {"enable_thinking": false}}'
```
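The only approach that seems to work so far is per request rather than server-wide: the OpenAI-compatible /v1/chat/completions endpoint accepts a chat_template_kwargs field in the request body, which is the mechanism the reasoning docs show for Qwen3. A sketch, assuming the server above is running on localhost:8000:

```bash
# Per-request workaround: disable thinking via chat_template_kwargs in the request body.
# This only helps if every client remembers to send it; it is not a server-side default.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-4B",
    "messages": [{"role": "user", "content": "Give me a short introduction to large language models."}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```

What I am asking for is a way to set this (or an equivalent default) once at server start.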
How would you like to use vllm
I want to deploy reasoning models (e.g. Qwen3) through vllm with thinking disabled by default, without requiring every client to pass a per-request flag.