
Extend endpointOai.ts to allow usage of extra sampling parameters #1032

Merged
merged 8 commits into main on May 8, 2024

Conversation

taeminlee
Copy link
Contributor

Description

This pull request introduces the ability to use the full range of sampling parameters when working with an OpenAI-compatible API (e.g. vLLM). OpenAI's API supports only a limited set of sampling parameters, which restricts what open LLMs can do. By adding extra parameters such as best_of and top_k to the request body, we can take advantage of the richer sampling options that open LLM servers expose.

For instance, vLLM supports several advanced sampling parameters that can be used to improve model output. Detailed descriptions and usage instructions for these parameters can be found in the official documentation: vLLM Extra Parameters.

Code Changes

The modifications update how chat-ui calls the OpenAI JavaScript library: the model configuration now accepts an extra_body object, which is merged with the standard body parameters during the API call. It is named extra_body because that is the variable name used for the same purpose in Python's OpenAI library.
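
As a rough sketch of the mechanism, the merged payload is passed through the client's per-request options (the model name, base URL, and API key below are placeholders, not values from chat-ui):

import OpenAI from "openai";

// Placeholder client; baseURL and apiKey are illustrative only.
const openai = new OpenAI({ baseURL: "http://localhost:51086/v1", apiKey: "none" });

const body: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming = {
  model: "my-model", // placeholder model id
  messages: [{ role: "user", content: "Hello" }],
  temperature: 0.4,
};

// extra_body as taken from the model configuration, e.g. { "top_k": 50 }
const extraBody = { top_k: 50 };

// Merging into options.body lets top_k reach the server even though the
// typed create() signature does not declare it.
const completion = await openai.chat.completions.create(body, {
  body: { ...body, ...extraBody },
});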

Below is an example of how to configure these parameters in the model:

"parameters": {
  "temperature": 0.4,
  "max_new_tokens": 1024,
  "stop": [],
  "extra_body": {
    "top_k": 50
  }
},
"endpoints": [{
  "type": "openai",
  "completion": "completions",
  "baseURL": "http://localhost:51086/v1"
}]

Impact

This enhancement allows users of the OpenAI-compatible server to fully exploit the model's capabilities, improving the quality and flexibility of generated responses.

@nsarrazin nsarrazin self-requested a review April 22, 2024 07:21
@taeminlee
Copy link
Contributor Author

@nsarrazin I apologize for forgetting the lint check. I've run Prettier and confirmed that npm run lint passes locally. Please review. :)

@taeminlee
Copy link
Contributor Author

taeminlee commented Apr 25, 2024

@nsarrazin I've run npm run check and fixed the type warnings. Apologies for not catching all the issues in one go. Please let me know if there's anything else that needs attention. Thank you.

@nsarrazin
Copy link
Collaborator

Hey! Thanks for the contrib, sorry for the review delay, finally took a look at it. 😅

I looked at the Python client you linked to; it looks like extra_body sits next to extra_headers and extra_query in there.

So I thought we could move it in chat-ui out of parameters.extra_body so that it sits next to defaultHeaders and defaultQuery? (We should probably rename these...)

I pushed a commit that does this and simplifies the code a bit too. I'm not super familiar with the OpenAI client; is it fine to pass the extended body under the options like this?

Let me know if this commit works for you, otherwise we can change things 😄

@taeminlee
Copy link
Contributor Author

taeminlee commented May 7, 2024

Thank you for your response.

First, regarding the OpenAI Node library: the data type of the body is defined through interfaces such as ChatCompletionCreateParamsBase.

Reference: GitHub - OpenAI Node Library

create(
  body: ChatCompletionCreateParamsStreaming,
  options?: Core.RequestOptions,
): APIPromise<Stream<ChatCompletionChunk>>;

These interfaces define the permissible hyperparameters, so parameters not declared in the interface, such as top_k, are omitted when added to the body.

Reference: GitHub - OpenAI Node Library

In conclusion, the current simplified code will cause additional parameters (e.g. top_k) to be omitted. To prevent that, I used the options: since the body specified in the options is not constrained by the interface, additional parameters can be included.

This approach resulted in somewhat complex code. I wanted to simplify it further, but for the reasons above it was hard to make it simpler.

Would it be possible to merge the code as it was before the simplification?

@nsarrazin
Copy link
Collaborator

Hi, trying to understand the difference between the two versions here

You used to do

const combinedBody = { ...body, ...parameters.extra_body };
const openChatAICompletion = await openai.chat.completions.create(body, { body: combinedBody });

In my commit I switched to

const openChatAICompletion = await openai.chat.completions.create(body, {
  body: { ...body, ...extraBody },
});

which should be equivalent? I am passing the full body + extraBody inside of options. 👀

Maybe I missed something? Let me know if that's the case and I can fix/revert.

@taeminlee
Copy link
Contributor Author

taeminlee commented May 8, 2024

Ah, you are indeed correct. I misunderstood your code initially. It's exactly the simplification I was looking for. You are a genius. I apologize for the confusion I caused.

I have tested the code you committed, and made one additional amendment to it.

For example, in vLLM the extraBody can contain various data types (reference: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-completions-api). The previous extraBody type only accepted strings, which I found somewhat restrictive. I think it is fine to relax it to any type.
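
For instance, a single extraBody could mix numbers, booleans, and arrays (a hypothetical combination based on vLLM's documented extra parameters):

"extraBody": {
  "top_k": 50,
  "use_beam_search": true,
  "stop_token_ids": [128009]
}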

Looking forward to your response. Thank you.

@nsarrazin
Copy link
Collaborator

No worries!😊 Looks good to me, thanks for contributing!

@nsarrazin nsarrazin merged commit 25d6df8 into huggingface:main May 8, 2024
3 checks passed
@jackzhouusa
Copy link

jackzhouusa commented May 9, 2024

How do I send extra parameters using extraBody?

For example:

const endpoint = await model.getEndpoint();

for await (const output of await endpoint({
  messages: processedMessages,
  preprompt,
  continueMessage: isContinue,
  generateSettings: assistant?.generateSettings,
  extraBody: { convId: convId.toString() },
})) { ... }

Any other ways?

@taeminlee
Copy link
Contributor Author

taeminlee commented May 10, 2024

> How do I send extra parameters using extraBody?
>
> For example: const endpoint = await model.getEndpoint(); for await (const output of await endpoint({ messages: processedMessages, preprompt, continueMessage: isContinue, generateSettings: assistant?.generateSettings, extraBody: { convId: convId.toString() } })) { ... }
>
> Any other ways?

I configured the additional parameters for API endpoints similarly to the setup outlined in the README.md, using the .env.local file. This approach lets us manage endpoint configurations centrally and include any necessary additional parameters directly within extraBody.

For instance, if we need to add top_k as an additional parameter, we can easily set it in the extraBody of the endpoint configuration as follows:

MODELS=[
  {
    "name": "vllm",
    "id": "vllm",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "max_new_tokens": 1024,
    },
    "endpoints": [{
      "type": "openai",
      "baseURL": "http://localhost:8000/v1",
      "extraBody": {"top_k": 50}
    }]
  }
]

This setup ensures that the server receives and handles these parameters, making the model's full sampling capabilities available.

@jackzhouusa
Copy link

I see. Thank you for your explanation.
For my use case, I need to send the convId to the endpoint, so I used extraBody as a workaround. It works fine. Thanks for your contribution.

taeminlee added a commit to taeminlee/chat-ui that referenced this pull request May 15, 2024
Fixed incorrect setup for extra parameters in OpenAI compatible server configuration (see PR huggingface#1032)