[Feature]: Support the OpenAI Batch Chat Completions file format #4777

Closed
wuisawesome opened this issue May 13, 2024 · 0 comments · Fixed by #4794

wuisawesome commented May 13, 2024

🚀 The feature, motivation and pitch

I'm working on a use case that involves running the same dataset/prompt across multiple models (including some OpenAI models and some open-source models). I would like to be able to do batch inference on a file of many requests that follows the OpenAI Batch file format.

  1. This follows the spirit/pattern of the popular OpenAI API server interface for using vLLM.
  2. It is easy to adapt existing code that calls web endpoints to generate these files, since the body field is essentially what you would pass to the web endpoint (an example line is sketched after this list).
  3. This format doesn't require the user to think about rate limits, parallelism, etc.
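For reference, each line of an OpenAI Batch input file is a JSON object with custom_id, method, url, and body fields, where body is exactly the payload you would send to the chat completions endpoint. Below is a minimal illustration of writing one such line; the model name and messages are placeholders, not part of the format.

    import json

    # One request, following the fields documented at
    # https://platform.openai.com/docs/guides/batch/getting-started
    line = {
        "custom_id": "request-1",   # caller-chosen id, echoed back in the output file
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {                   # same payload you would POST to the endpoint
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
    }

    with open("batch_input.jsonl", "a") as f:
        f.write(json.dumps(line) + "\n")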

I'll lay out an implementation plan here, which I'm willing to contribute.

Interface

The primary interface would be a CLI command.

$ python -m vllm.entrypoints.openai_batch --help
Usage: openai_batch [OPTIONS]

  Run offline inference on a file which conforms to the OpenAI Batch file format. https://platform.openai.com/docs/guides/batch/getting-started

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
  -i, --input-file   The path or URL of a single input file. Currently supports local file paths or web (http or https) URLs. If a URL is specified, the file should be available via HTTP GET.
  -o, --output-file  The path or URL of a single output file. Currently supports local file paths or web (http or https) URLs. If a URL is specified, the file should be available via HTTP PUT.

Exit status:
  0 No problems occurred.
  1 Generic error code.
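For example, an invocation like python -m vllm.entrypoints.openai_batch -i batch_input.jsonl -o batch_output.jsonl (file names here are only illustrative) would read requests from the local input file and write one result line per request to the output file.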

Implementation

This feature should be a fairly independent wrapper around the vLLM core, and the implementation shouldn't be very involved. I propose only a very minor cleanup to OpenAIServingChat, in which the OpenAIServingChat::create_chat_completion function signature changes as follows (the only change is that the raw_request parameter is replaced by an optional is_aborted callback):

    async def create_chat_completion(
        self, request: ChatCompletionRequest,
        is_aborted: Optional[Callable[[], Awaitable[bool]]] = None,
    ) -> Union[ErrorResponse, AsyncGenerator[str, None], ChatCompletionResponse]:

The only caller of the function (api_server.py) will simply pass is_aborted = raw_request.is_disconnected.
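To illustrate the intended contract, here is a minimal, self-contained sketch with a placeholder function (not vLLM code): the server path passes the request's disconnect check, while the batch path simply omits the argument.

    import asyncio
    from typing import Awaitable, Callable, Optional


    async def create_chat_completion(
            prompt: str,
            is_aborted: Optional[Callable[[], Awaitable[bool]]] = None) -> Optional[str]:
        # Stand-in for OpenAIServingChat.create_chat_completion: poll is_aborted
        # between generation steps and stop early if the caller has gone away.
        for _ in range(3):
            if is_aborted is not None and await is_aborted():
                return None
            await asyncio.sleep(0)  # pretend to generate one step
        return "completion for: " + prompt


    async def main() -> None:
        # HTTP server case: api_server.py would pass
        # is_aborted=raw_request.is_disconnected (an async Starlette Request method).
        # Batch case: there is no client that can disconnect, so omit the argument.
        print(await create_chat_completion("Hello!"))


    asyncio.run(main())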

The rest of the implementation will be (roughly sketched after this list):

  1. Create a new pydantic model for the request.
  2. Load the local file or url into a list of request objects.
  3. Submit the requests to the openai_serving_chat.
  4. Write the outputs.
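A rough sketch of steps 1 through 4, assuming placeholder names (BatchRequestInput, load_requests, run_batch), pydantic v2 methods, and vLLM's existing ChatCompletionRequest model; the actual implementation may be structured differently, and this version handles only local files:

    import asyncio
    import json
    from typing import List

    from pydantic import BaseModel

    from vllm.entrypoints.openai.protocol import ChatCompletionRequest


    class BatchRequestInput(BaseModel):
        # Step 1: one line of the input .jsonl file.
        custom_id: str
        method: str  # expected to be "POST"
        url: str     # expected to be "/v1/chat/completions"
        body: ChatCompletionRequest


    def load_requests(path: str) -> List[BatchRequestInput]:
        # Step 2: parse the (local, in this sketch) .jsonl file into request objects.
        with open(path) as f:
            return [BatchRequestInput.model_validate_json(line)
                    for line in f if line.strip()]


    async def run_batch(input_path: str, output_path: str, openai_serving_chat) -> None:
        requests = load_requests(input_path)
        # Step 3: submit everything at once; vLLM's scheduler handles batching.
        responses = await asyncio.gather(
            *(openai_serving_chat.create_chat_completion(r.body) for r in requests))
        # Step 4: write one output line per request, keyed by custom_id.
        with open(output_path, "w") as f:
            for req, resp in zip(requests, responses):
                f.write(json.dumps({"custom_id": req.custom_id,
                                    "response": resp.model_dump()}) + "\n")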

Alternatives

It's possible to simply instantiate the API server and manage parallelizing requests today, but that adds complication to user code: request parallelization is important and difficult to get right, and ensuring the HTTP server's port is available isn't always trivial under failure conditions.

Alternative APIs:

  • Python API: Nothing in this proposal prevents or makes it more difficult to introduce a Python API later, though there are additional interface decisions to make (async vs. synchronous, exclusive vs. shared engine, etc.).
  • REST API: The OpenAI REST API uses an input_file_id field. It's not obvious to me how implementing this field the same way OpenAI does (with validation, etc.) should interact with the rest of the vLLM project, so I will leave it out of scope for now (perhaps others can chip in if this is a desired feature).

Additional context

No response
