Python: Nvidia Embedding Connector #10410
base: main
Conversation
```python
"""Specific settings for the text embedding endpoint."""

input: str | list[str] | list[int] | list[list[int]] | None = None
```
the OpenAI Embedding API supports list[int] and list[list[int]] to accept pre-tokenized input. the NeMo Retriever Embedding API does not support this.
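A minimal sketch of how the annotation could be narrowed to match what NeMo Retriever actually accepts (the class and field names come from this PR; the surrounding model is simplified):

```python
from pydantic import BaseModel


class NvidiaEmbeddingPromptExecutionSettings(BaseModel):
    # Only raw text: the list[int] / list[list[int]] pre-tokenized variants
    # from the OpenAI API are dropped, since NeMo Retriever rejects them.
    input: str | list[str] | None = None
```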
this is looking good. i've added some comments and questions.
"""Send a request to the OpenAI embeddings endpoint.""" | ||
try: | ||
# exclude input-type from main body | ||
response = await self.client.embeddings.create(**settings.prepare_settings_dict(exclude="input_type")) |
why exclude input_type?
because the openai client does not allow an input_type parameter, so I excluded it from the main dict and added it in extra_body
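A sketch of that call, assuming the openai Python client: its create methods accept an extra_body dict that is merged into the request JSON, which is how a non-OpenAI parameter like input_type can be forwarded.

```python
# Hypothetical excerpt: drop input_type from the validated settings dict,
# then pass it back in through extra_body so the client will still send it.
response = await self.client.embeddings.create(
    **settings.prepare_settings_dict(exclude="input_type"),
    extra_body={"input_type": settings.input_type},
)
```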
```python
As well as any fields that are None.
"""
return self.model_dump(
    exclude={"service_id", "extension_data", "structured_json_response", "input_type"},
```
does the input_type need to be passed as extra_body?
```python
for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    settings.input = batch
    raw_embedding = await self._send_request(settings=settings)
```
it looks like batch_size is controlling the number of texts that are sent for embedding in a single call to the service.
each of those batches can also be sent in parallel.
indeed, batch_size refers to how many texts are included in each call; if this can be parallelized, please do. Also, moving the actual embedding calls here makes this simpler (vs the handler, see earlier comment).
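A sketch of the parallelized version the reviewers are asking for, using asyncio.gather; the helper name and the model_copy call are assumptions layered on the PR's code:

```python
import asyncio


async def _embed_batches(self, texts: list[str], settings, batch_size: int) -> list:
    """Send all batches concurrently instead of awaiting them one by one."""

    async def embed_one(batch: list[str]):
        # Copy the settings so concurrent requests don't share mutable state.
        batch_settings = settings.model_copy()
        batch_settings.input = batch
        return await self._send_request(settings=batch_settings)

    batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
    results = await asyncio.gather(*(embed_one(b) for b in batches))
    # Flatten back into one list, preserving the original text order.
    return [embedding for result in results for embedding in result]
```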
```python
def to_dict(self) -> dict[str, str]:
    """Create a dict of the service settings."""
    client_settings = {
        "api_key": self.client.api_key,
```
what are the security implications of including the api_key here?
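One possible mitigation, sketched: serialize everything except the secret, so the dict can be logged or persisted without leaking credentials (attribute names are assumptions based on the openai client and this PR):

```python
def to_dict(self) -> dict[str, str]:
    """Create a dict of the service settings, deliberately omitting api_key."""
    return {
        # The key itself is excluded; consumers that need it should read it
        # from the environment (e.g. NVIDIA_API_KEY) rather than from here.
        "base_url": str(self.client.base_url),
        "ai_model_id": self.ai_model_id,
    }
```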
```python
    (Env var NVIDIA_BASE_URL)
- chat_model_id: The NVIDIA chat model ID to use, see https://docs.api.nvidia.com/nim/reference/llm-apis.
    (Env var NVIDIA_CHAT_MODEL_ID)
- text_model_id: str | None - The NVIDIA text model ID to use, for example, nvidia/nemotron-4-340b-reward.
```
there are also vision language models (VLMs), which take text or images as input and produce text
Yes, will add when working on VLM support
those models are supported with the ChatCompletionClientBase as well
```python
model=ai_model_id,
dimensions=embedding_dimensions,
encoding_format="float",
extra_body={"input_type": "passage"},
```
this demonstrates there are two ways to pass input_type. i suggest only providing one.
I think this is right, as I am validating the request body, and in that input_type goes in extra_body
Left a bunch of comments, most important is to simplify this by a lot!
```python
class NvidiaModelTypes(Enum):
    """Nvidia model types, can be text, chat or embedding."""

    TEXT = "text"
```
if we don't support these modalities, they shouldn't be in here (yet)
```python
model: str = None
encoding_format: Literal["float", "base64"] | None = "float"  # default to float
truncate: Literal[None, "NONE", "START", "END"] | None = None
input_type: Literal["passage", "query"] | None = "passage"  # default to passage
```
it is preferred to set defaults to None if the service itself has default values; that way we don't get out of sync with the defaults documented by the services
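A sketch of the fields with the suggested None defaults, so an omitted field simply drops out of the request body (via exclude_none) and the service applies its own documented default:

```python
encoding_format: Literal["float", "base64"] | None = None
truncate: Literal["NONE", "START", "END"] | None = None
input_type: Literal["passage", "query"] | None = None
```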
"""Specific settings for the text embedding endpoint.""" | ||
|
||
input: str | list[str] | list[int] | list[list[int]] | None = None | ||
model: str = None |
`model` is a reserved term for pydantic, please use an alternative with an alias; for instance, for openai we use `ai_model_id: Annotated[str | None, Field(serialization_alias="model")] = None`
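A self-contained illustration of that alias pattern (the class name and model ID are just example values):

```python
from typing import Annotated

from pydantic import BaseModel, Field


class EmbeddingSettings(BaseModel):
    # Stored as ai_model_id to stay clear of pydantic's reserved model_*
    # namespace, but serialized as "model", which is what the API expects.
    ai_model_id: Annotated[str | None, Field(serialization_alias="model")] = None


settings = EmbeddingSettings(ai_model_id="nvidia/nv-embedqa-e5-v5")
print(settings.model_dump(by_alias=True))  # {'model': 'nvidia/nv-embedqa-e5-v5'}
```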
```python
class NvidiaEmbeddingPromptExecutionSettings(NvidiaPromptExecutionSettings):
    """Settings for NVIDIA embedding prompt execution."""

    """Specific settings for the text embedding endpoint."""
```
please combine into one docstring
```python
RESPONSE_TYPE = Union[list[Any],]


class NvidiaHandler(KernelBaseModel, ABC):
```
is the goal to also add chat and/or text completions or other modalities? If not, or if only one other, then we don't need all this extra stuff that we have for OpenAI, since that supports a lot of things and has 2 configs (azure and openai). We could simplify this PR by a lot and only extend the shared pieces later on.
```python
# move input_type to extra-body
if not settings.extra_body:
    settings.extra_body = {}
settings.extra_body.setdefault("input_type", settings.input_type)
```
can't we move this logic into the prepare_settings_dict?
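A sketch of that suggestion: fold the extra_body bookkeeping into prepare_settings_dict itself, so call sites don't have to mutate the settings first. The method body is an assumption built from the excludes shown earlier in this thread.

```python
def prepare_settings_dict(self, **kwargs) -> dict:
    """Dump the settings, routing input_type into extra_body in one place."""
    settings_dict = self.model_dump(
        exclude={"service_id", "extension_data", "structured_json_response", "input_type"},
        exclude_none=True,
        by_alias=True,
    )
    if self.input_type is not None:
        # The openai client forwards extra_body verbatim into the JSON payload.
        settings_dict.setdefault("extra_body", {}).setdefault("input_type", self.input_type)
    return settings_dict
```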
```python
and more information refer https://docs.api.nvidia.com/nim/reference/
use endpoint if you only want to supply the endpoint.
    (Env var NVIDIA_BASE_URL)
- chat_model_id: The NVIDIA chat model ID to use, see https://docs.api.nvidia.com/nim/reference/llm-apis.
```
since chat and text are not supported yet, please remove.
Nvidia Embedding Connector
Description
Contribution Checklist