
Support for Xinference Chat in langchain #1510

Closed
buptzyf opened this issue May 17, 2024 · 4 comments
Labels
question Further information is requested
Milestone

Comments

@buptzyf
Contributor

buptzyf commented May 17, 2024

@codingl2k1 Hello:

  1. I see that this PR was closed: https://github.com/langchain-ai/langchain/pull/12702. Will a follow-up PR still be submitted?
  2. Does this not use chat?
     llm = Xinference(
         server_url="server_url",
         model_uid="qwen1.5-chat-14B",  # replace model_uid with the model UID returned from launching the model
         temperature=0.1, max_tokens=30 * 1024, stream=False, verbose=True,
     )

Here is the source code:
def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> str:
    """Call the xinference model and return the output.

    Args:
        prompt: The prompt to use for generation.
        stop: Optional list of stop words to use when generating.
        generate_config: Optional dictionary for the configuration used for
            generation.

    Returns:
        The generated string by the model.
    """
    model = self.client.get_model(self.model_uid)

    generate_config: "LlamaCppGenerateConfig" = kwargs.get("generate_config", {})

    generate_config = {**self.model_kwargs, **generate_config}

    if stop:
        generate_config["stop"] = stop

    if generate_config and generate_config.get("stream"):
        combined_text_output = ""
        for token in self._stream_generate(
            model=model,
            prompt=prompt,
            run_manager=run_manager,
            generate_config=generate_config,
        ):
            combined_text_output += token
        return combined_text_output

    else:
        completion = model.generate(prompt=prompt, generate_config=generate_config)
        return completion["choices"][0]["text"]

def _stream_generate(
    self,
    model: Union["RESTfulGenerateModelHandle", "RESTfulChatModelHandle"],
    prompt: str,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    generate_config: Optional["LlamaCppGenerateConfig"] = None,
) -> Generator[str, None, None]:
    """
    Args:
        prompt: The prompt to use for generation.
        model: The model used for generation.
        stop: Optional list of stop words to use when generating.
        generate_config: Optional dictionary for the configuration used for
            generation.

    Yields:
        A string token.
    """
    streaming_response = model.generate(
        prompt=prompt, generate_config=generate_config
    )
    for chunk in streaming_response:
        if isinstance(chunk, dict):
            choices = chunk.get("choices", [])
            if choices:
                choice = choices[0]
                if isinstance(choice, dict):
                    token = choice.get("text", "")
                    log_probs = choice.get("logprobs")
                    if run_manager:
                        run_manager.on_llm_new_token(
                            token=token, verbose=self.verbose, log_probs=log_probs
                        )
                    yield token

There is no support for model.chat in there, so for chat the only option right now is the native xinference.Client?
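For reference, a minimal sketch of calling chat through the native client (the server URL and model UID are placeholders, and the exact chat signature has changed across Xinference releases, so check it against the version you run):

from xinference.client import Client

client = Client("http://127.0.0.1:9997")      # placeholder server URL
model = client.get_model("qwen1.5-chat-14B")  # placeholder model UID

# Around the v0.11 releases the chat handle took a prompt plus optional
# chat_history / generate_config; newer releases take an OpenAI-style
# messages list instead.
completion = model.chat(
    prompt="你的名字",
    generate_config={"max_tokens": 512},
)
print(completion["choices"][0]["message"]["content"])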

@buptzyf buptzyf added the question Further information is requested label May 17, 2024
@XprobeBot XprobeBot added this to the v0.11.1 milestone May 17, 2024
@buptzyf buptzyf changed the title from Support for ChatModel in langchain to Support for Xinference Chat in langchain May 17, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.1, v0.11.2 May 17, 2024
@codingl2k1
Contributor

Xinference is compatible with the OpenAI API, so you can use the OpenAI API in LangChain to access Xinference.
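For example, a minimal sketch of pointing LangChain's OpenAI client at Xinference (the endpoint URL, API key placeholder, and model UID below are assumptions for a local deployment; the OpenAI-compatible endpoint is served under /v1):

from langchain_openai import ChatOpenAI

# Placeholder values for a local Xinference deployment.
llm = ChatOpenAI(
    base_url="http://127.0.0.1:9997/v1",  # Xinference's OpenAI-compatible endpoint
    api_key="not-used",                   # no real key is needed unless auth is enabled
    model="qwen1.5-chat-14B",             # the UID of the launched model
    temperature=0.1,
)

print(llm.invoke("你的名字").content)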

@buptzyf
Contributor Author

buptzyf commented May 21, 2024

Xinference is compatible with the OpenAI API, so you can use the OpenAI API in LangChain to access Xinference.

Thanks. Following your hint, I got both the text-generation and chat endpoints working. There is a new issue, and I am not sure whether it is a bug.

Client:

llm = OpenAI(model="qwen1.5-chat-14B", temperature=0.9, max_tokens=30 * 1024, streaming=True)
query_result = llm.invoke(input="你的名字", temperature=0.9, max_tokens=30 * 1024, logit_bias=None, stream=True)

Here logit_bias=None has to be specified explicitly (this may also be a LangChain issue).

Xinference server side:
[screenshot of the server-side handler, which only accepts logit_bias when it is None]

The server side checks whether logit_bias is None; if I do not explicitly set it to None, the server returns a 501 to the client.

@codingl2k1
Contributor

logit_bias is not implemented yet, so passing a value for it triggers a 501 error.

@buptzyf
Contributor Author

buptzyf commented May 22, 2024

logit_bias is not implemented yet, so passing a value for it triggers a 501 error.

Thanks for the reply.

@buptzyf buptzyf closed this as completed May 22, 2024