Incorrect answer with openai compatible penalty parameters #238

Open
2 of 4 tasks
Spycsh opened this issue Oct 17, 2024 · 0 comments

System Info

Hi there, I hit a bug when using TGI Gaudi 2.0.5 with both meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3. When I set the frequency/repetition/presence penalty parameters to their default values following the OpenAI format (https://platform.openai.com/docs/api-reference/completions/create), I get incorrect answers. Here are the screenshots:

image1

image2

I then ran the same check on TGI CPU and did not encounter the bug, so I suspect the problem is specific to TGI Gaudi. Could you please look into this issue?
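For context on why the default values should not change the output: the OpenAI API documentation describes frequency and presence penalties as a subtraction from the logits of tokens that have already been generated. The sketch below follows that documented formula only; it is not TGI's or TGI Gaudi's implementation, and the function name `apply_openai_penalties` is hypothetical. `repetition_penalty` is not part of the OpenAI formula and is not modeled here.

```python
from collections import Counter


def apply_openai_penalties(logits, generated_ids,
                           frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of `logits` with OpenAI-style penalties applied.

    Hypothetical helper following the formula in the OpenAI docs:
        logit[j] -= count[j] * frequency_penalty
                    + (1 if count[j] > 0 else 0) * presence_penalty
    """
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        # Only tokens that were already generated (count > 0) are penalized.
        adjusted[token_id] -= count * frequency_penalty + presence_penalty
    return adjusted


# Toy vocabulary of 4 tokens; token 3 was generated twice, token 0 once.
logits = [1.5, -0.2, 0.7, 2.1]
generated = [3, 3, 0]

# With both penalties at 0.0 the logits come back unchanged.
unchanged = apply_openai_penalties(logits, generated, 0.0, 0.0)
print(unchanged == logits)
```

Under this formula, sending `"frequency_penalty": 0.0` and `"presence_penalty": 0.0` should be equivalent to omitting them, which is why a change in the generated text points at how the parameters are handled rather than at the values themselves.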

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Here is a minimal reproduction:

model=Intel/neural-chat-7b-v3-3
hf_token=xxxx
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
 -e PT_HPU_LAZY_MODE=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
 -e HF_TOKEN=$hf_token --cap-add=sys_nice --ipc=host -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} \
 ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048

http_proxy= curl http://${host_ip}:8081/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "repetition_penalty":1.03, "presence_penalty":0.0 }'

The answer (spaces between words are missing toward the end):

{"id":"","object":"text_completion","created":1729153043,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data by recognizing patterns within it without explicit programming rules or instructions being given beforehand; this makes them highly effective at handling complex tasks like image recognitionor natural language processing(NLP). The deeper these network structures get - meaning more hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}

Then I removed repetition_penalty and kept only the OpenAI-compatible frequency_penalty and presence_penalty:

http_proxy= curl http://${host_ip}:8081/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "presence_penalty":0.0 }'

The error persists:

{"id":"","object":"text_completion","created":1729153206,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data or information through experience by recognizing patterns within large datasets without explicit programming rules defined beforehand; this allows them tounderstandcomplexrelationshipsbetween variables more effectively than traditionalmachine-learninglearningalgorithmswhich relyonlinearmodelsorrulebasedapproachesforpatternrecognition tasks such as image classification"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}

Expected behavior

The answer should be well-formatted and correct.
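A rough way to check "well-formatted" automatically when comparing backends is to flag unusually long runs of letters in the response text. This is only a heuristic sketch: the helper name `suspicious_words` and the 20-character threshold are assumptions chosen for illustration, and legitimately long words would also be flagged.

```python
import re


def suspicious_words(text, max_len=20):
    """Return runs of letters longer than `max_len` characters.

    Hypothetical heuristic: words glued together by missing spaces
    tend to form much longer letter runs than ordinary English words.
    """
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if len(w) > max_len]


# Fragments taken from the response shown in the reproduction.
good = "multiple layers for feature extraction and transformation"
bad = "meaning more hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"

print(suspicious_words(good))
print(suspicious_words(bad))
```

Running the same prompt against TGI CPU and TGI Gaudi and comparing the flagged words gives a quick signal of whether the spacing problem is present, without reading each response by hand.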
