
Questions about inference mode #86

Open
Tengfei09 opened this issue Aug 2, 2023 · 2 comments
Comments


Tengfei09 commented Aug 2, 2023

Hi,
I'm trying to use your wonderful framework for inference only. However, I'm not familiar with the serving-related settings in your code. How can I remove them, or change the code slightly so it runs inference without serving?

By the way, after dumping the HLO graph, I found that the data type is still fp32 even though I changed the dtype option.

python -m EasyLM.models.llama.llama_serve \
    --load_llama_config='7b' \
    --load_checkpoint="params::/data/hantengfei/llama2_7b/open_llama_7b_v2_easylm" \
    --tokenizer.vocab_file='/data/hantengfei/llama2_7b/tokenizer.model' \
    --mesh_dim='1,-1,1' \
    --dtype='fp16' \
    --input_length=1024 \
    --seq_length=2048 \
    --lm_server.batch_size=4 \
    --lm_server.port=35009 \
    --lm_server.pre_compile='generate'
@young-geng
Owner

I'm not sure I understand which part you want to remove. The serving script basically implements the inference methods defined in the LMServer class. If you don't want to use the HTTP server, you can easily modify llama_serve.py to call those methods directly without spinning up an HTTP server.
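To illustrate the pattern described above, here is a minimal sketch of bypassing the HTTP server and calling an inference method directly. The class and method names below (`LMServer`, `ModelServer`, `generate`, `run`) are illustrative stand-ins, not the exact EasyLM API; in the real code you would edit the main function of llama_serve.py.

```python
# Hedged sketch: instead of starting the HTTP server, call the inference
# method on the server object directly. All names here are stand-ins for
# the corresponding pieces in EasyLM's llama_serve.py.

class LMServer:
    """Minimal stand-in for EasyLM's LMServer base class."""
    def run(self):
        # In the real framework this would spin up the HTTP endpoint.
        raise RuntimeError("not needed for offline inference")


class ModelServer(LMServer):
    """Stand-in for the subclass defined in the serving script, which
    implements the actual inference methods."""
    def generate(self, prompts):
        # The real method would run the JAX forward pass; here we just
        # echo the prompts to show the calling pattern.
        return [p + " <output>" for p in prompts]


server = ModelServer()
# Offline inference: skip server.run() and invoke the method directly.
outputs = server.generate(["Hello"])
print(outputs)
```

The key point is that the HTTP layer is only a thin wrapper; once the server object is constructed, its inference methods can be driven from a plain Python loop or script.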

@Tengfei09
Author

OK, got it. Thanks for your reply.

By the way, how do I change the data type of the whole model? As I said before, after setting the option --dtype='fp16', I still found that some GEMM ops run in fp32.
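One plausible cause of fp32 GEMMs surviving a dtype flag (an assumption, not confirmed from the EasyLM code): if the checkpoint parameters stay in fp32 while only the activations are cast to fp16, type promotion upgrades the matmul back to fp32. The effect is easy to demonstrate with plain numpy:

```python
import numpy as np

# fp16 activations, but weights still stored in fp32
a16 = np.ones((2, 2), dtype=np.float16)
w32 = np.ones((2, 2), dtype=np.float32)

# Mixed-precision matmul is promoted to fp32
mixed = a16 @ w32
print(mixed.dtype)  # float32

# Casting the weights to fp16 keeps the whole GEMM in fp16
pure = a16 @ w32.astype(np.float16)
print(pure.dtype)  # float16
```

So checking whether the loaded parameters themselves are fp16 (not just the compute dtype flag) would be the first thing to verify when fp32 dot ops show up in the dumped HLO.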
