Replies: 2 comments
-
I agree with you. There are potential improvements that need to be made. One thing I'd like to highlight is that the text-splitting process currently in use is based on English. The default settings are probably not well suited to other languages, so it's worth experimenting with the chunk size, overlap, and other defaults.
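For example, something along these lines with LangChain's RecursiveCharacterTextSplitter, which localGPT builds on (a rough sketch; the Chinese separator list and the chunk numbers are guesses to experiment from, not tested values):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Illustrative text; in localGPT this would be the loaded document content.
document_text = "第一句。第二句！第三句？"

# The default separators are whitespace-oriented, which suits English.
# Adding Chinese sentence-ending punctuation lets chunks break on
# sentence boundaries instead of mid-sentence.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk; tune for your documents
    chunk_overlap=50,  # overlap so sentences aren't cut off between chunks
    separators=["\n\n", "\n", "。", "！", "？", "；", " ", ""],
)
chunks = splitter.split_text(document_text)
```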
-
Yes, the text-splitting process is one of the problems for Chinese. Beyond that, I think the main problem comes from the performance of the LLM itself. My hardware is limited, so I can't try larger models. Thank you so much for your contribution to localGPT!
-
Hi,
I'd like to ask whether anyone is really getting correct and stable performance for most of the questions they ask (on large local data, tested several times).
My local data is a text file with around 150k lines of Chinese. I used Baichuan2-13b-chat as the LLM and bge-large-zh-v1.5 as the embedding model. However, when I asked questions about my local data, I ran into the following issues:
1. Asking the same question several times gives a different answer each time (for example, numeric values are completely different between runs); see the decoding sketch after this post.
2. Answers are poor compared with the "context" retrieved by the similarity search: a lot of important information is missing, and sometimes the information is simply wrong.
In short, this level of performance is not usable in an industry setting.
If anyone is still getting good performance, please share your setup.
I'd appreciate your help!
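On the first issue, repeated runs should give identical answers once sampling is disabled. A minimal sketch of what I mean, assuming the model is loaded through Hugging Face transformers rather than localGPT's own loader, with the prompt as a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan2-13B-Chat"  # the model discussed above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "..."  # placeholder: the RAG prompt (retrieved context + question)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding: same input -> same output
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `do_sample=False`, decoding is greedy, so any remaining run-to-run variation would have to come from retrieval or prompt construction rather than the LLM.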