
[Enhancement]: Chunk & save task gen process? Auto execute option? #invalid JSON #local LLM #914

Open
mcchung52 opened this issue May 6, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@mcchung52

mcchung52 commented May 6, 2024

Version

Command-line (Python) version

Suggestion

Using LM Studio, my local LLM gets stuck in the task generation phase due to an invalid JSON response. After spending a lot of time generating (the task list is long), the output produced is mostly correct except for a couple of syntax errors.

  1. Is there a way for gpt-pilot to break up this process so that it checks the JSON after each task is generated and saves it, rather than only after everything is done, so all that work isn't lost? I'm not sure this is the right way to solve it, but it's the immediate thing I can think of (see the sketch after this list). Also, given my constraints, I'm quite sensitive to efficiency.
  2. I looked up enforcing structured output and came across Guidance and LMQL. How about using one of these?
  3. Since generation takes a long time for me, I usually start it and then go to sleep or take a long break. Can I somehow specify that it should run automatically without waiting for my confirmation?
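
To make suggestion 1 concrete, here is a rough sketch of what I mean (the schema and function names below are just illustrative, not gpt-pilot's actual API): validate each task's JSON as soon as it is generated and append it to a file on disk, so one malformed task doesn't throw away the whole run.

import json
from jsonschema import validate, ValidationError

# Illustrative schema; the real constraints live in gpt-pilot's function-calling setup.
TASK_SCHEMA = {
    "type": "object",
    "required": ["description"],
    "properties": {"description": {"type": "string"}},
}

def save_task_if_valid(raw_task: str, out_path: str = "tasks.partial.jsonl") -> bool:
    """Parse and validate one generated task, appending it to disk if it is valid."""
    try:
        task = json.loads(raw_task)
        validate(task, TASK_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as err:
        print(f"Task rejected, needs regeneration: {err}")
        return False
    # One JSON object per line, so earlier tasks survive even if a later one fails.
    with open(out_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(task) + "\n")
    return True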

If you can also point me to the right places, I can try to create a PR. Thanks in advance for your help~

Background:
I'm using a local LLM and NOT OpenAI due to cost and privacy concerns, so I need to make this work with a local LLM, which introduces problems that apparently don't happen with OpenAI. Generation is also slow, but my MVG (minimum viable goal) is to see whether this works at all, no matter how long it takes.

@mcchung52 mcchung52 added the enhancement New feature or request label May 6, 2024
@phalexo

phalexo commented May 6, 2024

I am running gpt-pilot with Llama-3-70B-Instruct.Q5_K_M. It does not seem to suffer from malformed JSON issues, although it has other problems.

I changed the source to use a read timeout of 3000 seconds instead of 300, and changed the code to use the Llama tokenizer instead of tiktoken from OpenAI. The Llama tokenizer uses tiktoken internally, so they are pretty close.

@mcchung52
Author

Thanks for sharing that. I guess I'd need a memory upgrade then; I'm on 32 GB. Will 64 GB do?
I already changed the API timeout to 30 min because I still got an "API timeout" error with 10 min while connecting to LM Studio (LLM on CPU).
Curious how you changed to the Llama tokenizer. Do you mind sharing the changes?
Thanks

@phalexo

phalexo commented May 6, 2024

> Thanks for sharing that. I guess I'd need a memory upgrade then; I'm on 32 GB. Will 64 GB do? I already changed the API timeout to 30 min because I still got an "API timeout" error with 10 min while connecting to LM Studio (LLM on CPU). Curious how you changed to the Llama tokenizer. Do you mind sharing the changes? Thanks

import re
import requests
import os
import sys
import time
import json
import tiktoken
from prompt_toolkit.styles import Style

from jsonschema import validate, ValidationError
from utils.style import color_red, color_yellow
from typing import List
from const.llm import MAX_GPT_MODEL_TOKENS, API_CONNECT_TIMEOUT, API_READ_TIMEOUT
# alexo: Slow LLM, override
API_READ_TIMEOUT = 3000
from const.messages import AFFIRMATIVE_ANSWERS
from logger.logger import logger, logging
from helpers.exceptions import TokenLimitError, ApiKeyNotDefinedError, ApiError
from utils.utils import fix_json, get_prompt
from utils.function_calling import add_function_calls_to_request, FunctionCallSet, FunctionType
from utils.questionary import styled_text

from .telemetry import telemetry

#tokenizer = tiktoken.get_encoding("cl100k_base")

# alexo: Should Llama-3 tokenizer be used?
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B", revision="refs/pr/6")

--------------------------------------------------- just the first few lines of the file.
The file is utils/llm_connection.py.
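
For what it's worth, a quick way to sanity-check the swapped tokenizer (the count_tokens helper below is just for illustration, it is not something gpt-pilot defines):

# Illustrative sanity check for the swapped tokenizer.
# add_special_tokens=False keeps the count comparable to tiktoken's plain encode().
def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text, add_special_tokens=False))

print(count_tokens("Implement the next development task and respond with JSON."))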

@mcchung52
Author

Yikes.. getting an error

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

@phalexo

phalexo commented May 6, 2024

> Yikes.. getting an error
>
> Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
> Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

Probably you need to request access on Hugging Face. It is free though.

@mcchung52
Author

Are you storing your HF token somewhere? How is it pulling the model files for you?

@phalexo

phalexo commented May 6, 2024

Not sure what you mean by storing a token. If you navigate to Hugging Face and try to access the Llama models, it will ask you to go through a quick approval process. You can pretty much invent the info. At some point, I think, I also set something up with SSH keys and my HF account.
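
If the question is where the credentials come from: once Hugging Face has approved your access to the gated repo, you either run huggingface-cli login once (it caches the token locally) or pass a token explicitly. A rough sketch, assuming the token is in an HF_TOKEN environment variable:

# One-time interactive login (caches the token locally):
#   huggingface-cli login
#
# Or authenticate programmatically before loading the gated tokenizer:
import os
from huggingface_hub import login
from transformers import AutoTokenizer

login(token=os.environ["HF_TOKEN"])  # or rely on the token cached by huggingface-cli login
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B", revision="refs/pr/6")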

@mcchung52 mcchung52 changed the title [Enhancement]: Circumvent invalid JSON in local LLM / auto execute option? [Enhancement]: Chunk & save task gen process? Auto execute option? #invalid JSON #local LLM May 7, 2024