Prepare outlines for outlines-core v0.2 #1386
It doesn't yet use the new I'll have to make it look more like this?

import json

from outlines_core.json_schema import build_regex_from_schema
from outlines_core.fsm.guide import Guide, Index, Vocabulary


class GuideWrapper:
    """Wrapper for the Guide class to handle different guide types."""

    def __init__(self, schema: dict, tokenizer_model: str):
        self.regex = build_regex_from_schema(json.dumps(schema))
        self.vocabulary = Vocabulary.from_pretrained(tokenizer_model)
        self.index = Index(self.regex, self.vocabulary)
        self.guide = Guide(self.index)

    def get_current_state(self):
        """Get current state of the Guide."""
        return self.guide.get_state()

    def get_allowed_tokens(self):
        """Get allowed tokens for the current state of the Guide."""
        return self.guide.get_tokens()

    def advance(self, token_id: int):
        """Advance Guide to the next state via some token_id and return allowed tokens for that new state."""
        return self.guide.advance(token_id)

    def is_finished(self):
        """Check if Guide is finished."""
        return self.guide.is_finished()

    def get_eos_token_id(self):
        """Get the EOS token ID from the vocabulary."""
        return self.vocabulary.get_eos_token_id()


# Example usage
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {"date": {"type": "string", "format": "date"}},
}
tokenizer_model = "openai-community/gpt2"
guide_wrapper = GuideWrapper(schema, tokenizer_model)

# Get current state of the Guide:
current_state = guide_wrapper.get_current_state()

# Get allowed tokens for the current state of the Guide:
allowed_tokens = guide_wrapper.get_allowed_tokens()

# Advance Guide to the next state via some token_id and return allowed tokens for that new state:
next_allowed_tokens = guide_wrapper.advance(allowed_tokens[-1])

# To check if Guide is finished:
guide_finished = guide_wrapper.is_finished()

# If it's finished then this assertion holds:
assert guide_wrapper.get_allowed_tokens() == [guide_wrapper.get_eos_token_id()]
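As a rough illustration of how the wrapper methods above could drive constrained decoding, here is a minimal sketch of a greedy loop. `ToyGuide` is a hypothetical stand-in that only mimics the method names used in the wrapper (`get_tokens`, `advance`, `is_finished`); it is not the outlines-core implementation, and the token IDs, scores, and helper names (`mask_scores`, `constrained_greedy`) are made up for the example.

```python
import math


class ToyGuide:
    """Hypothetical stand-in exposing the method names used by the wrapper."""

    def __init__(self, transitions, initial, finals):
        # transitions: {state: {token_id: next_state}}
        self.transitions = transitions
        self.state = initial
        self.finals = finals

    def get_tokens(self):
        # Token IDs allowed in the current state.
        return sorted(self.transitions.get(self.state, {}))

    def advance(self, token_id):
        # Move to the next state and return the tokens allowed there.
        self.state = self.transitions[self.state][token_id]
        return self.get_tokens()

    def is_finished(self):
        return self.state in self.finals


def mask_scores(scores, allowed):
    """Replace the score of every disallowed token with -inf."""
    allowed = set(allowed)
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]


def constrained_greedy(guide, score_fn, max_steps=16):
    """Pick the best-scoring allowed token until the guide is finished."""
    output = []
    allowed = guide.get_tokens()
    for _ in range(max_steps):
        if guide.is_finished() or not allowed:
            break
        masked = mask_scores(score_fn(output), allowed)
        token_id = max(range(len(masked)), key=masked.__getitem__)
        output.append(token_id)
        allowed = guide.advance(token_id)
    return output


# Toy automaton: token 0, then token 1 or 2; the final state only allows EOS.
EOS = 3
transitions = {0: {0: 1}, 1: {1: 2, 2: 2}, 2: {EOS: 2}}
guide = ToyGuide(transitions, initial=0, finals={2})

# A made-up "model" that always scores token 2 highest.
result = constrained_greedy(guide, lambda prefix: [0.1, 0.2, 0.9, 0.3])
print(result)
```

In this toy setup the final state allows only the EOS token, which mirrors the assertion at the end of the example above.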
Yes, you will need In other words, I would suggest to start with that removed Then we can iterate over what could be missing in the For example, you can start with something very basic like this and then see where it leads you, method by method:

class RegexGuide(Guide):
    """Guide to generate text in the language of a regular expression."""

    def __init__(self, guide, eos_tensor):
        # EOS tensor is used in instructions, so better to be created just once
        self.eos_tensor = eos_tensor
        self._guide = guide

    @classmethod
    def from_regex(cls, regex, tokenizer):
        # Tokenizer has something like `name_or_path` to get to the model's name
        vocabulary = Vocabulary.from_pretrained(tokenizer.name_or_path())
        # Unclear yet if we need attr access to Index/Vocabulary, you will see it when it develops
        index = Index(regex, vocabulary)
        guide = Guide(index)
        eos_tensor = torch.tensor([vocabulary.eos_token_id()])
        return cls(guide, eos_tensor)
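To make the "method by method" suggestion a bit more concrete, here is a minimal sketch of how a guide might turn index lookups into instructions, reusing one EOS instruction instead of rebuilding it on every call. Everything in it is an assumption for illustration: `ToyIndex`, `ToyRegexGuide`, and the `Generate`/`Write` dataclasses are hypothetical stand-ins with plain lists instead of tensors, not the outlines or outlines-core classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Generate:
    """Hypothetical instruction: sample the next token from `tokens`."""
    tokens: List[int]


@dataclass(frozen=True)
class Write:
    """Hypothetical instruction: emit `tokens` verbatim."""
    tokens: List[int]


class ToyIndex:
    """Hypothetical transition table; not the outlines-core Index."""

    def __init__(self, transitions: Dict[int, Dict[int, int]], finals: Set[int]):
        self.transitions = transitions
        self.finals = finals

    def allowed_tokens(self, state: int) -> Optional[List[int]]:
        row = self.transitions.get(state)
        return sorted(row) if row else None

    def next_state(self, state: int, token_id: int) -> int:
        return self.transitions[state][token_id]


class ToyRegexGuide:
    """Sketch of a guide that delegates every lookup to an index."""

    def __init__(self, index: ToyIndex, eos_token_id: int):
        self.index = index
        self.eos_token_id = eos_token_id
        # Built once and reused, in the spirit of the EOS tensor above.
        self.eos_instruction = Write([eos_token_id])

    def get_next_instruction(self, state: int):
        tokens = self.index.allowed_tokens(state)
        if tokens is None:
            # Nothing left to generate from this state: write EOS.
            return self.eos_instruction
        return Generate(tokens)

    def get_next_state(self, state: int, token_id: int) -> int:
        if token_id == self.eos_token_id:
            # EOS does not move the automaton.
            return state
        return self.index.next_state(state, token_id)

    def is_final_state(self, state: int) -> bool:
        return state in self.index.finals


# Toy automaton: token 5, then token 6, then done.
toy_index = ToyIndex({0: {5: 1}, 1: {6: 2}}, finals={2})
toy_guide = ToyRegexGuide(toy_index, eos_token_id=9)
print(toy_guide.get_next_instruction(0))
print(toy_guide.get_next_instruction(2))
```

Keeping the index behind a small set of methods like this makes it easier to see, one method at a time, what the new bindings already cover and what is still missing.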
ae2a8a9 to 4fd3c69
It was finally published to PyPI, please try using
4fd3c69 to fbf6ae3
The JSON Schema conversion is happening here now.
0aab081 to cc33757
You can also import
9ccbf4e to 875b369
outlines/fsm/guide.py (Outdated)

class BetterFSM:
    def __init__(self):
        self.finals = None
        self.flat_transition_map = None
Do we still need this?
No, we shouldn't need this.
Should I remove everything that's fsm-related?
f71ef14 to 28a3dae
@@ -623,7 +619,7 @@ def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None)
 text_part,
 )

-state_seq = walk_fsm(
+state_seq = walk_fsm(  # type: ignore # noqa: F821
I'm not sure whether I should remove the whole PartialScanner or work around that?
We need to keep everything that is related to grammars for now.
@@ -209,7 +209,7 @@ def test_sequential_parse_example(cleanup_lark_import):
 # TODO: Remove once fsm_union and walk_fsm are implemented in Outlines-Core
It seems that walk_fsm isn't yet available in outlines-core. Is that comment still accurate?
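For context on what walking an FSM means in general, here is a minimal sketch that consumes a sequence of symbols and returns the states visited, stopping at the first missing transition. The name `walk_states`, its parameters, and the toy transition table are assumptions for illustration; this is not the signature or behavior of the `walk_fsm` function referenced in the TODO.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def walk_states(
    transitions: Dict[Tuple[int, Hashable], int],
    start: int,
    symbols: Iterable[Hashable],
) -> List[int]:
    """Return the states visited while consuming `symbols` from `start`.

    The walk stops at the first symbol that has no transition, so the
    result may be shorter than the input (a partial match).
    """
    state = start
    visited: List[int] = []
    for symbol in symbols:
        next_state = transitions.get((state, symbol))
        if next_state is None:
            break
        state = next_state
        visited.append(state)
    return visited


# Toy automaton for the pattern "ab+": 0 -a-> 1 -b-> 2 -b-> 2
toy_transitions = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}

full_walk = walk_states(toy_transitions, 0, "abb")
partial_walk = walk_states(toy_transitions, 0, "abc")
failed_walk = walk_states(toy_transitions, 0, "x")
print(full_walk, partial_walk, failed_walk)
```

A partial walk like the second call is the kind of result a partial scanner would need in order to resume matching from the last visited state.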
28a3dae to 4747647
This is a very early, half-baked draft aimed at addressing #1380. Currently, I'm struggling to make it work. So far, my development environment just replaces the Python interfaces in outlines-core with those from the fsm subfolder:

I also edited my outlines Conda environment to use the local version of outlines-core:

Overall, I am trying to update the outlines.fsm.guide module to match the logic in the example from the issue. Are we planning to keep supporting all the Guide variants? For example, IIUC, outlines-core no longer provides RegexGuide, right? So I integrated some code from https://github.com/dottxt-ai/outlines-core/blob/0.1.27/python/outlines_core/fsm/guide.py, as per @torymur's advice.