Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare outlines for outlines-core v0.2 #1386

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yvan-sraka
Copy link
Contributor

This is a very early half-baked draft aimed at addressing #1380, Currently, I’m struggling to make it work, so far, my development environment is just replacing the Python interfaces in outlines-core with those from the fsm subfolder:

outlines-core on git main [xr!+?] is pkg v0.0.0 via py v3.12.8 via rs v1.84.0  
[OCI] > git status  
On branch main  
Your branch is up to date with 'outline-core/main'.  
Changes to be committed:  
  (use "git restore --staged <file>..." to unstage)  
        modified:   python/outlines_core/__init__.py  
        deleted:    python/outlines_core/fsm/__init__.py  
        renamed:    python/outlines_core/fsm/json_schema.py -> python/outlines_core/json_schema.py  
        renamed:    python/outlines_core/fsm/outlines_core_rs.pyi -> python/outlines_core/outlines_core_rs.pyi

I also edited my outlines Conda environment to use the local version of outlines-core:

diff --git a/environment.yml b/environment.yml  
index 6fc980f..6553e74 100644  
--- a/environment.yml  
+++ b/environment.yml  
@@ -20,4 +20,4 @@ dependencies:  
   - transformers  
   - pip  
   - pip:  
-    - -e ".[test]"  
+    - -e ".[test]" -e ../outlines_core  
diff --git a/pyproject.toml b/pyproject.toml  
index 28060f7..42da6b0 100644  
--- a/pyproject.toml  
+++ b/pyproject.toml  
@@ -40,7 +40,7 @@ dependencies = [  
    "pycountry",  
    "airportsdata",  
    "torch",  
-   "outlines_core==0.1.26",  
+   "outlines_core",  
 ]  
 dynamic = ["version"]

Overall, I try to update the outlines.fsm.guide module to match the logic in the example from the issue. Are we planning to keep supporting all the Guide variants? For example, IIUC, outlines-core no longer provides RegexGuide, right? So I integrated some code from https://github.com/dottxt-ai/outlines-core/blob/0.1.27/python/outlines_core/fsm/guide.py, as per @torymur’s advice.

@yvan-sraka yvan-sraka requested a review from torymur January 22, 2025 10:38
@yvan-sraka yvan-sraka self-assigned this Jan 22, 2025
@yvan-sraka yvan-sraka linked an issue Jan 22, 2025 that may be closed by this pull request
@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Jan 22, 2025

It doesn't yet use the new Index and Vocabulary interfaces.

I'll have to make it look more like this?

import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core.fsm.guide import Guide, Index, Vocabulary

class GuideWrapper:
    """Wrapper for the Guide class to handle different guide types."""

    def __init__(self, schema: dict, tokenizer_model: str):
        self.regex = build_regex_from_schema(json.dumps(schema))
        self.vocabulary = Vocabulary.from_pretrained(tokenizer_model)
        self.index = Index(self.regex, self.vocabulary)
        self.guide = Guide(self.index)

    def get_current_state(self):
        """Get current state of the Guide."""
        return self.guide.get_state()

    def get_allowed_tokens(self):
        """Get allowed tokens for the current state of the Guide."""
        return self.guide.get_tokens()

    def advance(self, token_id: int):
        """Advance Guide to the next state via some token_id and return allowed tokens for that new state."""
        return self.guide.advance(token_id)

    def is_finished(self):
        """Check if Guide is finished."""
        return self.guide.is_finished()

    def get_eos_token_id(self):
        """Get the EOS token ID from the vocabulary."""
        return self.vocabulary.get_eos_token_id()

# Example usage
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {"date": {"type": "string", "format": "date"}}
}
tokenizer_model = "openai-community/gpt2"
guide_wrapper = GuideWrapper(schema, tokenizer_model)

# Get current state of the Guide:
current_state = guide_wrapper.get_current_state()

# Get allowed tokens for the current state of the Guide:
allowed_tokens = guide_wrapper.get_allowed_tokens()

# Advance Guide to the next state via some token_id and return allowed tokens for that new state:
next_allowed_tokens = guide_wrapper.advance(allowed_tokens[-1])

# To check if Guide is finished:
guide_finished = guide_wrapper.is_finished()

# If it's finished then this assertion holds:
assert guide_wrapper.get_allowed_tokens() == [guide_wrapper.get_eos_token_id()]

outlines/fsm/guide.py Outdated Show resolved Hide resolved
outlines/fsm/guide.py Outdated Show resolved Hide resolved
outlines/fsm/guide.py Outdated Show resolved Hide resolved
outlines/fsm/guide.py Outdated Show resolved Hide resolved
@torymur
Copy link
Contributor

torymur commented Jan 23, 2025

It doesn't yet use the new Index and Vocabulary interfaces. I'll have to make it look more like this?

Yes, you will need Index and Vocabulary interfaces. But GuideWrapper in your example doesn't look very helpful, it just unnecessarily repeats the existing logic. RegexGuide is already a wrapper of the Guide just for outlines context, it should adhere to defined protocols of RegexGuide for now and that's it.

In other words, I would suggest to start with that removed RegexGuide from outlines-core and method by method try to fulfill all the requirements outlines asks from it with new interfaces:
https://github.com/dottxt-ai/outlines-core/blob/0.1.27/python/outlines_core/fsm/guide.py

Then we can iterate over what could be missing in the outlines_core interfaces if we see some struggling in the RegexGuide's logic. Or not, if it's just a backward compatibility thing, for example, then it should be in outlines only. We might generally improve on guides interfaces in outlines, but I suspect, that our hands are already full for now.

For example, you can start with something very basic like this and then see where it leads you, method by method:

class RegexGuide(Guide):
    """Guide to generate text in the language of a regular expression."""

    def __init__(self, guide, eos_tensor):
        # EOS tensor is used in instructions, so better to be created just once
        self.eos_tensor = eos_tensor
        self._guide = guide

    @classmethod
    def from_regex(cls, regex, tokenizer):
        # Tokenizer has something like `name_or_path` to get to the model's name
        vocabulary = Vocabulary.from_pretrained(tokenizer.name_or_path())
        # Unclear yet if we need attr access to Index/Vocabulary, you will see it when it develops
        index = Index(regex, vocabulary)
        guide = Guide(index)

        eos_tensor = torch.tensor([vocabulary.eos_token_id()])
        return cls(guide, eos_tensor)

@yvan-sraka yvan-sraka force-pushed the 1380-prepare-outlines-to-use-outlines-core-v02 branch from ae2a8a9 to 4fd3c69 Compare January 27, 2025 08:35
@yvan-sraka yvan-sraka requested a review from torymur January 27, 2025 08:38
@torymur
Copy link
Contributor

torymur commented Jan 31, 2025

It was finally published to pypi, please try using 0.2.2 version of outlines-core.

@yvan-sraka yvan-sraka force-pushed the 1380-prepare-outlines-to-use-outlines-core-v02 branch from 4fd3c69 to fbf6ae3 Compare February 3, 2025 08:59
@yvan-sraka
Copy link
Contributor Author

==================================== ERRORS ====================================
_________________ ERROR collecting tests/fsm/test_cfg_guide.py _________________
ImportError while importing test module '/home/runner/work/outlines/outlines/tests/fsm/test_cfg_guide.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/fsm/test_cfg_guide.py:7: in <module>
    from outlines import grammars, models
outlines/__init__.py:3: in <module>
    import outlines.generate
outlines/generate/__init__.py:3: in <module>
    from .choice import choice
outlines/generate/choice.py:7: in <module>
    from outlines_core.fsm.json_schema import build_regex_from_schema
E   ModuleNotFoundError: No module named 'outlines_core.fsm'
=========================== short test summary info ============================
ERROR tests/fsm/test_cfg_guide.py
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 2.98s ===============================

@rlouf
Copy link
Member

rlouf commented Feb 3, 2025

The Json Schema conversion is happening here now.

@yvan-sraka yvan-sraka force-pushed the 1380-prepare-outlines-to-use-outlines-core-v02 branch 2 times, most recently from 0aab081 to cc33757 Compare February 3, 2025 10:46
@rlouf
Copy link
Member

rlouf commented Feb 3, 2025

You can also import Guide as from outlines_core import Guide

Comment on lines 312 to 315
class BetterFSM:
def __init__(self):
self.finals = None
self.flat_transition_map = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we shouldn't need this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove everything that's fsm-related?

@yvan-sraka yvan-sraka force-pushed the 1380-prepare-outlines-to-use-outlines-core-v02 branch 2 times, most recently from f71ef14 to 28a3dae Compare February 5, 2025 09:58
@@ -623,7 +619,7 @@ def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None)
text_part,
)

state_seq = walk_fsm(
state_seq = walk_fsm( # type: ignore # noqa: F821
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I should remove the whole PartialScanner, or workaround that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep everything that is related to grammars for now

@@ -209,7 +209,7 @@ def test_sequential_parse_example(cleanup_lark_import):
# TODO: Remove once fsm_union and walk_fsm are implemented in Outlines-Core
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that walk_fsm isn't yet available in outlines-core, is that comment still accurate?

@yvan-sraka yvan-sraka force-pushed the 1380-prepare-outlines-to-use-outlines-core-v02 branch from 28a3dae to 4747647 Compare February 5, 2025 10:03
@yvan-sraka yvan-sraka requested a review from rlouf February 5, 2025 10:03
@yvan-sraka yvan-sraka linked an issue Feb 6, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants