feat(mlx_lm): basic speculative decoding support in mlx_lm.generate / mlx_lm.server #954

Draft
wants to merge 3 commits into base: main from

Commits on Aug 21, 2024

  1. feat: support batch input in generate()

    The `prompt` argument can now be either a `str` or `list[str]`.
    
    The change to `generate()` is backwards-compatible.
    
    The changes to `generate_step()`, `top_p_sampling()`, and
    `min_p_sampling()` are backwards-incompatible so that all of them use the
    same batched shapes; compatibility could be restored with a few
    if-statements, if preferred.
    llllvvuu committed Aug 21, 2024
    ef92993
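As an illustration of the shape convention this commit describes, here is a minimal pure-Python sketch, assuming logits are laid out as (batch, vocab) even for a single prompt. `normalize_prompt` and `top_p_filter` are hypothetical helpers, not the mlx_lm functions named above; the real functions operate on MLX arrays and sample tokens rather than returning filtered probabilities.

```python
import math
from typing import List, Sequence, Tuple, Union


def normalize_prompt(prompt: Union[str, List[str]]) -> Tuple[List[str], bool]:
    """Accept a str or list[str]; always return a list plus a flag recording
    whether the caller passed a single str (so the result can be unwrapped,
    keeping the single-prompt call backwards-compatible)."""
    if isinstance(prompt, str):
        return [prompt], True
    return list(prompt), False


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(logits: List[List[float]], top_p: float) -> List[List[float]]:
    """Batched nucleus (top-p) filtering.

    logits has shape (batch, vocab), even when batch == 1. For each row, keep
    the smallest set of highest-probability tokens whose cumulative probability
    reaches top_p, zero out the rest, and renormalize.
    """
    filtered = []
    for row in logits:
        probs = softmax(row)
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        kept, cumulative = set(), 0.0
        for i in order:
            kept.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        mass = sum(probs[i] for i in kept)
        filtered.append([probs[i] / mass if i in kept else 0.0 for i in range(len(probs))])
    return filtered


prompts, single = normalize_prompt("Hello")  # (["Hello"], True)
batch_probs = top_p_filter([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], top_p=0.5)
# Row 0 keeps only its top token; row 1 (uniform) keeps two of its three tokens.
```

Keeping the batch axis even when there is only one prompt is what lets a single code path serve both `str` and `list[str]` inputs; the flag from `normalize_prompt` is only needed at the end to unwrap the result.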

Commits on Aug 23, 2024

  1. 5105b31

Commits on Aug 27, 2024

  1. feat: basic speculative decoding support in mlx_lm.generate / mlx_lm.server
    
    This basic version supports only batch size 1 (bs=1), greedy decoding
    (temp=0), and max_kv_size=None. Support for samplers, the rotating cache,
    and batching is deferred to future commits to keep this diff small.
    llllvvuu committed Aug 27, 2024
    7d0e1cc
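Because this version is restricted to temp=0, verification can be an exact-match check: a draft model proposes a few tokens greedily, the target model scores them in one pass, the longest prefix on which the target agrees is kept, and the target's own token is appended at the first disagreement. In exact arithmetic the output is then identical to plain greedy decoding with the target model. The sketch below illustrates this under stated assumptions: models are hypothetical `Predictor` callables over token-id lists (not mlx_lm's model or cache API), no KV cache is modeled, and `num_draft` is an illustrative parameter name.

```python
from typing import Callable, List

# Hypothetical model interface (an assumption, not mlx_lm's API): given a
# token sequence, return the greedy next-token choice after every prefix,
# so preds[i] is the model's pick for the token following tokens[: i + 1].
# A real model produces all of these from one forward pass.
Predictor = Callable[[List[int]], List[int]]


def greedy_generate(model: Predictor, prompt: List[int], max_tokens: int) -> List[int]:
    """Baseline: plain greedy decoding, one model call per token."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        tokens.append(model(tokens)[-1])
    return tokens[len(prompt):]


def speculative_generate_greedy(target: Predictor, draft: Predictor,
                                prompt: List[int], max_tokens: int,
                                num_draft: int = 3) -> List[int]:
    """Greedy (temp=0) speculative decoding for one sequence (bs=1).

    Assumes a non-empty prompt.
    """
    tokens = list(prompt)
    generated: List[int] = []
    while len(generated) < max_tokens:
        # 1. The cheap draft model proposes num_draft tokens autoregressively.
        proposal: List[int] = []
        for _ in range(num_draft):
            proposal.append(draft(tokens + proposal)[-1])

        # 2. The target model scores prompt + proposal in a single call;
        #    preds[base + j] is the target's choice at proposal[j]'s position.
        preds = target(tokens + proposal)
        base = len(tokens) - 1

        # 3. Keep the longest prefix of the proposal the target agrees with.
        accepted: List[int] = []
        for j, tok in enumerate(proposal):
            if preds[base + j] != tok:
                break
            accepted.append(tok)

        # 4. Append the target's own token: the correction at the first
        #    mismatch, or a free "bonus" token if every draft token matched.
        accepted.append(preds[base + len(accepted)])

        # Do not overshoot max_tokens.
        accepted = accepted[: max_tokens - len(generated)]
        tokens.extend(accepted)
        generated.extend(accepted)
    return generated


# Toy models over token ids 0..9: the target always predicts last + 1 (mod 10);
# the draft agrees except after token 5, where it wrongly guesses 0.
def toy_target(toks: List[int]) -> List[int]:
    return [(t + 1) % 10 for t in toks]


def toy_draft(toks: List[int]) -> List[int]:
    return [0 if t == 5 else (t + 1) % 10 for t in toks]


if __name__ == "__main__":
    print(speculative_generate_greedy(toy_target, toy_draft, [1], 8))  # [2, 3, 4, 5, 6, 7, 8, 9]
    print(greedy_generate(toy_target, [1], 8))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

In the toy run, the first round accepts all three draft tokens plus a bonus token, and the second round rejects the draft's 0 after 5 and substitutes the target's 6, so the result matches greedy decoding with the target alone. With temp > 0, exact matching no longer preserves the target's distribution, which is why sampler support needs a different acceptance rule.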