Llama 3.1 RoPE #14

Open · wants to merge 1 commit into master
Conversation

danielhanchen

Beware - my C is very rusty (haven't done C in like ages lol) - I might have transcribed it incorrectly from https://github.com/unslothai/unsloth/blob/main/unsloth/models/llama.py#L1116

From https://news.ycombinator.com/item?id=41053201
Llama 3.1 uses a new RoPE scaling mechanism for 128K context extension using:

# From https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/api/model.py#L41
import math
import torch

def apply_scaling(self, freqs: torch.Tensor):
    # Values obtained from grid search
    scale_factor = 8
    low_freq_factor = 1
    high_freq_factor = 4
    old_context_len = 8192  # original llama3 length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # high-frequency band: keep the frequency unchanged
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # low-frequency band: divide by the scale factor
            new_freqs.append(freq / scale_factor)
        else:
            # in-between band: interpolate smoothly between the two regimes
            assert low_freq_wavelen != high_freq_wavelen
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)

I did not add a flag to enable Llama 3.1 scaling, though.
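
For reference, a rough C sketch of the same scaling, applied in place to the per-dimension RoPE frequencies before the sin/cos tables are built. This is not the exact code in the PR; the function and variable names (apply_llama31_scaling, freqs, n_freqs) are just illustrative.

#include <math.h>

/* Llama 3.1 RoPE frequency scaling (mirrors the Python reference above).
 * freqs: array of per-dimension rotary frequencies, modified in place.
 * n_freqs: number of entries (head_dim / 2 for standard RoPE). */
static void apply_llama31_scaling(float *freqs, int n_freqs) {
    const float scale_factor     = 8.0f;
    const float low_freq_factor  = 1.0f;
    const float high_freq_factor = 4.0f;
    const float old_context_len  = 8192.0f;  /* original Llama 3 context length */
    const float pi               = 3.14159265358979f;

    const float low_freq_wavelen  = old_context_len / low_freq_factor;
    const float high_freq_wavelen = old_context_len / high_freq_factor;

    for (int i = 0; i < n_freqs; i++) {
        float freq = freqs[i];
        float wavelen = 2.0f * pi / freq;
        if (wavelen < high_freq_wavelen) {
            /* high-frequency band: leave unchanged */
            freqs[i] = freq;
        } else if (wavelen > low_freq_wavelen) {
            /* low-frequency band: scale down by scale_factor */
            freqs[i] = freq / scale_factor;
        } else {
            /* in-between band: interpolate smoothly between the two regimes */
            float smooth = (old_context_len / wavelen - low_freq_factor) /
                           (high_freq_factor - low_freq_factor);
            freqs[i] = (1.0f - smooth) * freq / scale_factor + smooth * freq;
        }
    }
}

Assuming the frequencies are precomputed once per model load, this would be called on the head_dim / 2 entries before deriving the per-position sin/cos values.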
