Make IQ1_M work for QK_K = 64 #6327

Merged
merged 3 commits into from Mar 27, 2024

ikawrakow
Contributor

As with all other i-quants, this covers AVX2, ARM_NEON, CPU scalar, and Metal. CUDA will come later.

@ikawrakow ikawrakow merged commit cbc8343 into master Mar 27, 2024
57 of 58 checks passed
@ikawrakow ikawrakow deleted the ik/iq1m_64 branch March 27, 2024 07:44
@ikawrakow
Contributor Author

@ggerganov Perhaps you should disable the nix build? I don't know about you, but for me a check that runs for 6 hours and is eventually cancelled on every commit does not make much sense. If nothing else, let's have some mercy on our planet.

@ggerganov
Owner

@SomeoneSerge Is there something that can be done to speed up the builds? AFAICT, with the recent workflow concurrency changes (#6243), all Nix builds are bound to be cancelled: the chance of something being committed to master within 6h is quite large, and that would cancel all running workflows.

@SomeoneSerge
Collaborator

@ggerganov thanks for the heads-up; I noticed a few cancelled builds but haven't gotten around to investigating them. I opened a tracking issue for now: #6346

@mscheong01
Collaborator

mscheong01 commented Mar 28, 2024

If we don't want nix builds to fail on master, we could exempt the master branch, as stated in the #6243 description:

But, if we don't want this to happen with our master branch workflows, we can make an exception. Here's how we could set it up:

concurrency: 
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event.inputs.sha }}
  cancel-in-progress: true
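
As a hedged sketch of what exempting master could look like (an assumption about the intent, not a quote from #6243): GitHub Actions sets `github.head_ref` only for pull-request events, so falling back to `github.run_id` gives every push to master its own concurrency group, and no run on master is cancelled by a later commit. The workflow name and triggers below are hypothetical, and the fragment omits the `jobs` section.

```yaml
# Hypothetical workflow fragment; not taken from the llama.cpp repository.
name: example-build
on:
  push:
    branches: [master]
  pull_request:

concurrency:
  # Pull requests: github.head_ref is set, so runs for the same PR branch
  # share a group and a newer run cancels the older one.
  # Pushes to master: github.head_ref is empty, so the unique github.run_id
  # is used and no earlier run is cancelled.
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
```

An alternative with the same effect would be to keep a shared group and make cancellation conditional, for example `cancel-in-progress: ${{ github.ref != 'refs/heads/master' }}`. Either way, long-running checks on master would run to completion rather than being cancelled by the next merge.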

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* iq1_m: make it work for QK_K = 64 (WIP)

* iq1_m: make it work for QK_K = 64 (scalar and AVX2)

* iq1_m: QK_K = 64 seems to work on Metal and ARM_NEON

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
* iq1_m: make it work for QK_K = 64 (WIP)

* iq1_m: make it work for QK_K = 64 (scalar and AVX2)

* iq1_m: QK_K = 64 seems to work on Metal and ARM_NEON

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
tybalex pushed a commit to tybalex/function.cpp that referenced this pull request Apr 17, 2024
* iq1_m: make it work for QK_K = 64 (WIP)

* iq1_m: make it work for QK_K = 64 (scalar and AVX2)

* iq1_m: QK_K = 64 seems to work on Metal and ARM_NEON

---------

Co-authored-by: Iwan Kawrakow <[email protected]>