
Requesting support for IBM's OpenSource Granite models #441

Open
q5sys opened this issue May 9, 2024 · 6 comments
Labels
currently fixing Am fixing now!

Comments

@q5sys

q5sys commented May 9, 2024

These open source models were just released yesterday at Red Hat Summit.
https://huggingface.co/ibm-granite
https://arxiv.org/abs/2405.04324

If this ends up being a bigger ask than I think it is, and there's something I can do to help in making this happen, let me know.

@danielhanchen
Contributor

Oh interesting!

@danielhanchen danielhanchen added the “currently fixing” label May 9, 2024
@junzzhu

junzzhu commented May 26, 2024

Fine-tuning both ibm-granite/granite-3b-code-instruct and ibm-granite/granite-8b-code-base works now, as far as I checked with the Llama 3 Colab notebook, and the training loss decreases as expected. However, the inference outputs from both are still useless.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the fibonnaci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
1#<fim_prefix>A
# str
 growth
 for
 for
 for
 for
 ` ` ` ` ` ` ` ` ` ` 9\ `<fim_prefix><fim_prefix><fim_prefix><fim_prefix>
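For context, the setup was essentially the Llama 3 Colab notebook with only the model name changed. A minimal sketch of that loading step (the max_seq_length / dtype / load_in_4bit values below are notebook-style defaults shown as an illustration, not the exact notebook settings):

```python
from unsloth import FastLanguageModel

# Load the Granite code checkpoint the same way the Llama 3 notebook loads Llama;
# only the model name changes.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ibm-granite/granite-8b-code-base",
    max_seq_length = 2048,   # illustrative; use whatever the notebook sets
    dtype = None,            # auto-detect (bfloat16 on Ampere+ GPUs)
    load_in_4bit = True,     # 4-bit loading for QLoRA-style fine-tuning
)
```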

@q5sys
Author

q5sys commented May 28, 2024

I noticed the other day, when I was attempting to quantize the larger 34B model, that the Granite models come in two different architectures. The 3B, 7B, and 8B models are llama, while the 20B and 34B are gpt-bigcode models. I'm not sure how that would or wouldn't affect fine-tuning since I haven't looked into it yet, but I figured it was worth mentioning.
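A quick way to confirm which architecture a given checkpoint declares is to read its config; a minimal sketch using transformers (the 20B/34B Hub ids below are my best guess at the public checkpoint names):

```python
from transformers import AutoConfig

# Print the architecture ("model_type") declared in each checkpoint's config.json.
# The smaller Granite code models report a llama config, while the larger ones
# report gpt_bigcode.
for name in [
    "ibm-granite/granite-3b-code-instruct",
    "ibm-granite/granite-8b-code-base",
    "ibm-granite/granite-20b-code-base",   # assumed Hub id
    "ibm-granite/granite-34b-code-base",   # assumed Hub id
]:
    config = AutoConfig.from_pretrained(name)
    print(f"{name}: {config.model_type}")
```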

@danielhanchen
Contributor

@q5sys So if it's another model type, then we'll error out for now.

@junzzhu Oh wait, it's a code model, so fine-tuning on text might not work as expected. Hence the weird output.

@junzzhu

junzzhu commented May 29, 2024

Oh wait, it's a code model, so fine-tuning on text might not work as expected. Hence the weird output.

Cool! That helps. With the 7b-base model, the output is meaningful now. Thanks @danielhanchen
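For anyone else hitting this, the only change needed was the model name, swapping the code checkpoint for the general text model (a minimal sketch; ibm-granite/granite-7b-base is an assumed Hub id for the 7b-base checkpoint mentioned above):

```python
from unsloth import FastLanguageModel

# Same notebook setup as before, but pointed at the general-purpose base model
# rather than a code checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ibm-granite/granite-7b-base",  # assumed Hub id for the 7b-base model
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
```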

@danielhanchen
Contributor

Great it worked!!
