Simplifying Medicine: Developing an Accessible Health Information Platform #134

Closed
Kshah002 opened this issue May 24, 2024 · 10 comments

@Kshah002

Is your feature request related to a problem? Please describe.

Not everyone is familiar with all diseases and the medical terminology used by professionals. Therefore, an interface should be created to address this issue.

Describe the solution you'd like

The idea is to develop an LLM trained on a dataset covering various medical terms and diseases, so that users can ask about a specific disease and receive relevant, accurate responses. I aim to leverage a pre-trained model, such as LLaMA 2, through Hugging Face.
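
As a rough sketch of that query flow (the checkpoint name below is only an example; any LLaMA 2 chat model on the Hub would work, and gated meta-llama checkpoints require accepting the license):

```python
# Sketch only: answer a plain-language medical question with a pre-trained LLaMA 2 model.
# The checkpoint name is an example; gated meta-llama models need Hub access approval.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "meta-llama/Llama-2-7b-chat-hf"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

ask = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
print(ask("Explain what hypertension is in simple terms.")[0]["generated_text"])
```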

Describe alternatives you've considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Congratulations, @Kshah002! 🎉 Thank you for creating your issue. Your contribution is greatly appreciated and we look forward to working with you to resolve it.

We will promptly review your changes and offer feedback. Keep up the excellent work, and kindly remember to check our contributing guidelines.

@Kshah002
Author

Kshah002 commented May 24, 2024

Dataset that can be used: https://huggingface.co/datasets/gamino/wiki_medical_terms
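
For reference, a quick sketch of loading that dataset with the Hugging Face `datasets` library (assuming its default train split):

```python
# Quick look at the wiki_medical_terms dataset using the `datasets` library.
from datasets import load_dataset

ds = load_dataset("gamino/wiki_medical_terms", split="train")  # assumes a default train split
print(ds)     # number of rows and column names
print(ds[0])  # first record; column names are whatever the dataset card defines
```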

@Kshah002
Author

Kshah002 commented May 24, 2024

Hi @SrijanShovit, I have raised the issue and have worked on a similar project using LLaMA. I would request that you add the required labels and assign the task to me.

@SrijanShovit
Owner

Hmm....that looks cool. What are your detailed steps?

@Kshah002
Author

Kshah002 commented May 25, 2024

I wanted to try it out, so I have already started working on the project. Here is a general idea of how I plan to fine-tune the LLaMA 2 model:

The first step was to get a dataset in LLaMA 2 format; I got an already formatted dataset from Hugging Face itself: https://huggingface.co/datasets/aboonaji/wiki_medical_terms_llam2_format
Use 4-bit quantization while loading the pre-trained LLaMA 2 model: https://huggingface.co/aboonaji/llama2finetune-v2
Use 4-bit weights with float16 for computation, specifying the quantization type as nf4
Load a tokenizer compatible with the LLaMA model and set the pad token
Use PEFT for fine-tuning

I am using Google Colab, so the parameters are chosen with that in mind. I hope you get the gist of what I am trying to do; a rough sketch of these steps follows below.
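
A rough sketch of those steps, assuming the usual transformers + bitsandbytes + peft + trl stack for 4-bit LLaMA 2 fine-tuning (the exact SFTTrainer arguments depend on the trl version, and all hyperparameters here are illustrative, not final):

```python
# Rough sketch of the fine-tuning plan above; hyperparameters are illustrative,
# sized for a single Colab GPU rather than a production run.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

# 1. Dataset already formatted for LLaMA 2 prompts
dataset = load_dataset("aboonaji/wiki_medical_terms_llam2_format", split="train")

# 2-3. Load the pre-trained model in 4-bit (nf4) with float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "aboonaji/llama2finetune-v2",
    quantization_config=bnb_config,
    device_map="auto",
)

# 4. Tokenizer compatible with LLaMA; set the pad token
tokenizer = AutoTokenizer.from_pretrained("aboonaji/llama2finetune-v2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 5. PEFT (LoRA) configuration for parameter-efficient fine-tuning
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,      # small batch for a Colab GPU
    gradient_accumulation_steps=4,
    max_steps=100,                      # illustrative; tune for the real run
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",          # assumed text column of the formatted dataset
    max_seq_length=512,
    args=training_args,
)
trainer.train()
```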

@Kshah002
Author

@SrijanShovit Hi there. Any updates?

@SrijanShovit
Owner

Yes, looks good. Do write documentation along with your minor code steps, and keep committing to a single PR.


This issue has been automatically closed because it has been inactive for more than 7 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!

@Kshah002
Author

Kshah002 commented Jun 12, 2024

Hey @SrijanShovit, can you help me reopen the issue? I am done with my task and was about to upload it when I saw the issue had been closed. Can you help me out here? Sorry for being a bit late.

Or should I raise a new issue?

@Kshah002
Author

Hi @SrijanShovit, any updates?
