Adding MedQA and general QandA finetuning stuff #9
Working on the privacy issue #8. To start, we need a reasonable QandA dataset from the medical domain, so I added MedQA here: https://huggingface.co/datasets/lurosenb/medqa
Added an evaluate_qanda.py script with some QandA-specific functions. I piggybacked off of @BeanHam's finetune_summarization file, since it was my starting point and there was a lot of overlap in the implementation. I vote we rename "finetune_summarization" to just "finetune_runner" and use it to run any task (with proper command-line customization, as demonstrated).
Still need to improve the QandA metrics. MedQA is multiple choice, so we should report an accuracy score with better answer parsing, not just the SQuAD-style F1 score. I added some work to that end, but it's incomplete.
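For discussion, here is a minimal sketch of what letter-based accuracy for multiple-choice answers could look like. It is not the code in evaluate_qanda.py. The function names `parse_choice` and `multiple_choice_accuracy`, the option range A-E, and the answer phrasings the patterns recognize are all assumptions for illustration.

```python
import re

# Hypothetical patterns, tried in order. Option letters are assumed to be
# uppercase A-E; adjust the character class if the dataset uses another range.
_CHOICE_PATTERNS = [
    # "The answer is C", "Answer: (B)" ("answer"/"is" matched case-insensitively)
    re.compile(r"(?i:answer)\s*(?:(?i:is)\s*|:\s*)?\(?([A-E])\)?(?![A-Za-z])"),
    # Leading parenthesized letter, e.g. "(C) Aspirin"
    re.compile(r"^\s*\(([A-E])\)"),
    # Leading bare letter followed by punctuation or nothing, e.g. "D) ..." or "D"
    re.compile(r"^\s*([A-E])(?:[.):]|\s*$)"),
]


def parse_choice(text):
    """Return the option letter found in a model output, or None if unparseable."""
    for pattern in _CHOICE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def multiple_choice_accuracy(predictions, gold_letters):
    """Fraction of predictions whose parsed letter equals the gold letter.

    Outputs that cannot be parsed count as incorrect.
    """
    if len(predictions) != len(gold_letters):
        raise ValueError("predictions and gold_letters must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        parse_choice(pred) == gold for pred, gold in zip(predictions, gold_letters)
    )
    return correct / len(predictions)
```

Counting unparseable outputs as wrong keeps the score conservative. Tracking how many outputs fail to parse would probably also be useful, since a high failure rate points at the prompt or the parser rather than the model.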
Also, I didn't run tests beyond finetuning Llama. I'm trying not to get too distracted by experiments, as my goal is to move on quickly to the privacy finetuning task, which is non-trivial.
Also added a README that catalogues the process of getting my task going. Hopefully it's useful for someone!