README.md: 9 additions & 22 deletions
@@ -67,43 +67,30 @@ cd LLM_scoring && bash scoring_api.sh
---

### 🧩 Step 2. Score curation
-The score curation codebase is from [Docta](https://github.com/Docta-ai/docta) in the `./score_curation` path. You can execute the score curation by running
+One can execute the score curation by running
```
cd score_curation && bash diagnose.sh
```
-The corresponding curation report files could be found in the path `./score_curation/results`.
+The corresponding curation report files can be found in the path `score_curation_results/`.


---

### 🧩 Step 3. Data selection
-Given the existing score curation reports, you can directly use the following jupyter notebooks to do data selection, including all baselines: `data_generation.ipynb`. The generated subsets can be further used for LLM instruction tuning. Other selected datasets used for the ablation study can also be generated from the following jupyter notebooks located in the `./score_curation` path: `data_gen_score_curation.ipynb` and `data_gen_data_scale.ipynb`. In particular, we use `data_gen_score_curation.ipynb` to generate subsets after curating machine-generated raw scores.
-
+Given the existing score curation reports, one can directly generate the high-quality subset by
+```
+python subset_generation.py
+```
+The generated subsets can be further used for the following LLM instruction tuning.


---
### 🧩 Step 4. Finetune & Evaluation
-Given the selected subsets in the path `model_finetune/selected_data/`, you can use the code base from [TULU](https://github.com/allenai/open-instruct) to finetune base models (Mistral or LLaMA) and then do evaluation.
-In particular, you can submit the jobs via the launcher under the path `model_finetune/`. For example, you can submit a job by running
-```
-cd model_finetune/ && launcher run job_pipeline_all.yaml
-```
-
-
-Furthermore, we can also execute the code locally, e.g.,
+Given the selected subsets in the path `selected_data/`, one can use the code base from [TULU](https://github.com/allenai/open-instruct) to finetune base models (Mistral or LLaMA) and then do evaluation. Here, for convenience, one can also finetune the model by
```
-cd model_finetune/ && bash run_pipeline_all.sh
+cd model_finetune/ && bash run_pipeline.sh
```

-One can present the final result by running
-```
-python model_finetune/read_results.py
-```
-
-------
-
-## Final results
-The final results of LLM judging compared with the human-annotated dataset LIMA can be found in `lima_compare_plot.ipynb`. Moreover, for the tabular results, you can check the `reading_results.ipynb` jupyter notebook.
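For readers of this change, here is a minimal sketch of how the Step 3 output might be sanity-checked before finetuning. It assumes (this is not stated in the diff) that `subset_generation.py` writes the selected subset as a JSON list of instruction–response records under `selected_data/`; the file name `subset.json` and the `instruction`/`output` field names are hypothetical.

```python
# Hypothetical sketch: sanity-check a generated subset before finetuning.
# Assumes subset_generation.py writes a JSON list of records like
# {"instruction": ..., "output": ...} under selected_data/ -- an assumption,
# not something confirmed by this diff.
import json
from pathlib import Path

subset_path = Path("selected_data/subset.json")  # hypothetical file name

with subset_path.open() as f:
    records = json.load(f)

print(f"{len(records)} examples loaded from {subset_path}")

# Rough word-count statistics over the instruction field.
lengths = [len(r.get("instruction", "").split()) for r in records]
if lengths:
    print(f"mean instruction length: {sum(lengths) / len(lengths):.1f} words")
```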