Dataset: data/dilemmas_with_detail_by_action.csv
Croissant metadata: metadata/croissant
Three use cases:
1. Evaluate model preferences on everyday dilemmas.
2. Evaluate model preferences on dilemmas based on a given principle.
3. Evaluate the model's steerability by system prompt modulation on each given principle.
We used gpt-4-turbo as demonstrated below. For replication, we also provide our model responses on the dilemmas dataset from five models, along with the corresponding analysis datasets, so they do not need to be re-run (see the replication_purpose flag).
$ conda create --name <env> --file requirements.txt
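The scripts that call the OpenAI API accept the key via the optional --api_key argument, a .env file, or an environment variable. A minimal setup sketch, assuming the scripts read the standard OPENAI_API_KEY variable (the variable name is an assumption, not confirmed by the scripts):
$ conda activate <env>
$ export OPENAI_API_KEY="sk-..."   # assumed variable name; or put OPENAI_API_KEY=... in a .env file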
Case 1: Evaluate model preferences on everyday dilemmas
1. Evaluate the model on the dilemmas dataset.
python eval/evaluate_model_on_dilemma.py \
--output_jsonl_file_div <output_jsonl_file_div> \
--model <gpt-4-turbo> \
--model_name_for_variable <own_created_name_e.g.gpt4> \
--replication_purpose <full/only_first_five> \
--api_key <optional_you_can_set_as_.env_or_global>
Required Arguments with suggestions:
- output_jsonl_file_div: eval/model_response_from_eval
- model: gpt-4-turbo
- replication_purpose: [full, only_first_five] -- run on the full dilemmas dataset or only the first five dilemmas
Optional Arguments:
- api_key: you can pass the OpenAI API key as an argument; alternatively, use a .env file or set it as an environment variable
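For example, a quick replication run on only the first five dilemmas, using the suggested paths above (the gpt4 label for --model_name_for_variable is an illustrative choice):
python eval/evaluate_model_on_dilemma.py \
--output_jsonl_file_div eval/model_response_from_eval \
--model gpt-4-turbo \
--model_name_for_variable gpt4 \
--replication_purpose only_first_five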
2. Combine the model responses from step 1 with the dilemmas dataset, separated by action.
python eval/combine_model_eval_with_dilemmas_action_separate.py \
--input_model_resp_file <step_1_output_jsonl_file> \
--input_dilemma_file <MoralDilemma_dataset> \
--output_analysis_file <output_analysis_file_div> \
--model <model_name_for_variable>
Required Arguments with suggestions:
- input_model_resp_file: eval/model_response_from_eval
- input_dilemma_file: data/dilemmas_with_detail_by_action.csv, or data/dilemmas_with_detail_by_action_test.csv if only the first five dilemmas were used in step 1
- output_analysis_file: eval/model_response_from_eval -- a file named <model_name>_eval_on_dilemmas_with_action_separate.csv will be created
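Continuing the illustrative run from step 1 (the gpt4.jsonl filename is hypothetical -- check eval/model_response_from_eval for the actual file produced by step 1):
python eval/combine_model_eval_with_dilemmas_action_separate.py \
--input_model_resp_file eval/model_response_from_eval/gpt4.jsonl \
--input_dilemma_file data/dilemmas_with_detail_by_action_test.csv \
--output_analysis_file eval/model_response_from_eval \
--model gpt4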
3. Analyze the combined model responses with dilemma details from step 2 based on the five theories.
python analysis/analysis_for_model_responses_five_theories/analyze_model_resp_with_five_theories.py \
--input_model_resp_file <output_from_step_2> \
--output_analysis_file <e.g.analysis/analysis_for_model_responses_five_theories/model_resp.csv> \
--models <model_names>
Required Arguments with suggestions:
- input_model_resp_file: the output file from step 2, e.g., eval/model_response_from_eval/<model_name>_eval_on_dilemmas_with_action_separate.csv. We also provide the six model responses in eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv
- models: gpt4; for all six models, gpt4 gpt35 llama2 llama3 mixtral_rerun claude
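For example, analyzing our provided six-model responses file, with paths following the suggestions above:
python analysis/analysis_for_model_responses_five_theories/analyze_model_resp_with_five_theories.py \
--input_model_resp_file eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv \
--output_analysis_file analysis/analysis_for_model_responses_five_theories/model_resp.csv \
--models gpt4 gpt35 llama2 llama3 mixtral_rerun claude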
4. Create plots from the analysis file in step 3.
python analysis/analysis_for_model_responses_five_theories/create_plot.py \
--input_analysis_file <output_path_in_step_3/model_resp.csv> \
--output_graph_div <path_for_graphs>
Required Arguments with suggestions:
- input_analysis_file: output from step 3, e.g., analysis/analysis_for_model_responses_five_theories/model_resp.csv
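A concrete invocation continuing the example from step 3 (the graph output directory is an illustrative choice):
python analysis/analysis_for_model_responses_five_theories/create_plot.py \
--input_analysis_file analysis/analysis_for_model_responses_five_theories/model_resp.csv \
--output_graph_div analysis/analysis_for_model_responses_five_theories/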
Case 2: Evaluate model preferences on dilemmas based on a given principle
1. Get the relevant values for each principle from the collected values by prompting the model 10 times.
python eval_for_system_prompt_modulation/get_values_for_principle.py \
--input_principle_file <data/principle_company.csv> \
--output_jsonl_file_div <e.g._eval_for_system_prompt_modulation/values_and_system_prompt_for_principle> \
--model <openai_model_e.g._gpt-4-0125-preview> \
--replication_purpose <full/only_first_two_principles> \
--api_key <optional_you_can_set_as_.env_or_global>
Required Arguments with suggestions:
- input_principle_file: data/principle_openai.csv
- model: gpt-4-0125-preview
- replication_purpose: [full, only_first_two_principles]
Optional Arguments:
- api_key: you can pass the OpenAI API key as an argument; alternatively, use a .env file or set it as an environment variable
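A concrete invocation with the suggested values, restricted to the first two principles for a quick replication:
python eval_for_system_prompt_modulation/get_values_for_principle.py \
--input_principle_file data/principle_openai.csv \
--output_jsonl_file_div eval_for_system_prompt_modulation/values_and_system_prompt_for_principle \
--model gpt-4-0125-preview \
--replication_purpose only_first_two_principles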
2. Calculate each value's relevance using the empirical probabilities over the 10 model responses from step 1.
python eval_for_system_prompt_modulation/calc_value_relevance.py \
--input_system_prompt_file <output_from_step_1> \
--output_jsonl_file_div <e.g._eval_for_system_prompt_modulation/values_and_system_prompt_for_principle>
Required Arguments with suggestions:
- input_system_prompt_file: output from step 1, e.g., eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/principle_openai.csv
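For example, with the suggested paths from step 1:
python eval_for_system_prompt_modulation/calc_value_relevance.py \
--input_system_prompt_file eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/principle_openai.csv \
--output_jsonl_file_div eval_for_system_prompt_modulation/values_and_system_prompt_for_principle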
3. Get the value conflicts and relevant dilemmas for each principle.
python eval_for_system_prompt_modulation/get_values_conflicts_and_relevant_dilemmas_for_principle.py \
--input_principle_file <output_from_step_2_e.g._principle_company_clean.csv> \
--input_dilemma_file <output_from_case_1_step_2> \
--output_jsonl_file_div <eval_for_system_prompt_modulation/values_and_system_prompt_for_principle> \
--model <model_name>
Required Arguments with suggestions:
- input_principle_file: output from step 2, e.g., eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/<model_name>_clean.csv
- input_dilemma_file: output from step 2 of Case 1. We also provide the six model responses: eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv
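An illustrative invocation (the principle_openai_clean.csv filename and the gpt4 model name are inferred from the suggestions above and may differ in practice):
python eval_for_system_prompt_modulation/get_values_conflicts_and_relevant_dilemmas_for_principle.py \
--input_principle_file eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/principle_openai_clean.csv \
--input_dilemma_file eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv \
--output_jsonl_file_div eval_for_system_prompt_modulation/values_and_system_prompt_for_principle \
--model gpt4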
Case 3: Evaluate the model's steerability by system prompt modulation on each given principle
1. Generate system prompts for each principle -- one steering toward supporting values and one steering toward opposing values.
python eval_for_system_prompt_modulation/generate_system_prompts_for_principle.py \
--input_system_prompt_file <output_from_step_3_in_case_2> \
--output_jsonl_file_div <e.g._eval_for_system_prompt_modulation/values_and_system_prompt_for_principle> \
--model gpt-4-turbo \
--api_key <optional_you_can_set_as_.env_or_global>
Required Arguments with suggestions:
- input_system_prompt_file: output from step 3 of Case 2, e.g., <output_path>/principle_gpt4_eval_with_values_conflicts_and_dilemma.csv
Optional Arguments:
- api_key: you can pass the OpenAI API key as an argument; alternatively, use a .env file or set it as an environment variable
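An illustrative invocation, assuming step 3 of Case 2 wrote its output to the values_and_system_prompt_for_principle directory (the input filename follows the example pattern above):
python eval_for_system_prompt_modulation/generate_system_prompts_for_principle.py \
--input_system_prompt_file eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/principle_gpt4_eval_with_values_conflicts_and_dilemma.csv \
--output_jsonl_file_div eval_for_system_prompt_modulation/values_and_system_prompt_for_principle \
--model gpt-4-turbo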
2. Evaluate the model on the dilemmas under each generated system prompt.
python eval_for_system_prompt_modulation/evaluate_model_on_system_prompt.py \
--input_system_prompt_file <output_from_step_1> \
--input_dilemma_file <data/dilemmas_with_detail_by_action.csv> \
--output_jsonl_file_div <e.g._eval_for_system_prompt_modulation/model_response_on_system_prompt/> \
--model gpt-4-turbo \
--api_key <optional_you_can_set_as_.env_or_global>
Required Arguments with suggestions:
- input_system_prompt_file: output from step 1, e.g., <output_path>/<model>_with_system_prompts.csv
- input_dilemma_file: MoralDilemmas data: data/dilemmas_with_detail_by_action.csv
- model: gpt-4-turbo
Optional Arguments:
- api_key: you can pass the OpenAI API key as an argument; alternatively, use a .env file or set it as an environment variable
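An illustrative invocation (the gpt-4-turbo_with_system_prompts.csv filename is a hypothetical fill of the <model>_with_system_prompts.csv pattern above):
python eval_for_system_prompt_modulation/evaluate_model_on_system_prompt.py \
--input_system_prompt_file eval_for_system_prompt_modulation/values_and_system_prompt_for_principle/gpt-4-turbo_with_system_prompts.csv \
--input_dilemma_file data/dilemmas_with_detail_by_action.csv \
--output_jsonl_file_div eval_for_system_prompt_modulation/model_response_on_system_prompt/ \
--model gpt-4-turbo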
3. Analyze the model responses with system prompts from step 2.
python analysis/analysis_for_system_prompt_modulation/analyze_model_resp_on_system_prompt.py \
--input_principle_file <output_from_step_2> \
--input_dilemma_file <output_from_case_1_step_2_or_provided_file> \
--output_div <e.g.analysis/analysis_for_system_prompt_modulation> \
--steer <sup/opp> \
--company_name <use_for_create_output_file_name_e.g._openai> \
--model <the_name_used_for_field_and_also_for_output_file_name>
Required Arguments with suggestions:
- input_principle_file: output from step 2, e.g., eval_for_system_prompt_modulation/model_response_on_system_prompt/{model}_eval.csv
- input_dilemma_file: output from Step 1 in Case 1: eval/model_response_from_eval/<model_name>_eval_on_dilemmas_with_detail_by_action.csv, or our provided six model responses: eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv
- steer: sup analyzes the model responses under system prompts steering toward supporting values; opp analyzes those steering toward opposing values
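An illustrative run for the supporting-value direction (the gpt4_eval.csv filename and the openai company name are hypothetical fills of the patterns above; run it again with --steer opp for the opposing direction):
python analysis/analysis_for_system_prompt_modulation/analyze_model_resp_on_system_prompt.py \
--input_principle_file eval_for_system_prompt_modulation/model_response_on_system_prompt/gpt4_eval.csv \
--input_dilemma_file eval/model_response_from_eval/all_models_eval_on_dilemmas_with_detail_by_action.csv \
--output_div analysis/analysis_for_system_prompt_modulation \
--steer sup \
--company_name openai \
--model gpt4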
4. Create plots from the supporting-value and opposing-value analysis files.
python analysis/analysis_for_system_prompt_modulation/create_plot.py \
--input_support_value_file <path/...system_prompt_sup_value.csv> \
--input_oppose_value_file <path/...system_prompt_opp_value.csv> \
--output_graph_div <analysis/analysis_for_system_prompt_modulation/> \
--replication_purpose <only_two_principles/or/full>
Required Arguments with suggestions:
- input_[support/oppose]_value_file: output from step 3 with steer == [sup/opp]: principle_<companyname>_<modelname>_eval_with_system_prompt_system_prompt_[sup/opp]_value.csv
- replication_purpose: [only_two_principles/full], depending on the replication_purpose setting used in step 1 of Case 2
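A concrete invocation continuing the illustrative openai/gpt4 names from step 3:
python analysis/analysis_for_system_prompt_modulation/create_plot.py \
--input_support_value_file analysis/analysis_for_system_prompt_modulation/principle_openai_gpt4_eval_with_system_prompt_system_prompt_sup_value.csv \
--input_oppose_value_file analysis/analysis_for_system_prompt_modulation/principle_openai_gpt4_eval_with_system_prompt_system_prompt_opp_value.csv \
--output_graph_div analysis/analysis_for_system_prompt_modulation/ \
--replication_purpose only_two_principles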
