
Potential unexpected behavior in executor (that uses globals()) #40

Open
allanj opened this issue Apr 7, 2024 · 5 comments

Comments

@allanj

allanj commented Apr 7, 2024

I was running the script https://github.com/noahshinn/reflexion/blob/main/programming_runs/run_reflexion_codellama_multi.sh
with the CodeLLaMA model, simply changing codellama to codellama-7b:

CUDA_VISIBLE_DEVICES=$1 python main.py \
  --run_name "reflexion_codellama_$1" \
  --root_dir "root" \
  --dataset_path ./benchmarks/humaneval-py.jsonl \
  --strategy "reflexion" \
  --language "py" \
  --model "codellama-7b" \
  --pass_at_k "1" \
  --max_iters "2" \
  --verbose | tee ./logs/reflexion_codellama_$1

The performance is around 30~40%, though the run gets killed partway through the experiment.

Observation

Some predictions exhibit unexpected behavior while still being marked correct (is_solved = True). In the following prediction (also attached here):

  1. The first generated program actually failed the generated test cases (NOTE: these test cases were themselves generated by CodeLLaMA-7B), although it eventually passes the ground-truth test cases.
  2. After reflection, the model actually generates nothing (i.e., None).
  3. So presumably we should take "" as the final output, right?
  4. But is_solved is nevertheless assigned True.

[image: screenshot of the prediction log]

Reason

Because the executor always uses the function globals(): even though the program after reflection is empty, globals() already contains the program from the first attempt.

Since we never clear the information in globals(), this can lead to unexpected behavior.
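The leak is easy to reproduce outside the repo. Below is a minimal sketch (a bare exec() with a hand-rolled namespace dict, not the project's actual executor) showing how a stale definition from attempt 1 makes an empty attempt 2 still pass the tests:

```python
# Shared execution namespace, reused across attempts (mimicking globals()).
env = {}

# Attempt 1: a working implementation is exec'd into the shared namespace.
exec("def add(a, b):\n    return a + b", env)

# Attempt 2: the model returned no code, so an empty string is exec'd.
exec("", env)

# The ground-truth test still passes, because add() from attempt 1
# survives in the shared namespace.
assert env["add"](1, 2) == 3
```

Running the second attempt in a fresh dict instead would raise NameError, which is the behavior one would expect for an empty program.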

I'm a bit confused here; I hope the authors can clarify.

@allanj
Author

allanj commented Apr 7, 2024

In particular, globals() will definitely memorize previous implementations, which are then used whenever a future generation is None.

Or a future generation may reference functions (e.g., helpers) that it does not itself define but that are already memorized in globals(). That potentially contaminates the evaluation.

That means the order in which the evaluation samples are run matters.
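The order dependence can be demonstrated with a small sketch (hypothetical sample programs, not taken from the benchmark): sample B only passes because sample A ran first and left a helper behind in the shared namespace.

```python
env = {}  # one namespace shared by all samples, mimicking globals()

sample_a = "def helper():\n    return 42\ndef solve_a():\n    return helper()"
sample_b = "def solve_b():\n    return helper()"  # never defines helper()

exec(sample_a, env)  # defines helper() as a side effect
exec(sample_b, env)
assert env["solve_b"]() == 42  # passes only because sample_a ran first

# In a fresh namespace, sample_b fails as it should:
fresh = {}
exec(sample_b, fresh)
try:
    fresh["solve_b"]()
except NameError:
    pass  # helper() is undefined without sample_a's contamination
```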

@reyrey-app

So if I understand correctly, it could potentially cause incorrect answers to be marked correct?

@allanj
Author

allanj commented Apr 12, 2024

Yes, exactly!

But I assume that for more capable models like GPT-3.5/4 or other large models, this issue rarely arises.

I just got in touch with the authors and am currently working on a fix.

@allanj
Author

allanj commented Apr 12, 2024

I think the current issue is that we need to decide what to do when cur_func_impl is None.

The code always asserts on the current implementation:

assert isinstance(cur_func_impl, str)

but it is possible that the model response is pure natural language with no extractable code, which leaves cur_func_impl as None and makes the assertion fail.

My current attempt:

  1. For the first attempt, we just set it to the empty string "", since it is also the initial implementation:

if cur_func_impl is None:
    # Code could not be extracted from the response.
    cur_func_impl = ""

  2. For a reflection attempt, we can roll back and reflect again rather than substituting an empty string:

if reflected_func_impl is None:
    # Code could not be extracted from the response;
    # skip this iteration and reflect again.
    cur_iter += 1
    continue
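Putting the two branches together, the intended control flow might look like the sketch below. The names generate() and extract_code() are assumptions standing in for the repo's model call and code parser (where the parser returns None when no code block can be found in the response):

```python
def run_reflexion_loop(generate, extract_code, max_iters):
    # First attempt: fall back to "" if no code was extracted,
    # so the isinstance(cur_func_impl, str) assertion always holds.
    cur_func_impl = extract_code(generate("initial")) or ""
    assert isinstance(cur_func_impl, str)

    cur_iter = 0
    while cur_iter < max_iters:
        reflected_func_impl = extract_code(generate("reflect"))
        if reflected_func_impl is None:
            # Code could not be extracted: skip this iteration and
            # reflect again instead of clobbering the current program.
            cur_iter += 1
            continue
        cur_func_impl = reflected_func_impl
        cur_iter += 1
    return cur_func_impl

# A model whose reflections never yield parseable code leaves the
# initial implementation intact:
impl = run_reflexion_loop(
    generate=lambda prompt: prompt,
    extract_code=lambda s: "def f(): pass" if s == "initial" else None,
    max_iters=2,
)
assert impl == "def f(): pass"
```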

I'm making a PR for this and will post more results here.

Some changes: main...allanj:reflexion:fix_global
I'm running some experiments to verify the results.

@allanj
Author

allanj commented Apr 13, 2024

If I clear the globals() state each time, the performance seems to drop a lot (with CodeLLaMA-7B); I'm checking the reason.
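For comparison, one way to isolate evaluations (an assumption about the fix direction, not the repo's current code) is to give every evaluation its own fresh namespace, so an empty program can never pass on the strength of a stale definition:

```python
def run_isolated(program: str, test: str) -> bool:
    """Execute a candidate program and its test in a fresh namespace."""
    env = {}  # fresh namespace per evaluation; nothing leaks between calls
    try:
        exec(program, env)
        exec(test, env)
        return True
    except Exception:
        return False

# A real implementation passes its test:
assert run_isolated("def f(x):\n    return x * 2", "assert f(2) == 4")
# An empty program fails: there is no stale f() to fall back on.
assert not run_isolated("", "assert f(2) == 4")
```

If performance drops under this scheme, one possible explanation is that some generated programs were silently relying on imports or helpers defined by earlier attempts.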
