This repository is designed to evaluate the executability of LLM-generated code. It is largely based on OpenAI's HumanEval, with modifications and extensions.
Before running the code, set up the environment with conda by following these steps:
- Clone the repository and set up the environment:

  ```bash
  git clone [email protected]:Leolty/code-eval.git && cd code-eval

  conda create --name codeeval python=3.10 && conda activate codeeval

  pip install -r requirements.txt
  ```
  🔍 Note: The `requirements.txt` may not include all necessary packages. Use `pip install <package_name>` to install any missing dependencies as needed.
- Verify your environment for Java, C++, and Python:

  ```bash
  # 🐍 Python Test
  python ./test/test.py

  # ☕ Java Test
  java -ea ./test/Test.java

  # 💻 C++ Test
  g++ -o ./test/test ./test/test.cpp && ./test/test && rm ./test/test
  ```
If all tests output `All tests passed!` 🎉, your environment is ready. If not, troubleshoot the environment setup before proceeding.
To check whether a generated code snippet is correct (more precisely, executable), use the `check_correctness` function. Here's a simple example for Python code:
```python
from human_eval.execution import check_correctness

python_code = """
def add(a, b):
    return a + b

print(add(1, 2))
"""

res = check_correctness(
    sample={"test_code": python_code},
    language="python",
)

print(res)
```
This will output:
```json
{
    "passed": true,
    "result": "passed",
    "completion_id": null
}
```
The `check_correctness` function evaluates the correctness of code based on the following parameters:

- `sample`: A dictionary containing the test code (under the key `test_code`) that you want to evaluate, plus any other optional information (e.g., `task_id`).
- `language`: The programming language of the code you are testing. Currently supported languages are `"python"`, `"java"`, and `"cpp"`.
- `timeout`: The maximum allowed execution time in seconds. Defaults to 5 seconds.
- `completion_id`: (Optional) A unique identifier used for matching test results if needed.
The function executes the code in a sandboxed environment 🛡️ and returns whether the code passed, failed, or timed out ⏳.
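For longer-running or batched evaluations, the optional parameters above can be passed explicitly. Here is a minimal sketch, assuming the keyword names `timeout` and `completion_id` described above; the `task_id` value is just an illustrative label:

```python
from human_eval.execution import check_correctness

# Sketch only: a snippet that sleeps past the timeout should be reported
# as not passed (a timeout) rather than raising in the caller.
slow_code = """
import time
time.sleep(10)
"""

res = check_correctness(
    sample={"test_code": slow_code, "task_id": "demo/0"},  # task_id is illustrative
    language="python",
    timeout=3,          # seconds; overrides the 5-second default
    completion_id=0,    # used to match this result back to its sample
)

print(res)  # expected to report a timeout/failure rather than a pass
```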
This repository is based on OpenAI’s HumanEval, with minor modifications.
The code provided here is for evaluation purposes only. Do not execute untrusted or potentially unsafe code in your local environment. This evaluation tool is designed to run model-generated code, which may cause unintended side effects. Users are strongly encouraged to sandbox the evaluation to prevent any destructive actions on their host systems or networks. Please ensure appropriate precautions are taken before running any code.