This repository is a test suite for approximate code search, including search
for AI-generated code.
- Homepage: https://github.com/aboutcode-org/matchcode-tests/
- Related repos:
Clone this repository.
In the clone, run:
make dev
Then activate the virtual environment with:
. venv/bin/activate
Run the full test suite with:
pytest -vvs tests
The test suite is designed to run only on Linux.
test_matchcode.py uses the dataset "Analyzing the Dependability of Large
Language Models for Code Clone Generation"
(https://zenodo.org/records/11398703). This dataset contains code solutions to
LeetCode problems, where each AI-generated solution was derived from an
original solution.
The tests in test_matchcode.py compare each original solution to its different
AI-generated variations by comparing the Hamming distances and the detected
ngrams of the solutions.
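
As an illustration of the two measures mentioned above, here is a minimal
Python sketch. It is not the code used by test_matchcode.py: the function
names (token_ngrams, ngram_jaccard, hamming_distance), the whitespace
tokenization, the choice of n=3 and the use of plain integer fingerprints are
all assumptions made for this example.

```python
from typing import List, Set, Tuple


def token_ngrams(tokens: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a_tokens: List[str], b_tokens: List[str], n: int = 3) -> float:
    """Jaccard similarity of the n-gram sets of two token lists (0.0 to 1.0)."""
    a = token_ngrams(a_tokens, n)
    b = token_ngrams(b_tokens, n)
    if not a and not b:
        # Two inputs too short to yield any n-gram are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two integer fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical snippets: an "original" and a variant with renamed variables.
    original = "def add ( a , b ) : return a + b".split()
    variant = "def add ( x , y ) : return x + y".split()
    print("shared ngram ratio:", ngram_jaccard(original, variant))

    # Hypothetical 8-bit fingerprints, only to show the bit counting.
    print("hamming distance:", hamming_distance(0b10110010, 0b10010011))
```

A low Hamming distance between two fingerprints, or a high ratio of shared
ngrams, suggests that two solutions are close variations of each other. How
fingerprints are actually computed is outside the scope of this sketch.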
- the data is under a CC-BY-4.0 license
- the code is under the Apache-2.0 license