
Inquiry about Evaluating Code Assistant Products with CodeFuse-AI Evaluation Benchmarks #7

Open · theshypig opened this issue Apr 16, 2024 · 1 comment


@theshypig

Hi, CodeFuse-AI team,

I am interested in evaluating several code assistant products. However, I do not possess a large-scale code model of my own. Instead, what I have are the responses these code assistants provide to various prompts.

My question is, would it be possible to evaluate these code assistants by creating a dataset from their responses, even in the absence of a large-scale code model? If this is not feasible, could you kindly suggest any alternative approaches?

In the event that this is possible, are there any additional considerations I should be aware of when creating this dataset, apart from the requirements mentioned in the README?

I appreciate your guidance and look forward to your response.

Best regards,
ck.


@HotSummer888 (Collaborator) commented May 8, 2024

Hi ck,

You can use the datasets provided in this repository to evaluate code assistant products: run the benchmarks on the assistants' responses and compare the results against those of the models the assistants rely on, to see whether the products improve coding capabilities over their base models. Additionally, depending on the target application scenarios of the assistant products, you can expand the evaluation datasets and metrics within this framework.
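For example, here is a minimal sketch of how collected assistant responses could be packaged into a HumanEval-style JSONL samples file before running the evaluation scripts. The field names ("task_id", "generation") and the output filename are assumptions for illustration; please check this repository's README for the exact schema its scripts expect.

```python
# Hypothetical sketch: package code-assistant responses as a JSONL samples file.
# Field names ("task_id", "generation") and the filename are assumptions; see the
# README of this repository for the exact format expected by the evaluation code.
import json

# Responses collected from a code assistant, keyed by benchmark task id.
assistant_responses = {
    "HumanEval/0": "def has_close_elements(numbers, threshold):\n    ...",
    "HumanEval/1": "def separate_paren_groups(paren_string):\n    ...",
}

with open("assistant_samples.jsonl", "w", encoding="utf-8") as f:
    for task_id, completion in assistant_responses.items():
        f.write(json.dumps({"task_id": task_id, "generation": completion}) + "\n")

# The resulting file can then be passed to the framework's evaluation entry point,
# and metrics such as pass@k compared across assistants or against the base model.
```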

Best regards
