Awesome-LLM-SoftwareTesting

A collection of papers and resources about the utilization of large language models (LLMs) in software testing.

Software testing is essential for ensuring the quality and reliability of software products. As software systems become increasingly complex, new and more effective testing techniques are needed. Recently, large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence. These models can perform various coding-related tasks, including code generation and code recommendation, so applying them to software testing is expected to yield significant improvements. On the one hand, software testing involves tasks such as unit test generation that require code understanding and generation. On the other hand, LLMs can generate diverse test inputs to help achieve comprehensive coverage of the software under test. In this repository, we present a comprehensive review of the utilization of LLMs in software testing. We have collected 102 relevant papers and conducted a thorough analysis from both the software testing and the LLM perspectives, as summarized in Figure 1.

Figure 1. Structure of the contents in this paper

We hope this repository helps researchers and practitioners better understand this emerging field. If you find this repository helpful, please cite this paper:

@article{Wang2023SoftwareTW,
  title={Software Testing with Large Language Model: Survey, Landscape, and Vision},
  author={Junjie Wang and Yuchao Huang and Chunyang Chen and Zhe Liu and Song Wang and Qing Wang},
  journal={ArXiv},
  year={2023},
  volume={abs/2307.07221},
  url={https://api.semanticscholar.org/CorpusID:259924919}
}

Table of Contents📇

News🎉

This project is under development. Star and Watch the repository to follow updates.

Overview🔭

From the software testing perspective

Figure 2. Distribution of testing tasks with LLMs

We find that LLMs have proven to be efficient in the mid to late stages of the software testing lifecycle. During the mid-phase of software testing, LLMs have been successfully applied for various test case preparation tasks, including the generation of unit test cases, test oracle generation, and system test input generation. In later phases, such as the bug fix phase and the preparation of test reports/bug reports, LLMs have been utilized for tasks like bug analysis, debugging, and repair.

From the LLM perspective

In our collected studies, the LLM most frequently employed is ChatGPT, widely recognized and popular for its exceptional performance across various tasks. The second most commonly used LLM is Codex, trained on an extensive code corpus, aiding researchers in coding-related tasks. Ranked third is CodeT5, an open-source LLM capable of conducting pre-training and fine-tuning with domain-specific data, thereby achieving better performance.

Figure 4. Distribution of how LLMs are used (prompt engineering)

In our collected studies, 38 studies use LLMs through a pre-training or fine-tuning scheme, while 64 studies employ prompt engineering to steer the LLM's behavior toward the desired outcome without updating the model weights. Among the latter, 51 studies involve zero-shot learning and 25 involve few-shot learning; these counts overlap, since together they exceed 64. There are also studies involving chain-of-thought prompting (7 studies), self-consistency (1 study), and automatic prompting (1 study).
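To make the difference between zero-shot and few-shot prompting concrete, here is a minimal sketch for a unit test generation task. It is not taken from any of the surveyed papers: the `Example` class, the `build_prompt` helper, the instruction wording, and the `### Function` / `### Test` layout are all assumptions chosen for illustration, and the sketch only builds the prompt text without calling any model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """A worked demonstration: a focal function plus a unit test for it (hypothetical)."""
    focal_code: str
    test_code: str


# Illustrative instruction text, not a prompt from any surveyed study.
INSTRUCTION = "Write a pytest unit test for the following Python function."


def build_prompt(focal_code: str, examples: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt when no examples are given, else a few-shot prompt."""
    parts = [INSTRUCTION]
    # Few-shot: prepend solved demonstrations so the model can imitate them.
    for ex in examples:
        parts.append(f"### Function\n{ex.focal_code}\n### Test\n{ex.test_code}")
    # The final block is left open so the model completes the test.
    parts.append(f"### Function\n{focal_code}\n### Test\n")
    return "\n\n".join(parts)


target = "def add(a, b):\n    return a + b"
demo = Example(
    focal_code="def neg(x):\n    return -x",
    test_code="def test_neg():\n    assert neg(3) == -3",
)

zero_shot_prompt = build_prompt(target)
few_shot_prompt = build_prompt(target, [demo])
```

In both cases the model weights stay untouched; the only thing that changes is how much task-specific context the prompt carries.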

Figure 5. Distribution of other techniques incorporated with LLMs

In our collected studies, 67 use LLMs to address the entire testing task, while 35 incorporate additional techniques. These techniques include mutation testing, differential testing, syntactic checking, program analysis, and statistical analysis, among others.

Related Surveys🗎

Unit test case generation

Test oracle generation

System test input generation

Bug analysis

Debug

Program repair