```diff
-# Linking Sketches and Software Architecture (LiSSA)
-
-The LiSSA approach aims to connect sketches and informal diagrams (such as class diagrams, component diagrams, ...) with
-formal models like component models.
-
-The following diagram shows the pipeline that is planned for the LiSSA approach.
-
-```mermaid
-stateDiagram-v2
-    DiagramDetection
-    TextPreprocessing
-    ArchitectureModel
-    TextExtraction
-    EntityRecognition
-    RecommendationGeneration
-    ConnectionGeneration
-    InconsistencyDetection
-
-    DiagramDetection --> RecommendationGeneration
-    TextPreprocessing --> TextExtraction
-    ArchitectureModel --> RecommendationGeneration
-    TextExtraction --> EntityRecognition
-    DiagramDetection --> EntityRecognition
-    EntityRecognition --> RecommendationGeneration
-    RecommendationGeneration --> ConnectionGeneration
-    ConnectionGeneration --> InconsistencyDetection
-```
```

# LiSSA: A Framework for Generic Traceability Link Recovery

Welcome to the LiSSA project!
This framework leverages Large Language Models (LLMs) enhanced through Retrieval-Augmented Generation (RAG) to establish traceability links across various software artifacts.

## Overview

In software development and maintenance, numerous artifacts such as requirements, code, and architecture documentation are produced.
Understanding the relationships between these artifacts is crucial for tasks like impact analysis, consistency checking, and maintenance.
LiSSA aims to provide a generic solution for Traceability Link Recovery (TLR) by utilizing LLMs in combination with RAG techniques.

The concept and evaluation of LiSSA are detailed in our paper:

> Fuchß, D., Hey, T., Keim, J., Liu, H., Ewald, N., Thirolf, T., & Koziolek, A. (2025). LiSSA: Toward Generic Traceability Link Recovery through Retrieval-Augmented Generation. In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering, Ottawa, Canada.

You can access the paper [here](https://ardoco.de/c/icse25).

## Features

- **Generic Applicability**: LiSSA is designed to recover traceability links across various types of software artifacts, including:
  - [Requirements to code](https://ardoco.de/c/icse25)
  - [Documentation to code](https://ardoco.de/c/icse25)
  - [Architecture documentation to architecture models](https://ardoco.de/c/icse25)

- **Retrieval-Augmented Generation**: By combining LLMs with RAG, LiSSA enhances the accuracy and relevance of the recovered traceability links.
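
The retrieval half of RAG-based TLR can be illustrated with a small, self-contained sketch. It is not LiSSA's implementation: as a simple stand-in for embedding-based retrieval, it ranks candidate target artifacts by the cosine similarity of term-frequency vectors, keeps the top `k`, and assembles a prompt that could then be sent to an LLM. The class `RagSketch`, the prompt wording, the example artifacts, and the value of `k` are all hypothetical, and no LLM is called.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of RAG-style candidate retrieval for TLR; not LiSSA's code. */
final class RagSketch {

    /** Term-frequency vector: lower-cased alphanumeric tokens mapped to their counts. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    private static double norm(Map<String, Integer> v) {
        return Math.sqrt(v.values().stream().mapToDouble(x -> x.doubleValue() * x).sum());
    }

    /** Cosine similarity of two sparse term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int other = b.getOrDefault(e.getKey(), 0);
            dot += e.getValue().doubleValue() * other;
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0 ? 0 : dot / denominator;
    }

    /** Returns the ids of the k targets most similar to the source (best first, ties by id). */
    static List<String> retrieveTopK(String source, Map<String, String> targets, int k) {
        Map<String, Integer> sourceTf = termFrequencies(source);
        Map<String, Double> scores = new HashMap<>();
        targets.forEach((id, text) -> scores.put(id, cosine(sourceTf, termFrequencies(text))));
        return scores.keySet().stream()
                .sorted(Comparator.comparing((String id) -> scores.get(id)).reversed()
                        .thenComparing(Comparator.<String>naturalOrder()))
                .limit(k)
                .collect(Collectors.toList());
    }

    /** Builds a hypothetical prompt that asks an LLM to judge the retrieved candidates. */
    static String buildPrompt(String source, List<String> candidateIds, Map<String, String> targets) {
        StringBuilder prompt = new StringBuilder("Source artifact:\n").append(source).append("\n\n");
        for (String id : candidateIds) {
            prompt.append("Candidate ").append(id).append(":\n").append(targets.get(id)).append("\n\n");
        }
        return prompt.append("Which candidates are traceable to the source artifact?").toString();
    }

    public static void main(String[] args) {
        // Invented example artifacts, for illustration only.
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("FlightManager.java", "class FlightManager plans and starts a flight route for a drone");
        targets.put("Logger.java", "class Logger writes log messages to a file");
        targets.put("BatteryMonitor.java", "class BatteryMonitor reports the drone battery level");
        String requirement = "The system shall plan a flight route for each drone.";
        List<String> top = retrieveTopK(requirement, targets, 2);
        System.out.println(buildPrompt(requirement, top, targets));
    }
}
```

Retrieval narrows the candidate set before the comparatively expensive LLM call; a real setup would typically replace the term-frequency vectors with embeddings.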

## Getting Started

To get started with LiSSA, follow these steps:

1. **Clone the Repository**:
   ```bash
   git clone https://github.com/ArDoCo/LiSSA-RATLR
   cd LiSSA-RATLR
   ```

2. **Install Dependencies**:
   Ensure you have Java JDK 21 or later installed. Then, build the project using Maven:
   ```bash
   mvn clean package
   ```

3. **Run LiSSA**:
   Execute the main application:
   ```bash
   java -jar target/ratlr-*-jar-with-dependencies.jar eval -c config.json
   ```

### Configuration

1. Create a configuration for evaluation or execution. Example configurations are available [here](https://github.com/ArDoCo/ReplicationPackage-ICSE25_LiSSA-Toward-Generic-Traceability-Link-Recovery-through-RAG/tree/main/LiSSA-RATLR-V2/lissa/configs/req2code-significance). Instead of a single file, you can also provide a directory containing multiple configurations.
2. Configure your OpenAI API key and organization in a `.env` file. You can use the provided `env-template` file as a template.
3. LiSSA caches requests to make runs reproducible. The cache is stored in the cache directory specified in the configuration (`cache_dir`).
4. Run `java -jar target/ratlr-*-jar-with-dependencies.jar eval -c configs/....` to run the evaluation. You can provide either a single JSON file or a directory containing JSON configurations.
5. The results are printed to the console and saved to a file in the current directory. The file name is also printed to the console.
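
The caching idea in step 3 can be sketched as a file-based request cache. This is a minimal sketch under assumptions, not LiSSA's cache format: it assumes each response is keyed by a SHA-256 hash of the model name and prompt and stored as one file per key in the cache directory. The class `RequestCache`, its key scheme, and the model name `example-model` are hypothetical. A repeated request is answered from disk instead of calling the LLM again, which is what makes reruns deterministic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.function.Function;

/** Hypothetical file-based cache for LLM requests; not LiSSA's actual cache format. */
final class RequestCache {
    private final Path cacheDir;

    RequestCache(Path cacheDir) throws IOException {
        // Create the cache directory (and its parents) if it does not exist yet.
        this.cacheDir = Files.createDirectories(cacheDir);
    }

    /** Derives a stable, file-name-safe key from model name and prompt via SHA-256. */
    static String key(String model, String prompt) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] hash = sha256.digest((model + "\n" + prompt).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Returns the cached response if present; otherwise calls the model once and stores the answer. */
    String getOrCompute(String model, String prompt, Function<String, String> callModel) throws IOException {
        Path entry = cacheDir.resolve(key(model, prompt) + ".txt");
        if (Files.exists(entry)) {
            return Files.readString(entry, StandardCharsets.UTF_8);
        }
        String response = callModel.apply(prompt);
        Files.writeString(entry, response, StandardCharsets.UTF_8);
        return response;
    }

    public static void main(String[] args) throws IOException {
        RequestCache cache = new RequestCache(Files.createTempDirectory("lissa-cache-sketch"));
        int[] calls = {0};
        // Stand-in for a real LLM client: counts invocations and returns a fixed answer.
        Function<String, String> fakeModel = prompt -> {
            calls[0]++;
            return "yes";
        };
        cache.getOrCompute("example-model", "Is artifact A linked to artifact B?", fakeModel);
        cache.getOrCompute("example-model", "Is artifact A linked to artifact B?", fakeModel);
        // The second request is served from the cache, so the model is called only once.
        System.out.println("model calls: " + calls[0]);
    }
}
```

Because the key covers both model and prompt, changing either one produces a cache miss, while an unchanged configuration replays the stored answers.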

### Results of Evaluation / Execution
The results are stored as Markdown files.
A result file can look like the example below.
It contains the configuration and the results of the evaluation.
Additionally, LiSSA generates CSV files that contain the traceability links as pairs of identifiers.

<details>
<summary>Example Result</summary>

```markdown
## Configuration
{
  "cache_dir" : "./cache-r2c/dronology-dd--102959883",
  "gold_standard_configuration" : {
    "hasHeader" : false,
    "path" : "./datasets/req2code/dronology-dd/answer.csv"
  },
  "... other configuration parameters ..."
}

## Stats
* # TraceLinks (GS): 740
* # Source Artifacts: 211
* # Target Artifacts: 423
## Results
* True Positives: 283
* False Positives: 1286
* False Negatives: 457
* Precision: 0.18036966220522627
* Recall: 0.3824324324324324
* F1: 0.24512776093546992
```

</details>
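
The metrics in the example result follow from the three counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean; with TP = 283, FP = 1286, and FN = 457 this yields the precision, recall, and F1 values listed. The sketch below also shows one way TP, FP, and FN could be obtained by comparing recovered links with the gold standard as sets of (source, target) identifier pairs. It assumes a two-column, comma-separated CSV without quoting and an optional header line mirroring `hasHeader`; the class `TlrMetrics` and its methods are hypothetical, and LiSSA's own CSV layout and evaluation code may differ.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical TLR evaluation sketch; assumes "source,target" CSV lines without quoting. */
final class TlrMetrics {

    /** A trace link as a pair of artifact identifiers. */
    record Link(String source, String target) {}

    /** Parses CSV lines into a set of links, optionally skipping a header line. */
    static Set<Link> parseLinks(List<String> csvLines, boolean hasHeader) {
        Set<Link> links = new HashSet<>();
        for (int i = hasHeader ? 1 : 0; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(",", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected two columns: " + line);
            }
            links.add(new Link(parts[0].trim(), parts[1].trim()));
        }
        return links;
    }

    /** Returns {TP, FP, FN} for predicted links against the gold standard. */
    static int[] confusion(Set<Link> predicted, Set<Link> gold) {
        int tp = 0;
        for (Link link : predicted) {
            if (gold.contains(link)) {
                tp++;
            }
        }
        return new int[] {tp, predicted.size() - tp, gold.size() - tp};
    }

    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0 : tp / (double) (tp + fp);
    }

    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0 : tp / (double) (tp + fn);
    }

    /** Harmonic mean of precision and recall (0 if both are 0). */
    static double f1(double precision, double recall) {
        return precision + recall == 0 ? 0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Counts taken from the example result.
        int tp = 283, fp = 1286, fn = 457;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println(String.format(Locale.ROOT, "Precision: %.4f, Recall: %.4f, F1: %.4f", p, r, f1(p, r)));
    }
}
```

Treating links as sets makes duplicate rows harmless and turns TP, FP, and FN into plain set intersections and differences.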

## Evaluation

LiSSA has been empirically evaluated on the following TLR tasks:

- Requirements to code
- Documentation to code
- Architecture documentation to architecture models
- Requirements to requirements

The results indicate that the RAG-based approach can significantly outperform state-of-the-art methods in code-related tasks.
However, further research is needed to enhance its performance for broader applicability.

## Acknowledgments

LiSSA is developed by researchers from the Modelling for Continuous Software Engineering (MCSE) group of KASTEL - Institute of Information Security and Dependability at the Karlsruhe Institute of Technology (KIT).

For more information about the project and related research, visit our [website](https://ardoco.de/).

---

*Note: This README provides a brief overview of the LiSSA project. For comprehensive details, please refer to the [repository](https://github.com/ArDoCo/LiSSA-RATLR).*