English | Simplified Chinese
ROGRAG enhances LLM performance on specialized topics using a robust GraphRAG approach. It features a two-stage retrieval mechanism (dual-level retrieval plus a logic-form method) that improves accuracy without extra computational cost. ROGRAG achieves a 15% score boost on SeedBench, outperforming mainstream methods.
Key Highlights:
- Two-stage retrieval for robustness
- Incremental database construction
- Enhanced fuzzy matching and structured reasoning
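The two-stage retrieval highlighted above can be pictured as a fallback chain: a fast dual-level pass over the graph runs first, and the query is escalated to logic-form decomposition only when that pass is not confident. The sketch below illustrates this control flow only; `dual_level_retrieve`, `logic_form_retrieve`, and the confidence threshold are hypothetical names, not the actual ROGRAG API.

```python
# Minimal sketch of a two-stage retrieval fallback (illustrative; not the real ROGRAG API).
# `dual_level_retrieve` and `logic_form_retrieve` are hypothetical helpers.
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievalResult:
    passages: List[str]
    confidence: float  # retriever's own score in [0, 1]


def dual_level_retrieve(query: str) -> RetrievalResult:
    """Stage 1: retrieve over entity-level and relation-level graph indexes (stub)."""
    return RetrievalResult(passages=["..."], confidence=0.4)


def logic_form_retrieve(query: str) -> RetrievalResult:
    """Stage 2: decompose the query into a logic form and retrieve per sub-query (stub)."""
    return RetrievalResult(passages=["..."], confidence=0.9)


def two_stage_retrieve(query: str, threshold: float = 0.6) -> RetrievalResult:
    """Fall back to logic-form retrieval only when the dual-level pass is uncertain."""
    first = dual_level_retrieve(query)
    if first.confidence >= threshold:
        return first
    return logic_form_retrieve(query)


if __name__ == "__main__":
    print(two_stage_retrieve("Which gene regulates seed dormancy in rice?"))
```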
| Method | QA-1 (Accuracy) | QA-2 (F1) | QA-3 (ROUGE) | QA-4 (ROUGE) |
| --- | --- | --- | --- | --- |
| vanilla (w/o RAG) | 0.57 | 0.71 | 0.16 | 0.35 |
| LangChain | 0.68 | 0.68 | 0.15 | 0.04 |
| BM25 | 0.65 | 0.69 | 0.23 | 0.03 |
| RQ-RAG | 0.59 | 0.62 | 0.17 | 0.33 |
| ROGRAG (Ours) | 0.75 | 0.79 | 0.36 | 0.38 |
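For context on the metrics above, the snippet below shows one conventional way to compute accuracy, macro F1, and ROUGE-L with off-the-shelf libraries. It is a minimal sketch using toy placeholder data and the `scikit-learn` and `rouge-score` packages, not the official SeedBench evaluation script.

```python
# Minimal metric sketch (requires `pip install scikit-learn rouge-score`);
# toy placeholder data, not the official SeedBench evaluator.
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score, f1_score

pred_labels = ["A", "B", "A", "C"]
gold_labels = ["A", "B", "C", "C"]
pred_text = "ROGRAG retrieves entities and relations from the knowledge graph."
gold_text = "ROGRAG retrieves entities and relationships from a knowledge graph."

print("accuracy:", accuracy_score(gold_labels, pred_labels))
print("macro F1:", f1_score(gold_labels, pred_labels, average="macro"))

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print("ROUGE-L F1:", scorer.score(gold_text, pred_text)["rougeL"].fmeasure)
```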
Deployed on an online research platform, ROGRAG is ready for integration. Here is the technical report.
If you find it useful, please give it a star ⭐
- 1. Run from Docker (CMD / Swagger Server API / Gradio)
- 2. Run from Source
- 3. Directory Structure and Function
- FAQ about environment and errors
Compared to HuixiangDou, this repo improves accuracy:
- Graph Schema. Dense retrieval is only for querying similar entities and relationships.
- Ported/merged multiple open-source implementations, with code differences of nearly 18k lines:
  - Data. Organized a set of real domain knowledge that LLMs have not fully seen, for testing (GPT accuracy < 0.6)
  - Ablation. Confirmed the impact of different stages and parameters on accuracy
- The API remains compatible, so the WeChat/Lark/Web integrations from v1 are still accessible:
```python
# v1 API
# https://github.com/InternLM/HuixiangDou/blob/main/huixiangdou/service/parallel_pipeline.py#L290
async def generate(self,
                   query: Union[Query, str],
                   history: List[Tuple[str]] = [],
                   language: str = 'zh',
                   enable_web_search: bool = True,
                   enable_code_search: bool = True):

# v2 API
# https://github.com/tpoisonooo/HuixiangDou2/blob/main/huixiangdou/pipeline/parallel.py#L135
async def generate(self,
                   query: Union[Query, str],
                   history: List[Pair] = [],
                   request_id: str = 'default',
                   language: str = 'zh_cn'):
```
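For orientation, here is a minimal usage sketch of the v2 `generate` call shown above. The `ParallelPipeline` class name, its default construction, and the assumption that `generate` streams responses as an async generator are illustrative guesses rather than guaranteed details of the repository.

```python
# Hypothetical usage sketch of the v2 API above; names and behavior are assumptions.
import asyncio

# Assumed import path based on the file linked above; the class name is a guess.
from huixiangdou.pipeline.parallel import ParallelPipeline


async def main():
    # Constructor arguments (e.g. work dir / config path) omitted; assumed defaults.
    pipeline = ParallelPipeline()
    # Assuming `generate` streams responses as an async generator;
    # switch to `await pipeline.generate(...)` if it returns a single reply.
    async for response in pipeline.generate(query='How to cultivate sea rice?',
                                            history=[],
                                            request_id='demo-001',
                                            language='zh_cn'):
        print(response)


if __name__ == '__main__':
    asyncio.run(main())
```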
- SiliconCloud Abundant LLM API, some models are free
- KAG Graph retrieval based on reasoning
- DB-GPT LLM tool collection
- LightRAG Simple and efficient graph retrieval solution
- SeedBench A multi-task benchmark for evaluating LLMs in seed science
!!! The impact of open source varies across fields and industries. Due to licensing restrictions, we can only provide the code and test conclusions; the test data cannot be released.
```bibtex
@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2401.08772},
}

@misc{kong2024labelingsupervisedfinetuningdata,
      title={Labeling supervised fine-tuning data with the scaling law},
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.02817},
}

@misc{kong2025huixiangdou2robustlyoptimizedgraphrag,
      title={HuixiangDou2: A Robustly Optimized GraphRAG Approach},
      author={Huanjun Kong and Zhefan Wang and Chenyang Wang and Zhe Ma and Nanqing Dong},
      year={2025},
      eprint={2503.06474},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2503.06474},
}
```