tpoisonooo/ROGRAG

🔥 Introduction

ROGRAG enhances LLM performance on specialized topics using a robustly optimized GraphRAG approach. It features a two-stage retrieval mechanism (dual-level and logic-form methods) that improves accuracy without extra computational cost. ROGRAG achieves a 15% score boost on SeedBench, outperforming mainstream methods.

Key Highlights:

  • Two-stage retrieval for robustness
  • Incremental database construction
  • Enhanced fuzzy matching and structured reasoning
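To illustrate the "enhanced fuzzy matching" idea, here is a minimal sketch of resolving a possibly misspelled mention to a graph entity by string similarity. This is not ROGRAG's actual implementation; the function name and cutoff value are hypothetical, and the real system matches relationships as well as entities.

```python
from difflib import get_close_matches
from typing import List, Optional


def match_entity(mention: str, entities: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the graph entity whose name best matches a fuzzy mention, or None."""
    # Normalize casing before comparing, so "graphrag" can match "GraphRAG".
    lowered = {e.lower(): e for e in entities}
    hits = get_close_matches(mention.lower(), lowered.keys(), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


entities = ["GraphRAG", "HuixiangDou", "SeedBench"]
match_entity("huixiangdu", entities)  # a close misspelling still resolves
```

A similarity cutoff keeps unrelated mentions from being linked to the nearest entity by accident; below the cutoff the query falls through to other retrieval paths.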
| Method | QA-1 (Accuracy) | QA-2 (F1) | QA-3 (Rouge) | QA-4 (Rouge) |
| --- | --- | --- | --- | --- |
| vanilla (w/o RAG) | 0.57 | 0.71 | 0.16 | 0.35 |
| LangChain | 0.68 | 0.68 | 0.15 | 0.04 |
| BM25 | 0.65 | 0.69 | 0.23 | 0.03 |
| RQ-RAG | 0.59 | 0.62 | 0.17 | 0.33 |
| ROGRAG (Ours) | 0.75 | 0.79 | 0.36 | 0.38 |
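The two-stage mechanism can be sketched as a fallback pipeline: dual-level retrieval runs first, and logic-form reasoning is only invoked when the first stage is weak. The function names, scores, and threshold below are illustrative stand-ins, not ROGRAG's API.

```python
def dual_level_retrieve(query):
    # Stage 1 (hypothetical): look up entities (low level) and topics (high level)
    # in the knowledge graph. Here we pretend the lookup found nothing useful.
    return {"passages": [], "score": 0.2}


def logic_form_retrieve(query):
    # Stage 2 (hypothetical): decompose the query into a structured logic form
    # and reason over the graph to recover an answer.
    return {"passages": ["..."], "score": 0.9}


def two_stage_retrieve(query, threshold=0.5):
    """Run dual-level retrieval; fall back to logic-form reasoning if weak."""
    result = dual_level_retrieve(query)
    if result["score"] < threshold:  # weak stage-1 result triggers the fallback
        result = logic_form_retrieve(query)
    return result


two_stage_retrieve("Which gene controls seed dormancy?")
```

Because stage 2 only runs when stage 1 fails, the pipeline gains robustness without paying the reasoning cost on every query, which is how the design avoids extra computation on the easy cases.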

ROGRAG has been deployed on an online research platform and is ready for integration. Here is the technical report.

If you find it useful, please give it a star ⭐

📖 Documentation

🔆 Version Description

Compared to HuixiangDou, this repo improves accuracy:

  1. Graph schema. Dense retrieval is used only for querying similar entities and relationships.

  2. Ported and merged multiple open-source implementations, with nearly 18k lines of code differences:

    • Data. Curated a set of real-world domain knowledge that LLMs have not fully seen, for testing (GPT accuracy < 0.6)
    • Ablation. Confirmed the impact of different stages and parameters on accuracy
  3. The API remains compatible, so the WeChat/Lark/Web integrations from v1 are still accessible.

    # v1 API https://github.com/InternLM/HuixiangDou/blob/main/huixiangdou/service/parallel_pipeline.py#L290
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Tuple[str]] = [],
                       language: str = 'zh',
                       enable_web_search: bool = True,
                       enable_code_search: bool = True):

    # v2 API https://github.com/tpoisonooo/HuixiangDou2/blob/main/huixiangdou/pipeline/parallel.py#L135
    async def generate(self,
                       query: Union[Query, str],
                       history: List[Pair] = [],
                       request_id: str = 'default',
                       language: str = 'zh_cn'):
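The main difference between the two signatures is the `history` parameter: v1 takes plain tuples while v2 takes `Pair` objects. A minimal migration shim might look like the following, where the `Pair` dataclass is a hypothetical stand-in for the type defined in the repo:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pair:
    # Hypothetical minimal stand-in for the Pair type used by the v2 API.
    user: str
    assistant: str


def v1_history_to_v2(history: List[Tuple[str, str]]) -> List[Pair]:
    """Convert v1-style (user, assistant) tuples into v2 Pair objects."""
    return [Pair(user=u, assistant=a) for u, a in history]


pairs = v1_history_to_v2([("hi", "hello"),
                          ("what is GraphRAG?", "a graph-based RAG method")])
```

With such a shim, a v1 caller only needs to wrap its history before invoking the v2 `generate`; the `query` and `language` arguments carry over with minor renames (`'zh'` becomes `'zh_cn'`).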

🍀 Acknowledgements

  • SiliconCloud: abundant LLM API, some models are free
  • KAG: graph retrieval based on reasoning
  • DB-GPT: LLM tool collection
  • LightRAG: a simple and efficient graph retrieval solution
  • SeedBench: a multi-task benchmark for evaluating LLMs in seed science

📝 Citation

!!! The impact of open source varies across fields and industries. Due to licensing restrictions, we can only release the code and test conclusions; the test data cannot be provided.

@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024labelingsupervisedfinetuningdata,
      title={Labeling supervised fine-tuning data with the scaling law}, 
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.02817}, 
}

@misc{kong2025huixiangdou2robustlyoptimizedgraphrag,
      title={HuixiangDou2: A Robustly Optimized GraphRAG Approach}, 
      author={Huanjun Kong and Zhefan Wang and Chenyang Wang and Zhe Ma and Nanqing Dong},
      year={2025},
      eprint={2503.06474},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2503.06474}, 
}