
# Thus-Spake-Long-Context-LLM

For the Chinese version, please refer to the introduction.

This is a survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation. We have uploaded it to GitHub for public review, since it remains on hold at arXiv.

Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and it offers immense opportunities for Large Language Models (LLMs), giving them lifelong-learning potential akin to that of humans. Unfortunately, the pursuit of long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies.

Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and humans' attempts to transcend their mortality. In this survey, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation. It covers length extrapolation, cache optimization, memory management, architecture innovation, training infrastructure, inference infrastructure, long-context pre-training, long-context post-training, long-context MLLMs (mainly long-video LLMs), and long-context evaluation, showcasing the full spectrum of long-context technologies. At the end of the survey, we present 10 unanswered questions currently faced by long-context LLMs.

The structure of the paper is shown below. We hope this survey can serve as a systematic introduction to research on long-context LLMs. Given the authors' limited knowledge, the survey may contain omissions or mistakes; we welcome constructive comments from readers. We will carefully consider all suggestions and release a revised version in two to three months.