
# Thus-Spake-Long-Context-LLM

For the Chinese version, please refer to the introduction.

This is a survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation. We have uploaded it to GitHub for public review, since it remains on hold at arXiv.

Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and it offers immense opportunities for Large Language Models (LLMs), giving them lifelong-learning potential akin to that of humans. Unfortunately, the pursuit of long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies.

Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and humans' attempts to transcend their mortality. In this survey, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation. It covers length extrapolation, cache optimization, memory management, architecture innovation, training infrastructure, inference infrastructure, long-context pre-training, long-context post-training, long-context MLLMs (mainly long-video LLMs), and long-context evaluation, showcasing the full spectrum of long-context technologies. At the end of the survey, we present 10 unanswered questions currently faced by long-context LLMs.

The structure of the paper is shown below. We hope this survey can serve as a systematic introduction to research on long-context LLMs. Given the authors' limited knowledge, the survey may contain omissions or mistakes; we welcome constructive comments from readers. We will carefully consider all suggestions and release a revised version in two to three months.