📖 Introduction
Large Language Models (LLMs) have rapidly evolved into powerful general-purpose reasoning and generation engines. Yet despite their continuously advancing capabilities, LLMs remain fundamentally constrained by the finite length of their context window. This constraint bounds the information directly accessible during a single inference pass, leaving models with only short-term memory. As a result, they struggle to support extended conversations, personalized interaction, continual learning, and complex multi-stage tasks.
To transcend the inherent limitations of context windows, AI memory and memory systems for LLMs have emerged as an active research and engineering frontier. By introducing external, persistent, and controllable memory structures beyond model parameters, these systems let large models store, retrieve, compress, and manage historical information during generation, continuously leveraging long-term experience within a limited context window and achieving cross-session consistency and continuous reasoning.
🎯 Goal of Repository
Awesome-AI-Memory is a comprehensive repository dedicated to AI memory and memory systems for large language models, systematically curating relevant research papers, framework tools, and practical implementations. This repository endeavors to map the rapidly evolving research landscape in LLM memory systems, bridging multiple disciplines including natural language processing, information retrieval, intelligent agent systems, and cognitive science.
Our mission is to establish a centralized, continuously evolving knowledge base that serves as a valuable reference for researchers and practitioners, ultimately accelerating the development of intelligent systems capable of long-term memory retention, sustained reasoning, and adaptive evolution over time.
🧭 Project Scope
This repository focuses on memory mechanisms and system designs that extend or augment the context window capabilities of large language models, rather than merely addressing model pre-training or general knowledge learning. The content encompasses both theoretical research and engineering practices.
🌀 Included Content (In Scope)
- Memory and memory system designs for large language models
- External explicit memory beyond model parameters
- Short-term memory, long-term memory, episodic memory, and semantic memory
- Retrieval-Augmented Generation (RAG) as a memory access mechanism
- Memory management strategies (writing, updating, forgetting, compression)
- Memory systems in intelligent agents
- Shared and collaborative memory in multi-agent systems
- Memory models inspired by cognitive science and biological memory
- Evaluation methods, benchmarks, and datasets related to LLM memory
- Open-source frameworks and tools for memory-enhanced LLMs
🌀 Excluded Content (Out of Scope)
- General model pre-training or scaling research without direct memory relevance
- Purely parametric knowledge learning without memory interaction
- Traditional databases or information retrieval systems unrelated to LLMs
- Generic memory systems outside the LLM context (unless demonstrating direct transfer value)
📰 Recent Hot Research and News
- 2026-02-14 - 🎉 Updated 15 papers: 1 survey, 12 on methods, 1 on benchmarks, and 1 on systems and models
- 2026-02-09 - 🎉 Updated 15 papers
- 2026-02-01 - 🎉 Updated 16 papers: 9 on methods, 4 on benchmarks, and 3 on systems and models
- 2025-12-24 - 🎉 Released repository v1.0
- 2025-12-10 - 🎉 Initial repository release
🗺️ Table of Contents
- Introduction
- Goal of Repository
- Project Scope
- Recent Hot Research and News
- Core Concepts
- Paper List
- Resource
- Make a Contribution
- Star Trends
🧠 Core Concepts
- **LLM Memory**: A fusion of implicit knowledge encoded within parameters (acquired during training) and explicit storage outside parameters (retrieved at runtime), enabling models to transcend token limitations and possess human-like abilities to "remember the past, understand the present, and predict the future."
- **Memory System**: The complete technical stack implementing memory functionality for large language models, comprising four core components (a sketch of how they compose follows the list):
  - Memory Storage Layer: Vector databases (e.g., Chroma, Weaviate), graph databases, or hybrid storage solutions
  - Memory Processing Layer: Embedding models, summarization generators, and memory segmenters
  - Memory Retrieval Layer: Multi-stage retrievers, reranking modules, and context injectors
  - Memory Control Layer: Memory prioritization managers, forgetting controllers, and consistency coordinators
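
Below is a minimal, self-contained sketch of how these four layers might be wired together. Every class and method name is a hypothetical placeholder rather than the API of any particular framework, and the embedding is a toy stand-in:

```python
class StorageLayer:
    """Memory Storage Layer: a plain list standing in for a vector/graph database."""
    def __init__(self):
        self.records = []  # each record: {"text": str, "vector": list[float]}

    def add(self, record):
        self.records.append(record)

class ProcessingLayer:
    """Memory Processing Layer: embedding (stubbed) and, in a real system, summarization."""
    def embed(self, text):
        # Toy letter-frequency "embedding"; a real system would call an embedding model.
        vec = [0.0] * 26
        for ch in text.lower():
            if ch.isalpha():
                vec[ord(ch) - ord("a")] += 1.0
        return vec

class RetrievalLayer:
    """Memory Retrieval Layer: scores stored records against a query vector."""
    def top_k(self, query_vec, records, k=3):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        return sorted(records, key=lambda r: dot(query_vec, r["vector"]), reverse=True)[:k]

class ControlLayer:
    """Memory Control Layer: here, a trivial gate deciding what is worth writing."""
    def should_write(self, text):
        return len(text.split()) > 3  # skip trivial fragments

class MemorySystem:
    """Wires the four layers into simple write/read paths."""
    def __init__(self):
        self.storage, self.processing = StorageLayer(), ProcessingLayer()
        self.retrieval, self.control = RetrievalLayer(), ControlLayer()

    def write(self, text):
        if self.control.should_write(text):
            self.storage.add({"text": text, "vector": self.processing.embed(text)})

    def read(self, query, k=3):
        query_vec = self.processing.embed(query)
        return [r["text"] for r in self.retrieval.top_k(query_vec, self.storage.records, k)]
```

A production stack would swap each stub for the real components named above: a vector store such as Chroma, an embedding model, a reranker, and policy-driven forgetting controllers.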
- **Memory Operations**: Atomic memory operations executed through tool calling in memory systems (see the sketch below):
  - Writing: Converting dialogue content into vectors for storage, often combined with summarization to reduce noise
  - Retrieval: Generating queries based on current context to obtain Top-K relevant memories
  - Updating: Finding relevant memories via vector similarity and replacing or enhancing them
  - Deletion: Removing specific memories based on user instructions or automatic policies (e.g., privacy expiration)
  - Compression: Merging multiple related memories into summaries to free storage space
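
A toy illustration of the five operations over an in-memory store. The store, the cosine scorer, and the `embed_fn` hook are our own illustrative names, not a real library API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

class MemoryStore:
    """Toy in-memory store exposing the five atomic operations."""
    def __init__(self, embed_fn):
        self.embed = embed_fn  # caller supplies a text -> vector function
        self.memories = []     # each: {"text": str, "vector": list[float]}

    def write(self, text):
        # Writing: convert text into a vector and store it.
        self.memories.append({"text": text, "vector": self.embed(text)})

    def retrieve(self, query, k=3):
        # Retrieval: rank all memories against the query, return Top-K.
        qv = self.embed(query)
        return sorted(self.memories,
                      key=lambda m: cosine(qv, m["vector"]), reverse=True)[:k]

    def update(self, query, new_text):
        # Updating: find the nearest memory and overwrite it.
        hits = self.retrieve(query, k=1)
        if hits:
            hits[0]["text"], hits[0]["vector"] = new_text, self.embed(new_text)

    def delete(self, predicate):
        # Deletion: drop memories matching a user instruction or policy.
        self.memories = [m for m in self.memories if not predicate(m)]

    def compress(self, old_texts, summary):
        # Compression: replace several related memories with one summary.
        self.delete(lambda m: m["text"] in old_texts)
        self.write(summary)
```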
- **Memory Management**: The methodology for managing memories within memory systems, including:
  - Memory Lifecycle: End-to-end management from creation, through active usage and infrequent access, to archiving/deletion
  - Conflict Resolution: Arbitration mechanisms for contradictory information (e.g., timestamp priority, source credibility weighting; see the sketch below)
  - Resource Budgeting: Allocating memory quotas to different users/tasks to prevent resource abuse
  - Security Governance: Automatic detection and de-identification of PII (Personally Identifiable Information)
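
For conflict resolution in particular, one simple arbitration scheme combines timestamp priority with source credibility weighting. The 0.6/0.4 weights and field names below are purely illustrative:

```python
from datetime import datetime, timezone

def resolve_conflict(old, new, w_recency=0.6, w_credibility=0.4):
    """Pick between two contradictory memories.

    Each memory is a dict with "timestamp" (an aware datetime) and
    "credibility" in [0, 1] (e.g., user-stated facts > model inferences).
    The weights are illustrative, not a standard."""
    now = datetime.now(timezone.utc)

    def score(mem):
        age_days = (now - mem["timestamp"]).total_seconds() / 86400
        recency = 1.0 / (1.0 + age_days)  # newer memories score closer to 1
        return w_recency * recency + w_credibility * mem["credibility"]

    return new if score(new) >= score(old) else old
```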
- **Memory Classification**: A multi-dimensional classification scheme for memory systems (see the schema sketch below):
  - By Access Frequency: Working memory (current tasks), frequent memory (personal preferences), archived memory (historical records)
  - By Degree of Structure: Structured memory (database records), semi-structured memory (dialogue summaries), unstructured memory (raw conversations)
  - By Sharing Scope: Personal memory (single user), team memory (collaborative spaces), public memory (shared knowledge bases)
  - By Temporal Validity: Permanent memory (core facts), temporary memory (conversation context), time-sensitive memory (e.g., "user is in a bad mood today")
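
Because these axes are orthogonal, a single record can carry one tag per dimension. A possible schema sketch (all type and field names are our own, not a standard):

```python
from dataclasses import dataclass
from enum import Enum

class AccessTier(Enum):
    WORKING = "working"    # current tasks
    FREQUENT = "frequent"  # personal preferences
    ARCHIVED = "archived"  # historical records

class Structure(Enum):
    STRUCTURED = "structured"       # database records
    SEMI_STRUCTURED = "semi"        # dialogue summaries
    UNSTRUCTURED = "unstructured"   # raw conversations

class Scope(Enum):
    PERSONAL = "personal"
    TEAM = "team"
    PUBLIC = "public"

class Validity(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"
    TIME_SENSITIVE = "time_sensitive"

@dataclass
class ClassifiedMemory:
    text: str
    access: AccessTier = AccessTier.WORKING
    structure: Structure = Structure.UNSTRUCTURED
    scope: Scope = Scope.PERSONAL
    validity: Validity = Validity.TEMPORARY

# One tag per axis, e.g. a fleeting emotional observation:
mood = ClassifiedMemory("user is in a bad mood today",
                        validity=Validity.TIME_SENSITIVE)
```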
- **Memory Mechanisms**: Core technical components enabling memory system functionality (a toy routing example follows the list):
  - Retrieval-Augmented Generation (RAG): Enhancing generation by retrieving relevant information from knowledge bases
  - Memory Reflection Loop: The model periodically "reviews" conversation history to generate high-level summaries
  - Memory Routing: Automatically selecting a retrieval source based on query type (personal memory vs. public knowledge base)
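
Memory routing can be approximated by classifying the query and dispatching to the matching source. The keyword heuristic and source names below are toy assumptions; real systems typically use an LLM or a trained classifier for this step:

```python
def route_query(query: str) -> str:
    """Pick a retrieval source for a query via a naive keyword heuristic."""
    personal_cues = ("my", "i ", "remind", "last time", "we discussed")
    q = query.lower()
    if any(cue in q for cue in personal_cues):
        return "personal_memory"       # user-specific store
    return "public_knowledge_base"     # shared KB / RAG corpus

assert route_query("What did we discuss last time?") == "personal_memory"
assert route_query("What is a KV cache?") == "public_knowledge_base"
```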
- **Explicit Memory**: Memory stored as raw text outside the model, implemented through vector databases with hybrid indexing strategies (a score-fusion sketch follows the list):
  - Dense Vector Indexing: Handling semantic similarity queries
  - Sparse Keyword Indexing: Processing exact match queries
  - Multi-vector Indexing: Segmenting long documents into multiple parts, each independently indexed
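
A hybrid index is typically queried by fusing the dense and sparse scores. In this sketch, plain keyword overlap stands in for BM25/TF-IDF, and the 0.7/0.3 weighting is arbitrary:

```python
import math

def dense_score(query_vec, doc_vec):
    """Dense side: cosine similarity between embedding vectors."""
    num = sum(q * d for q, d in zip(query_vec, doc_vec))
    den = (math.sqrt(sum(q * q for q in query_vec)) *
           math.sqrt(sum(d * d for d in doc_vec)))
    return num / den if den else 0.0

def sparse_score(query, doc):
    """Sparse side: keyword overlap as a stand-in for BM25/TF-IDF."""
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(query, doc, query_vec, doc_vec, alpha=0.7):
    """alpha trades semantic similarity against exact keyword match."""
    return (alpha * dense_score(query_vec, doc_vec) +
            (1 - alpha) * sparse_score(query, doc))
```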
- **Parametric Memory**: Knowledge and capabilities stored within the fixed weights of a language model's architecture, characterized by:
  - Serving as the model's core long-term semantic memory carrier
  - Being activatable without external retrieval or explicit contextual support
  - Providing the foundational capability for zero-shot reasoning, general responses, and language generation
- **Long-Term Memory**: Key information designed for persistent storage, typically implemented as an external knowledge base with capabilities including:
  - Automatic Summarization: Distilling multi-turn dialogues into structured memory
  - Context Binding: Recording the context of a memory to prevent erroneous generalization
  - Multimodal Storage: Preserving text, images, audio, and other modalities together
- **Short-Term Memory**: Active information within the LLM's context window, constrained by the attention mechanism. Key techniques include (a sliding-window sketch follows the list):
  - KV Cache Management: Reusing key-value caches to avoid redundant computation
  - Context Compression: Using summaries instead of detailed history (e.g., "the previous 5 dialogue rounds discussed the project budget")
  - Sliding Window Attention: Attending only to the most recent N tokens while preserving special markers
  - Memory Summary Injection: Dynamically inserting summaries of long-term memory into the short-term context
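
A sketch combining two of these techniques, sliding-window truncation plus memory summary injection; the `summarize` stub here stands in for an LLM call:

```python
def summarize(turns):
    # Stub: a real system would ask the LLM to compress these turns.
    return f"[Summary of {len(turns)} earlier turns omitted for brevity]"

def build_context(history, window=5):
    """Keep the most recent `window` turns verbatim; fold everything
    older into a single injected summary line."""
    if len(history) <= window:
        return list(history)
    older, recent = history[:-window], history[-window:]
    return [summarize(older)] + recent

history = [f"turn {i}: ..." for i in range(1, 9)]
print(build_context(history, window=5))
# -> one summary line covering turns 1-3, then turns 4-8 verbatim
```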
- **Episodic Memory**: Memory recording a specific user's interaction history, fundamental to personalized AI:
  - User Identity Recognition: Identifying the same user across sessions
  - Interaction Trajectory Recording: Preserving user decision paths and feedback
  - Emotional State Tracking: Recording patterns of user mood changes
  - Preference Evolution Modeling: Capturing long-term changes in user interests
- **Memory Forgetting**: Deliberately designed forgetting mechanisms in large models (a decay sketch follows the list), including:
  - Selective Forgetting (Machine Unlearning): Removing the influence of specific training data, e.g., masking specific knowledge with dedicated forgetting layers
  - Privacy-Driven Forgetting: Automatically identifying and deleting PII, or setting automatic expiration
  - Memory Decay: Automatically lowering the priority of infrequently accessed memories based on usage frequency
  - Conflict-Driven Forgetting: Strategically updating or discarding old memories when new evidence contradicts them
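
Memory decay is often modeled as exponential down-weighting by time since last access, with a boost for frequently used memories. The half-life and boost below are illustrative choices, not a standard formula:

```python
import math
from datetime import datetime, timezone

def decayed_priority(base_priority, last_access, access_count,
                     half_life_days=30.0):
    """Priority halves every `half_life_days` without access; frequent
    access applies a logarithmic boost. Constants are illustrative."""
    now = datetime.now(timezone.utc)
    age_days = (now - last_access).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)
    boost = 1.0 + math.log1p(access_count)
    return base_priority * decay * boost
```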
- **Memory Retrieval**: The multi-stage process of precisely locating relevant information in a large memory repository (a pipeline sketch follows the list):
  - Semantic Pre-filtering: Vector similarity matching to obtain Top-100 candidates
  - Contextual Reranking: Reordering results based on the current query context
  - Temporal Filtering: Prioritizing the most recent relevant information
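
Chained together, the three stages form a retrieval funnel. In this sketch the scoring functions are supplied by the caller (e.g., embedding similarity and a cross-encoder), and the stage sizes mirror the list above:

```python
from datetime import datetime, timezone

def retrieve(query_vec, memories, cosine, rerank_fn, k=5):
    """Three-stage funnel: semantic pre-filter -> contextual rerank -> temporal filter.
    `memories` are dicts with "vector" and an aware "timestamp"."""
    # Stage 1: semantic pre-filtering down to Top-100 candidates.
    candidates = sorted(memories,
                        key=lambda m: cosine(query_vec, m["vector"]),
                        reverse=True)[:100]
    # Stage 2: contextual reranking with a stronger (slower) scorer.
    reranked = sorted(candidates, key=rerank_fn, reverse=True)
    # Stage 3: temporal filtering: break near-ties in favor of recency.
    now = datetime.now(timezone.utc)
    reranked.sort(key=lambda m: (round(rerank_fn(m), 1),
                                 -(now - m["timestamp"]).total_seconds()),
                  reverse=True)
    return reranked[:k]
```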
- **Memory Compression**: A collection of techniques for maximizing memory utility under limited resources (a clustering sketch follows the list):
  - Content-level Compression: Extracting core information while discarding redundant details
  - Representation-level Compression: Vector quantization (e.g., PQ coding) and dimensionality reduction
  - Organization-level Compression: Clustering similar memories and building hierarchical memory structures
  - Knowledge Distillation: Transferring key patterns from external memory into parametric memory
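
A sketch of organization-level compression: greedily cluster memories whose embeddings are similar enough, then collapse each cluster into one summary record. The 0.85 threshold and `summarize` stub are illustrative:

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def summarize(texts):
    return "Merged memory: " + " / ".join(texts)  # stand-in for an LLM summary

def compress(memories, threshold=0.85):
    """Greedy single-pass clustering: each memory joins the first cluster
    whose seed it resembles; clusters of 2+ collapse into one summary."""
    clusters = []
    for mem in memories:  # mem: {"text": str, "vector": list[float]}
        for cluster in clusters:
            if cosine(mem["vector"], cluster[0]["vector"]) >= threshold:
                cluster.append(mem)
                break
        else:
            clusters.append([mem])
    out = []
    for cluster in clusters:
        if len(cluster) == 1:
            out.append(cluster[0])
        else:
            out.append({"text": summarize([m["text"] for m in cluster]),
                        "vector": cluster[0]["vector"]})  # reuse seed vector
    return out
```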
📄 Paper List
Papers below are ordered by publication date:
Survey
Framework & Methods
Datasets & Benchmark
| Task Type | Benchmarks & Datasets |
|---|---|
| Personalized Task Evaluation | IMPLEXCONV, PERSONAMEM, PERSONAMEM-v2, PersonaBench, PersonaFeedback, LaMP, MemDaily, MPR, KnowMe-Bench |
| Comprehensive Evaluation | MemoryAgentBench, LifelongAgentBench, StreamBench |
| Memory Mechanism Evaluation | MemBench, Minerva, MemoryBench |
| Long-Term Memory Evaluation | LOCCO, LONGMEMEVAL, LOCOMO, MADial-Bench, StoryBench, DialSim, Mem-Gallery, RealMem, CloneMem |
| Long-Dialogue Reasoning | PREFEVAL, MiniLongBench |
| Long-Context Understanding | LongBench V2, LongBench, BABILong, HotpotQA |
| Long-Context Evaluation | SCBENCH, L-CiteEval, GLE, HOMER, RULER, MM-Needle |
| Long-Form Text Generation | LongGenBench |
| Episodic Memory Evaluation | PerLTQA |
| Memory Hallucination Evaluation | HaluMem |
| Web Interaction & Navigation | WebChoreArena, MT-Mind2Web, WebShop, WebArena |
Systems & Models
Systems below are ordered by publication date:
📺 Resource
| Type | Website Link | Video Title |
|---|---|---|
| Basic Theory of Memory | https://www.youtube.com/watch?v=k3FUWWEwgfc | Short-Term Memory with LangGraph |
| | https://www.youtube.com/watch?v=WsGVXiWzTpI | OpenAI: Agent Memory Patterns |
| | https://www.youtube.com/watch?v=fsENEq4F55Q | Long-Term Memory with LangGraph |
| | https://www.youtube.com/watch?v=L-au0tvDJbI | LLMs Do Not Have Human-Like Working Memories |
| | https://www.youtube.com/watch?v=RkWor1BZOn0 | Long-term memory and personalization for LLM applications |
| | https://www.youtube.com/watch?v=CFih0_6tn2w | A Paradigm Shift to Memory as a First Class Citizen for LLMs |
| Memory-Related Tools | https://www.bilibili.com/video/BV1hom8YAEhX | LLMs as Operating Systems: Agent Memory |
| | https://www.bilibili.com/video/BV1CU421o7DL | Langchain Agent with memory |
| | https://www.bilibili.com/video/BV1arJazVEaX | Open Memory MCP |
| | https://www.bilibili.com/video/BV11HxXzuExk | Agentic Memory for LLM Agents |
| Memory-Related Papers | https://www.bilibili.com/video/BV1XT8ez6E46 | AI agent Survey Memory |
| | https://www.bilibili.com/video/BV1f12wBpEXX | MemGen: Weaving Generative Latent Memory for Self-Evolving Agents |
| | https://www.bilibili.com/video/BV1deyFBKEFh | MLP Memory: A Retriever-Pretrained Memory for Large Language Models |
| | https://www.bilibili.com/video/BV18FnVzpE6S | How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior |
| | https://www.bilibili.com/video/BV1mpbrzSEH9 | Agent Workflow Memory |
| | https://www.bilibili.com/video/BV1qEtozyEoh | Introduction to the Memory Mechanism of Large Language Model Agents |
| | https://www.bilibili.com/video/BV1FGrhYhEZK | Memory Layers at Scale |
| | https://www.bilibili.com/video/BV1aQ1xBkE45 | Agentic Memory for LLM Agents |
| | https://www.bilibili.com/video/BV1Yz421f7uH | Evaluating Very Long-Term Conversational Memory of LLM Agents |
| | https://www.bilibili.com/video/BV19RWdzxEsR | Lightweight plug-in memory system |
🤝 Make a Contribution
Issue Template:
Title: [paper's title]
Authors: [author name 1] (, [author name 2] ...)
Published: [arXiv / ACL / ICLR / NeurIPS / ...]
Summary:
- Innovation:
- Tasks:
- Significant Results:
Join our community to ask questions, share your projects, and connect with other developers.
- GitHub Issues: Report bugs or request features in our GitHub Issues.
- GitHub Pull Requests: Contribute code improvements via Pull Requests.
- GitHub Discussions: Participate in our GitHub Discussions to ask questions or share ideas.
- WeChat: Scan the QR code below to join our discussion group, get the latest research information related to Memory, or promote your related research results.

