MIT and NUS Researchers Introduce MEM1: A Memory-Efficient Framework for Long-Horizon Language Agents


Modern language agents need to handle multi-turn conversations, retrieving and updating information as tasks evolve. However, most current systems simply append all past interactions to the prompt, regardless of relevance. This leads to bloated memory usage, slower performance, and poor reasoning on longer inputs that weren’t seen during training. Real-world examples, such as research or shopping assistants, show how follow-up questions depend on previous context. Yet constantly growing prompts strain system resources and attention. While some solutions use external memory modules, they are hard to integrate. This raises an important question: can language models learn to manage their memory intelligently as part of reasoning?

Limitations of Context-Growing Prompts and Challenges in Memory Integration

LLM agents have grown from handling simple queries to navigating complex, multi-step tasks such as web browsing and research. Frameworks like ReAct, which interleave reasoning and action, have helped enable these abilities. Training typically relies on behavior cloning or reinforcement learning to shape agent behavior. However, managing memory across multi-turn interactions remains a challenge. The common approach, appending all past context to each prompt, leads to bloated and inefficient memory usage, as sketched below. While external tools like retrievers or summarizers can help, they typically sit outside the agent’s reasoning loop, making integration complex.
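
To make the baseline concrete, here is a minimal Python sketch of that context-growing pattern. The `llm` and `run_tool` callables and the `ANSWER:` convention are hypothetical stand-ins for illustration, not part of any published implementation:

```python
# Naive context-growing agent loop: every turn is appended to the prompt,
# so the context (and attention cost) grows without bound over a long task.
def naive_agent(task: str, llm, run_tool, max_turns: int = 20) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        prompt = "\n".join(history)      # full history resent every turn
        action = llm(prompt)             # reason over everything so far
        if action.startswith("ANSWER:"):
            return action.removeprefix("ANSWER:").strip()
        observation = run_tool(action)   # e.g. a search or page visit
        history.append(action)           # context only ever grows
        history.append(observation)
    return ""
```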

Introducing MEM1: A Reinforcement Learning Framework for Constant Memory Language Agents

Researchers from MIT, NUS, SMART, and Yonsei University developed MEM1, a reinforcement learning framework that enables language agents to handle complex, multi-turn tasks while maintaining constant memory usage. Instead of storing full interaction histories, MEM1 updates a compact internal state at each step, merging new information with memory and discarding unnecessary details. This unified reasoning and memory approach enhances efficiency and performance without requiring additional modules. MEM1 was tested across various tasks, including web QA and online shopping, demonstrating up to 3.5 times better performance and 3.7 times less memory usage than larger models, while also generalizing well to longer, unseen task sequences.
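
By contrast, a MEM1-style loop can be pictured as carrying a single consolidated state forward instead of a growing transcript. The sketch below reuses the same hypothetical `llm` and `run_tool` stand-ins; the `MEMORY:`/`ACTION:` output format is invented here for illustration and is not the paper’s exact prompt template:

```python
# MEM1-style loop: instead of appending history, the agent rewrites one
# compact internal state each turn, so prompt size stays roughly constant.
def mem1_agent(task: str, llm, run_tool, max_turns: int = 20) -> str:
    state = f"Task: {task}\nMemory: (empty)"
    for _ in range(max_turns):
        output = llm(state)  # model emits updated memory plus next action
        if "ANSWER:" in output:
            return output.split("ANSWER:", 1)[1].strip()
        # Illustrative output format: "MEMORY: ... ACTION: ..."
        memory, _, action = output.partition("ACTION:")
        memory = memory.removeprefix("MEMORY:").strip()
        observation = run_tool(action.strip())
        # Carry forward only the consolidated memory and latest observation;
        # raw prior turns are discarded, keeping the context near-constant.
        state = (f"Task: {task}\nMemory: {memory}\n"
                 f"Last observation: {observation}")
    return ""
```

Because each turn’s prompt contains only the task, the consolidated memory, and the latest observation, per-turn compute stays roughly flat no matter how many turns have elapsed, which is what the reported memory and speed gains reflect.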

Combining Memory Pruning and Iterative Reasoning for Human-Like Problem Solving

MEM1 is designed to tackle complex reasoning tasks by combining memory management with iterative thinking. At each step, the agent processes new information and integrates it with prior knowledge to form a consolidated internal state, then prunes previous context to maintain memory efficiency. This structured memory updating mirrors how humans solve puzzles by focusing on key information while discarding the rest. The team uses reinforcement learning to train the agent to retain only relevant data and applies a masking strategy during optimization to ensure accurate policy updates. To better test long-term reasoning, they also create multi-objective QA tasks from existing datasets.
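
The masking idea can be sketched as a token-level loss mask: positions generated by the agent contribute to the policy-gradient update, while environment-inserted tokens (retrieved documents, tool observations) do not. Everything below, including the "agent"/"env" role labels and the REINFORCE-style objective, is an illustrative assumption rather than the paper’s actual training code:

```python
import torch

# Hedged sketch: build a per-token mask so the RL loss covers only tokens
# the policy actually generated (reasoning, memory updates, actions), not
# tokens inserted by the environment.
def policy_loss(logprobs: torch.Tensor,    # [T] log-probs of sampled tokens
                advantages: torch.Tensor,  # [T] per-token advantage estimates
                token_roles: list[str]) -> torch.Tensor:
    # token_roles marks each position as "agent" or "env" (illustrative labels)
    mask = torch.tensor([1.0 if r == "agent" else 0.0 for r in token_roles])
    # REINFORCE-style objective over agent tokens only; env tokens contribute
    # nothing, so injected context cannot corrupt the policy update.
    return -(mask * logprobs * advantages).sum() / mask.sum().clamp(min=1.0)
```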

Benchmarking MEM1 on Long-Horizon QA and Navigation Tasks

The study assesses the MEM1 agent’s capacity to handle complex, multi-turn tasks while maintaining nearly constant memory usage. Trained using reinforcement learning on the Qwen2.5-7B base model, MEM1 is tested in question answering with retrieval-augmented generation and web navigation environments. It is compared against several baselines using both accuracy and efficiency metrics. Results show that MEM1 outperforms others in long-horizon tasks, maintaining strong performance even as task complexity increases. It uses fewer tokens, responds faster, and scales more efficiently. Despite being smaller, MEM1 even surpasses larger models like Qwen2.5-14B-Instruct and GPT-4o in demanding scenarios.

Conclusion and Future Directions for Reinforcement-Learned Memory Consolidation in LLMs

In conclusion, MEM1 is a reinforcement learning framework designed to help language agents handle long, multi-step tasks more efficiently. Unlike traditional methods that store all past information, leading to memory bloat and slower performance, MEM1 maintains a compact internal state by merging new inputs with memory and discarding unnecessary data. It performs well in tasks like question answering and web navigation, while using less memory and computing power. However, MEM1 assumes clear, reliable reward signals, which many real-world tasks lack. Future work aims to adapt MEM1 for open-ended tasks with uncertain or delayed rewards, thereby expanding its applications to broader, more practical scenarios.


Check out the Paper. All credit for this research goes to the researchers of this project.
