Findings of ACL 2026
1Korea University · 2Myongji University · 3AIGEN Sciences
CLAG is a memory framework for small language model agents. It replaces a single global memory pool with agent-driven semantic clusters, localized memory evolution, and cluster-aware retrieval so limited-capacity agents receive less distractor context.
Problem
Existing agent memory systems usually store past observations, actions, tool outputs, and feedback in one global retrieval pool. As the buffer grows, retrieval becomes more likely to surface memories that are semantically plausible but irrelevant to the task. Memory evolution suffers as well, because records can be updated based on neighborhoods that mix unrelated topics.
This is especially costly for small language models, which are more sensitive to irrelevant context. CLAG keeps the benefits of self-evolving memory, but constrains updates and retrieval to semantically coherent neighborhoods.
Method
Agentic routing. Each new memory is first filtered by vector distance to narrow the candidate clusters, and an SLM router then selects the best semantic cluster using the cluster profiles. If the memory is not similar enough to any existing cluster, CLAG creates a new one.
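A minimal Python sketch of this two-stage routing step, under stated assumptions: embeddings are plain float lists, the coarse filter keeps the `top_k` clusters by cosine similarity to a cluster centroid, and `tau` is a hypothetical new-cluster threshold. The SLM router is replaced by `keyword_router`, a hypothetical word-overlap stand-in. In CLAG that decision is made by prompting the small language model with the candidate cluster profiles. Centroid and profile updates after insertion are omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    cid: int                # assumed to equal the cluster's position in the list
    centroid: list[float]
    profile: str            # topic summary and tags, flattened to text
    members: list[str] = field(default_factory=list)


def keyword_router(memory: str, candidates: list[Cluster]) -> int:
    """Hypothetical stand-in for the SLM router: most words shared with the profile."""
    words = set(memory.lower().split())
    best = max(candidates, key=lambda c: len(words & set(c.profile.lower().split())))
    return best.cid


def route(memory: str, emb: list[float], clusters: list[Cluster], top_k: int = 3,
          tau: float = 0.3,
          router: Callable[[str, list[Cluster]], int] = keyword_router) -> int:
    # Stage 1: coarse filter, keep the top_k clusters closest in embedding space.
    candidates = sorted(clusters, key=lambda c: cosine(emb, c.centroid), reverse=True)[:top_k]
    # No cluster is similar enough: open a new one seeded by this memory.
    if not candidates or cosine(emb, candidates[0].centroid) < tau:
        clusters.append(Cluster(len(clusters), list(emb), memory, [memory]))
        return len(clusters) - 1
    # Stage 2: the router makes the final semantic choice among the candidates.
    cid = router(memory, candidates)
    next(c for c in clusters if c.cid == cid).members.append(memory)
    return cid


clusters = [Cluster(0, [1.0, 0.0], "books novels reading worlds"),
            Cluster(1, [0.0, 1.0], "basketball community team")]
print(route("notes on fantasy novels", [0.9, 0.1], clusters))  # 0: joins the book cluster
print(route("weather forecast", [-1.0, 0.0], clusters))        # 2: below tau, new cluster
```

Passing the router as a parameter keeps the vector filter and the semantic decision separate, so the stub can be swapped for a real SLM call without touching the filtering logic.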
Localized evolution. Linking, rewriting, and consolidation are performed only inside the routed cluster, which preserves topic consistency and avoids global pairwise comparisons.
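The linking part of localized evolution can be sketched the same way. Assuming each note carries an embedding, linking only needs to scan the routed cluster, so the cost per insert scales with the cluster size rather than with the whole store. `link_locally`, `k`, and `min_sim` are hypothetical names and parameters. The rewriting and consolidation steps, which involve the SLM, are not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_locally(new_emb: list[float], cluster_embs: list[list[float]],
                 k: int = 2, min_sim: float = 0.5) -> list[int]:
    """Indices of up to k notes in the routed cluster to link the new note to.

    Only members of the routed cluster are compared, so this costs
    len(cluster_embs) similarity computations instead of one per stored note.
    """
    scored = sorted(((cosine(new_emb, e), i) for i, e in enumerate(cluster_embs)),
                    reverse=True)
    return [i for sim, i in scored[:k] if sim >= min_sim]


cluster = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
print(link_locally([1.0, 0.0], cluster))  # [0, 1]
```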
Cluster-aware retrieval. At query time, CLAG first selects relevant clusters from their profile summaries and tags, then retrieves fine-grained evidence only from those clusters, which suppresses distractors.
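A sketch of the two-stage retrieval on toy data (not from the paper). Cluster selection here ranks clusters by overlap between query words and cluster tags, a hypothetical stand-in for the agent's judgment over profile summaries and tags. Fine-grained retrieval then ranks only the surviving notes by cosine similarity. The toy distractor shows why the first stage matters: it is the nearest note in embedding space, yet it is excluded once only the book cluster is selected. With `m=2` every cluster is kept, which is equivalent to global search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_clusters(query: str, clusters: list[dict], m: int = 1) -> list[dict]:
    """Stage 1 (hypothetical stand-in for the agent): rank clusters by query/tag overlap."""
    q = set(query.lower().split())
    return sorted(clusters, key=lambda c: len(q & set(c["tags"])), reverse=True)[:m]


def retrieve(query: str, query_emb: list[float], clusters: list[dict],
             m: int = 1, k: int = 2) -> list[str]:
    """Stage 2: rank only the notes inside the selected clusters by cosine similarity."""
    pool = [note for c in select_clusters(query, clusters, m) for note in c["notes"]]
    pool.sort(key=lambda n: cosine(query_emb, n["emb"]), reverse=True)
    return [n["text"] for n in pool[:k]]


# Toy store: the basketball note is the nearest neighbour of the query embedding.
store = [
    {"tags": ["books", "novels", "writing"],
     "notes": [{"text": "likes fantasy novels", "emb": [1.0, 0.1]},
               {"text": "writes book reviews", "emb": [0.9, 0.2]},
               {"text": "visited a library", "emb": [0.2, 1.0]}]},
    {"tags": ["basketball", "team"],
     "notes": [{"text": "scored in a basketball game", "emb": [1.0, 0.0]}]},
]
query = "which novels does she enjoy writing about"
print(retrieve(query, [1.0, 0.0], store, m=1))  # ['likes fantasy novels', 'writes book reviews']
print(retrieve(query, [1.0, 0.0], store, m=2))  # ['scored in a basketball game', 'likes fantasy novels']
```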
Results
On Qwen3-0.6B, CLAG achieves the best F1 and BLEU-1 on all three benchmarks: LoCoMo, HotpotQA, and BioASQ. The largest gain appears on BioASQ (22.01 F1 versus 3.61 for the strongest baseline, A-mem), where dense biomedical terminology makes global similarity search prone to distractors.
Final-answer performance on Qwen3-0.6B.
| Method | LoCoMo F1 | LoCoMo BLEU-1 | HotpotQA F1 | HotpotQA BLEU-1 | BioASQ F1 | BioASQ BLEU-1 |
|---|---|---|---|---|---|---|
| RAG | 12.9 | 10.39 | 11.75 | 11.17 | 2.4 | 1.71 |
| A-mem | 14.29 | 11.8 | 12.04 | 10.65 | 3.61 | 2.83 |
| GAM | 16.05 | 13.24 | 7.81 | 6.69 | 3.40 | 3.37 |
| MemoryOS | 4.30 | 3.24 | 9.02 | 7.34 | 3.12 | 1.29 |
| CLAG (Ours) | 20.99 | 17.88 | 15.50 | 14.33 | 22.01 | 17.23 |
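The table reports answer quality as F1 and BLEU-1. As a hedged illustration, the sketch below uses the common token-overlap definitions from QA evaluation: token F1 in the SQuAD style, and BLEU-1 as clipped unigram precision times a brevity penalty against a single reference. The normalization in `tokens` is a simplification (no article removal or stemming), and the paper's evaluation script may differ. Scores here are on a 0 to 1 scale, while the table appears to use 0 to 100. The example strings come from the LoCoMo case study later in this section.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a simplified normalization)."""
    return re.findall(r"\w+", text.lower())


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall, counting overlaps with multiplicity."""
    p, g = tokens(pred), tokens(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def bleu1(pred: str, gold: str) -> float:
    """Clipped unigram precision times the brevity penalty, single reference."""
    p, g = tokens(pred), tokens(gold)
    if not p:
        return 0.0
    clipped = sum((Counter(p) & Counter(g)).values())
    brevity = 1.0 if len(p) > len(g) else math.exp(1 - len(g) / len(p))
    return brevity * clipped / len(p)


gold = "Harry Potter and Game of Thrones"
print(token_f1("Harry Potter and Game of Thrones", gold))    # 1.0
print(token_f1("Not mentioned in the conversation.", gold))  # 0.0
print(token_f1("Harry Potter", gold))                        # 0.5
```

The brevity penalty is what separates the two metrics on short answers: "Harry Potter" has perfect unigram precision, but BLEU-1 scales it by exp(1 - 6/2), while token F1 lands at 0.5.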
Retrieval Quality
CLAG improves biomedical evidence retrieval over RAG and A-mem by narrowing the search space before fine-grained retrieval.
BioASQ retrieval quality on Qwen3-0.6B.
| Method | E-Prec | E-Recall | E-F1 | R@5 | R@10 | nDCG@10 |
|---|---|---|---|---|---|---|
| RAG | 4.60 | 1.65 | 2.29 | 1.48 | 1.65 | 20.19 |
| A-mem | 4.40 | 1.59 | 2.20 | 1.48 | 1.59 | 21.27 |
| CLAG (Ours) | 33.35 | 32.64 | 25.11 | 25.90 | 32.64 | 56.17 |
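For reference, here are R@k and nDCG@k under their standard binary-relevance definitions. This assumes each query has a set of gold evidence IDs and a ranked list of unique retrieved IDs. Whether the paper uses binary or graded relevance, and how it averages over queries, is not stated in this section, and the table's scores appear to be scaled by 100.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of gold evidence items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


ranked = ["e1", "x", "e2", "y"]
gold = {"e1", "e2"}
print(recall_at_k(ranked, gold, 2), recall_at_k(ranked, gold, 4))  # 0.5 1.0
print(ndcg_at_k(ranked, gold, 4))  # below 1.0 because e2 is ranked third, not second
```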
Ablation
On BioASQ, agent-driven clustering outperforms geometric clustering strategies that rely only on embedding proximity.
Clustering strategy ablation on BioASQ with Qwen3-0.6B.
| Clustering Strategy | F1 | BLEU-1 |
|---|---|---|
| Cosine-based Clustering | 14.78 | 12.53 |
| K-Means Clustering | 15.64 | 13.36 |
| CLAG (Ours) | 22.01 | 17.23 |
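For contrast, here is a geometric baseline in the spirit of the K-Means row: clusters are formed purely from embedding distance, with no model judging whether a note fits a cluster's topic. This is a generic Lloyd's algorithm in pure Python. The baseline's actual configuration (number of clusters, initialization, whether embeddings are normalized) is not given in this section, so `k`, first-k initialization, and squared Euclidean distance are assumptions.

```python
def kmeans(points: list[list[float]], k: int,
           iters: int = 20) -> tuple[list[int], list[list[float]]]:
    """Lloyd's algorithm with squared Euclidean distance and first-k initialization."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points (unchanged if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, _ = kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)
print(labels)  # [0, 0, 1, 1]
```

Nothing in this loop reads the text of a note, which is the contrast the ablation draws: two notes with nearby embeddings land together even if an SLM reading the cluster profiles would separate them.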
Takeaway
CLAG treats memory organization as part of the agent loop rather than as offline preprocessing. Each cluster becomes a self-contained unit with a topic summary, descriptive tags, and representative memories.
The key idea is to delegate memory routing and organization directly to the agent, letting small language model agents build semantically coherent memory units and retrieve from the right local context. This agent-driven organization is the central reason CLAG improves answer quality without relying on a larger backbone model.
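The cluster unit described above (a topic summary, descriptive tags, and representative memories) can be pictured as a small record. How tags and representatives are chosen is not specified here, so this sketch makes hypothetical choices: representatives are the notes nearest the embedding centroid, tags are the most frequent non-stopword tokens, and `summarize` is a placeholder for the SLM call that writes the topic summary. The stub summary string is borrowed from the case-study table for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable

STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


@dataclass
class ClusterProfile:
    summary: str                # topic summary (placeholder callable here)
    tags: list[str]             # descriptive tags
    representatives: list[str]  # notes that best stand for the cluster


def build_profile(notes: list[str], embs: list[list[float]],
                  summarize: Callable[[list[str]], str],
                  n_tags: int = 3, n_reps: int = 2) -> ClusterProfile:
    # Representatives (hypothetical rule): the notes closest to the centroid.
    centroid = [sum(col) / len(embs) for col in zip(*embs)]
    order = sorted(range(len(notes)), key=lambda i: math.dist(embs[i], centroid))
    reps = [notes[i] for i in order[:n_reps]]
    # Tags (hypothetical rule): the most frequent non-stopword tokens.
    counts = Counter(w for n in notes for w in n.lower().split() if w not in STOPWORDS)
    tags = [w for w, _ in counts.most_common(n_tags)]
    return ClusterProfile(summarize(notes), tags, reps)


profile = build_profile(
    ["books build new worlds", "new worlds in fantasy books", "reading books daily"],
    [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]],
    summarize=lambda notes: "Speaker discusses how books create new worlds",
)
print(profile.representatives)  # ['new worlds in fantasy books', 'books build new worlds']
```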
Case Study
Appendix E analyzes a LoCoMo example where relevant memories are buried among unrelated dialogue notes. CLAG first selects the literature-related cluster, then retrieves the evidence needed to answer the question.
Query. Which two mystery novels does Tim particularly enjoy writing about?
Ground Truth. Harry Potter and Game of Thrones
Only CLAG identifies both titles mentioned in the dialogue.
| Method | Prediction | Result |
|---|---|---|
| CLAG (Ours) | Harry Potter and Game of Thrones | Correct ✓ |
| RAG | Not mentioned in the conversation. | Wrong ✕ |
| A-mem | Tim's favorite two mystery novels are not mentioned. | Wrong ✕ |
| GAM | Not mentioned in the conversation. | Wrong ✕ |
| MemoryOS | Tim writes about both fantasy and mystery novels. | Wrong ✕ |
CLAG reduces the search space from 680 to 119 notes before fine-grained retrieval.
| Cluster ID | Profile | Count | Status |
|---|---|---|---|
| 0 | Speaker discusses how books create new worlds | 119 | Selected ✓ |
| 1 | Impact of basketball on community growth | 231 | Pruned ✕ |
| 2 | Experience of meeting teammates after a trip | 330 | Pruned ✕ |
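The three clusters account for every note in the store, so the reduction can be checked directly:

$$
119 + 231 + 330 = 680, \qquad \frac{119}{680} = 0.175
$$

Fine-grained retrieval therefore scans 17.5% of the notes, and the 561 notes in the two pruned clusters (82.5%) are never scored.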
Resources
@article{roh2026clag,
title={CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents},
author={Roh, Taeyun and Jang, Wonjune and Jung, Junha and Kang, Jaewoo},
journal={arXiv preprint arXiv:2603.15421},
year={2026}
}