Findings of ACL 2026

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

Taeyun Roh (1), Wonjune Jang (2), Junha Jung (1,3), Jaewoo Kang (1,3)

(1) Korea University · (2) Myongji University · (3) AIGEN Sciences

CLAG is a memory framework for small language model agents. It replaces a single global memory pool with agent-driven semantic clusters, localized memory evolution, and cluster-aware retrieval so limited-capacity agents receive less distractor context.

Conceptual comparison between global memory systems and CLAG's localized evolution and retrieval.

Problem

Global memory pools become noisy as agents accumulate history.

Existing agent memory systems usually store past observations, actions, tool outputs, and feedback in one global retrieval pool. As the pool grows, retrieval becomes more likely to surface memories that are semantically plausible but irrelevant to the task, and memory evolution may update a record using neighbors drawn from unrelated topics.

This is especially costly for small language models, which are more sensitive to irrelevant context. CLAG keeps the benefits of self-evolving memory, but constrains updates and retrieval to semantically coherent neighborhoods.

Method

Agent-controlled clustering for long-horizon memory

01. Agentic Routing

For each new memory, CLAG first narrows the candidate clusters by vector distance, then an SLM router selects the best semantic cluster using the cluster profiles. If similarity is too low, CLAG creates a new cluster.
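The routing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Cluster`, `route_memory`, and `pick_cluster`, the shortlist size `top_k`, and the threshold `min_sim` are all hypothetical, and the SLM router is replaced by a plain callable so the example runs without a model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    profile: str                  # short topic summary shown to the router
    centroid: List[float]         # embedding that represents the cluster
    notes: List[str] = field(default_factory=list)


def route_memory(
    note: str,
    embedding: List[float],
    clusters: List[Cluster],
    pick_cluster: Callable[[str, List[Cluster]], int],  # stand-in for the SLM router
    top_k: int = 3,        # hypothetical shortlist size
    min_sim: float = 0.5,  # hypothetical new-cluster threshold
) -> Cluster:
    # Stage 1: shortlist clusters by vector similarity to the new note.
    ranked = sorted(clusters, key=lambda c: cosine(embedding, c.centroid), reverse=True)
    shortlist = [c for c in ranked[:top_k] if cosine(embedding, c.centroid) >= min_sim]

    # Nothing is similar enough: open a new cluster seeded with this note.
    if not shortlist:
        new = Cluster(profile=note, centroid=list(embedding), notes=[note])
        clusters.append(new)
        return new

    # Stage 2: the router picks one of the shortlisted profiles by index.
    chosen = shortlist[pick_cluster(note, shortlist)]
    chosen.notes.append(note)  # centroid and profile updates are omitted for brevity
    return chosen
```

In a real agent, `pick_cluster` would prompt the small language model with the shortlisted profiles and parse the index it returns.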

02. Localized Evolution

Linking, rewriting, and consolidation are performed only inside the routed cluster. This preserves topic consistency and avoids global pairwise comparisons.
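To make the scoping concrete, the sketch below restricts evolution candidates to the routed cluster and counts how many note pairs a pairwise pass would consider. The function names are hypothetical, the cluster sizes are borrowed from the LoCoMo case study on this page purely for illustration, and the pair counts are simple combinatorics rather than a measurement of CLAG's actual cost.

```python
from typing import Dict, List


def evolution_candidates(new_note: str, routed_cluster: int,
                         memory: Dict[int, List[str]]) -> List[str]:
    """Return the notes a new memory may be linked with or rewritten against.

    Only the routed cluster is consulted; other clusters are never touched.
    """
    return [n for n in memory.get(routed_cluster, []) if n != new_note]


def pair_count(n: int) -> int:
    """Number of unordered pairs among n notes."""
    return n * (n - 1) // 2


sizes = [119, 231, 330]                          # illustrative cluster sizes
global_pairs = pair_count(sum(sizes))            # one pool of 680 notes
local_pairs = sum(pair_count(s) for s in sizes)  # pairs confined to each cluster
print(global_pairs, local_pairs)                 # prints: 230860 87871
```

Under these assumptions, confining comparisons to clusters leaves about 38% of the pairs a single global pool would produce.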

03. Two-Stage Retrieval

CLAG first selects relevant clusters from profile summaries and tags, then retrieves fine-grained evidence only from those clusters to suppress distractors.
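A minimal sketch of the two stages, assuming each cluster profile and each note already has an embedding. For brevity, cluster selection here uses plain cosine similarity against a profile embedding, which is a stand-in for selection from profile summaries and tags; `two_stage_retrieve`, `n_clusters`, and `k` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_retrieve(
    query_vec: Vector,
    cluster_profiles: Dict[str, Vector],
    cluster_notes: Dict[str, List[Tuple[str, Vector]]],
    n_clusters: int = 1,
    k: int = 2,
) -> List[str]:
    # Stage 1: keep the clusters whose profile is closest to the query.
    selected = sorted(
        cluster_profiles,
        key=lambda cid: cosine(query_vec, cluster_profiles[cid]),
        reverse=True,
    )[:n_clusters]

    # Stage 2: score only the notes inside the selected clusters.
    pool = [
        (cosine(query_vec, vec), text)
        for cid in selected
        for text, vec in cluster_notes[cid]
    ]
    pool.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in pool[:k]]
```

A note in a pruned cluster is never scored, even if its embedding happens to sit close to the query, which is how the cluster stage suppresses distractors.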

Overview of CLAG with agentic routing, localized evolution, and two-stage retrieval.

Results

Stronger answer quality on Qwen3-0.6B

On Qwen3-0.6B, CLAG achieves the best F1 and BLEU-1 on LoCoMo, HotpotQA, and BioASQ. The largest gain appears on BioASQ (F1 of 22.01 versus 3.61 for A-mem, the strongest baseline), where dense biomedical terminology makes global similarity search prone to distractors.

Final-answer performance on Qwen3-0.6B.

Model       Method       LoCoMo          HotpotQA        BioASQ
                         F1      BLEU-1  F1      BLEU-1  F1      BLEU-1
Qwen3-0.6B  RAG          12.9    10.39   11.75   11.17   2.4     1.71
            A-mem        14.29   11.8    12.04   10.65   3.61    2.83
            GAM          16.05   13.24   7.81    6.69    3.40    3.37
            MemoryOS     4.30    3.24    9.02    7.34    3.12    1.29
            CLAG (Ours)  20.99   17.88   15.50   14.33   22.01   17.23

Retrieval Quality

BioASQ E-F1: 25.11

CLAG improves biomedical evidence retrieval over RAG and A-mem by narrowing the search space before fine-grained retrieval.

BioASQ retrieval quality on Qwen3-0.6B.

Method       E-Prec  E-Recall  E-F1   R@5    R@10   nDCG@10
RAG          4.60    1.65      2.29   1.48   1.65   20.19
A-mem        4.40    1.59      2.20   1.48   1.59   21.27
CLAG (Ours)  33.35   32.64     25.11  25.90  32.64  56.17
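This page does not define the evidence metrics. One common reading, assumed here and not confirmed by the source, is set-based precision, recall, and F1 computed per question and then averaged over questions. Under that assumption the averaged F1 can fall below both the averaged precision and the averaged recall, as the small example shows. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def prf(retrieved: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single question."""
    hits = len(retrieved & gold)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(runs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, ...]:
    """Average per-question precision, recall, and F1 over all questions."""
    scores = [prf(retrieved, gold) for retrieved, gold in runs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Question 1 over-retrieves (P=0.5, R=1.0); question 2 under-retrieves (P=1.0, R=0.5).
runs = [({"a", "b"}, {"a"}), ({"c"}, {"c", "d"})]
p, r, f = macro_average(runs)
# p and r both average to 0.75, while the averaged F1 is 2/3.
```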

Ablation

Agentic clustering beats K-Means

On BioASQ, agent-driven clustering outperforms geometric clustering strategies that rely only on embedding proximity.

Clustering strategy ablation on BioASQ with Qwen3-0.6B.

Clustering Strategy      F1     BLEU-1
Cosine-based Clustering  14.78  12.53
K-Means Clustering       15.64  13.36
CLAG (Ours)              22.01  17.23

Takeaway

A practical memory layer for limited-capacity agents

CLAG treats memory organization as part of the agent loop rather than as offline preprocessing. Each cluster becomes a self-contained unit with a topic summary, descriptive tags, and representative memories.
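One way to picture such a unit is a small record holding the three fields named above. The class name, field names, and the `as_prompt_line` helper are hypothetical; the page does not specify how CLAG stores or serializes cluster profiles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterProfile:
    cluster_id: int
    summary: str                                              # topic summary
    tags: List[str] = field(default_factory=list)             # descriptive tags
    representatives: List[str] = field(default_factory=list)  # representative memories

    def as_prompt_line(self, max_reps: int = 2) -> str:
        """Render the profile as one compact line a router prompt could include."""
        reps = "; ".join(self.representatives[:max_reps])
        return (f"[{self.cluster_id}] {self.summary} | "
                f"tags: {', '.join(self.tags)} | e.g. {reps}")
```

Keeping the profile this compact matters for small models, since the router sees one line per candidate cluster rather than the cluster's full contents.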

The key idea is to delegate memory routing and organization directly to the agent, letting small language model agents build semantically coherent memory units and retrieve from the right local context. This agent-driven organization is the central reason CLAG improves answer quality without relying on a larger backbone model.

Case Study

Why cluster-aware retrieval matters

Appendix E analyzes a LoCoMo example where relevant memories are buried among unrelated dialogue notes. CLAG first selects the literature-related cluster, then retrieves the evidence needed to answer the question.

Query. Which two mystery novels does Tim particularly enjoy writing about?

Ground Truth. Harry Potter and Game of Thrones

Generated responses

Only CLAG identifies both titles mentioned in the dialogue.

Method       Prediction                                            Result
CLAG (Ours)  Harry Potter and Game of Thrones                      Correct ✓
RAG          Not mentioned in the conversation.                    Wrong ✕
A-mem        Tim's favorite two mystery novels are not mentioned.  Wrong ✕
GAM          Not mentioned in the conversation.                    Wrong ✕
MemoryOS     Tim writes about both fantasy and mystery novels.     Wrong ✕

Semantic clustering and pruning

CLAG reduces the search space from 680 to 119 notes (17.5% of the pool) before fine-grained retrieval.

Cluster ID  Profile                                         Count  Status
0           Speaker discusses how books create new worlds   119    Selected ✓
1           Impact of basketball on community growth        231    Pruned ✕
2           Experience of meeting teammates after a trip    330    Pruned ✕

Citation

@article{roh2026clag,
  title={CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents},
  author={Roh, Taeyun and Jang, Wonjune and Jung, Junha and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2603.15421},
  year={2026}
}