Findings of ACL 2026

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

Taeyun Roh¹, Wonjune Jang², Junha Jung¹,³, Jaewoo Kang¹,³

¹Korea University · ²Myongji University · ³AIGEN Sciences

CLAG is a memory framework for small language model agents. It replaces a single global memory pool with agent-driven semantic clusters, localized memory evolution, and cluster-aware retrieval so limited-capacity agents receive less distractor context.

Conceptual comparison between global memory systems and CLAG's localized evolution and retrieval.

Problem

Global memory pools become noisy as agents accumulate history.

Existing agent memory systems usually store past observations, actions, tool outputs, and feedback in one global retrieval pool. As the buffer grows, retrieval becomes more likely to surface semantically plausible but task-irrelevant memories, while memory evolution can update records using topic-mixed neighborhoods.

This is especially costly for small language models, which are more sensitive to irrelevant context. CLAG keeps the benefits of self-evolving memory, but constrains updates and retrieval to semantically coherent neighborhoods.

Method

Agent-controlled clustering for long-horizon memory

Agentic Routing

For each new memory, candidate clusters are first shortlisted by vector distance; a small language model (SLM) router then selects the best semantic cluster using the cluster profiles. If no existing cluster is similar enough, CLAG creates a new cluster.
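The routing step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Cluster` class, the `route_memory` function, the `top_k` and `min_similarity` values, and the `choose` callback that stands in for the SLM router are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    profile: str                  # short topic summary shown to the router
    centroid: List[float]         # mean embedding of the member memories
    members: List[str] = field(default_factory=list)


def route_memory(
    text: str,
    embedding: Sequence[float],
    clusters: List[Cluster],
    choose: Callable[[str, List[str]], int],  # hypothetical stand-in for the SLM router
    top_k: int = 3,                 # assumed shortlist size
    min_similarity: float = 0.5,    # assumed threshold for "similar enough"
) -> int:
    """Store the memory in a cluster and return that cluster's index."""
    # Stage 1: vector filter. Rank clusters by similarity and keep the closest few.
    scored = sorted(
        ((cosine(embedding, c.centroid), i) for i, c in enumerate(clusters)),
        reverse=True,
    )[:top_k]
    candidates = [i for sim, i in scored if sim >= min_similarity]

    # No cluster is similar enough: open a new one seeded by this memory.
    if not candidates:
        clusters.append(Cluster(profile=text, centroid=list(embedding), members=[text]))
        return len(clusters) - 1

    # Stage 2: the router picks one candidate based on the cluster profiles.
    picked = choose(text, [clusters[i].profile for i in candidates])
    target = candidates[picked]
    cluster = clusters[target]
    n = len(cluster.members)
    # Keep the centroid as the running mean of member embeddings.
    cluster.centroid = [(c * n + e) / (n + 1) for c, e in zip(cluster.centroid, embedding)]
    cluster.members.append(text)
    return target
```

In a real system `choose` would prompt the SLM with the memory and the shortlisted profiles; here any callable that returns an index into the shortlist works, which keeps the sketch testable without a model.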

Localized Evolution

Linking, rewriting, and consolidation are performed only inside the routed cluster. This preserves topic consistency and avoids global pairwise comparisons.
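As a rough illustration of what "only inside the routed cluster" means for linking, the sketch below compares a new note against its cluster peers and nothing else. It assumes notes are identified by string IDs; `evolve_locally`, `cluster_of`, `links`, and the `is_related` predicate are hypothetical names, and rewriting and consolidation are left out for brevity.

```python
from typing import Callable, Dict, Set


def evolve_locally(
    new_id: str,
    cluster_of: Dict[str, int],             # note ID -> cluster ID
    links: Dict[str, Set[str]],             # undirected link graph, updated in place
    is_related: Callable[[str, str], bool],  # hypothetical relatedness check
) -> int:
    """Link a new note to related notes in its own cluster.

    Returns the number of comparisons made, which is bounded by the
    cluster size rather than by the size of the whole memory.
    """
    cluster = cluster_of[new_id]
    peers = [n for n, c in cluster_of.items() if c == cluster and n != new_id]
    for peer in peers:
        if is_related(new_id, peer):
            links.setdefault(new_id, set()).add(peer)
            links.setdefault(peer, set()).add(new_id)
    return len(peers)
```

Notes in other clusters are never passed to `is_related`, so they can neither be linked to the new note nor add to the comparison cost.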

Two-Stage Retrieval

CLAG first selects relevant clusters from profile summaries and tags, then retrieves fine-grained evidence only from those clusters to suppress distractors.
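The two stages can be sketched as below. This is a simplified stand-in, not CLAG's actual scoring: word overlap replaces whatever relevance judgment the system uses at each stage, and `MemoryCluster`, `two_stage_retrieve`, `top_clusters`, and `top_notes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MemoryCluster:
    summary: str      # profile summary for the cluster
    tags: Set[str]    # descriptive tags (lowercase)
    notes: List[str]  # fine-grained memories stored in the cluster


def _overlap(query_terms: Set[str], text: str) -> int:
    """Number of distinct query terms that appear in the text."""
    return len(query_terms & set(text.lower().split()))


def two_stage_retrieve(
    query: str,
    clusters: List[MemoryCluster],
    top_clusters: int = 1,
    top_notes: int = 2,
) -> List[str]:
    q = set(query.lower().split())
    # Stage 1: score clusters from their profiles only (summary + tags).
    ranked = sorted(
        clusters,
        key=lambda c: _overlap(q, c.summary) + len(q & c.tags),
        reverse=True,
    )
    selected = ranked[:top_clusters]
    # Stage 2: rank individual notes, restricted to the selected clusters.
    pool = [note for c in selected for note in c.notes]
    return sorted(pool, key=lambda n: _overlap(q, n), reverse=True)[:top_notes]
```

Because stage 2 only sees notes from the selected clusters, a note in a pruned cluster cannot be returned even if it shares words with the query.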

Results

Stronger answer quality on Qwen3-0.6B

On Qwen3-0.6B, CLAG achieves the best F1 and BLEU-1 on all three benchmarks: LoCoMo, HotpotQA, and BioASQ. The largest gain appears on BioASQ, where F1 rises from 3.61 for the strongest baseline (A-mem) to 22.01; dense biomedical terminology there makes global similarity search prone to distractors.

Final-answer performance on Qwen3-0.6B.
Model	Method	LoCoMo F1	LoCoMo BLEU-1	HotpotQA F1	HotpotQA BLEU-1	BioASQ F1	BioASQ BLEU-1
Qwen3-0.6B	RAG	12.90	10.39	11.75	11.17	2.40	1.71
Qwen3-0.6B	A-mem	14.29	11.80	12.04	10.65	3.61	2.83
Qwen3-0.6B	GAM	16.05	13.24	7.81	6.69	3.40	3.37
Qwen3-0.6B	MemoryOS	4.30	3.24	9.02	7.34	3.12	1.29
Qwen3-0.6B	CLAG (Ours)	20.99	17.88	15.50	14.33	22.01	17.23
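This page does not define how F1 is computed. Assuming it is the token-overlap F1 commonly used for question answering (an assumption, not something the page confirms), a minimal sketch looks like this. The function name `token_f1` and the whitespace tokenization are illustrative choices.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a partially correct answer receives partial credit: a prediction that contains two of six reference tokens and nothing else has precision 1, recall 1/3, and F1 of 0.5.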

Retrieval Quality

BioASQ E-F1: 25.11

CLAG improves biomedical evidence retrieval over RAG and A-mem, raising E-F1 from 2.29 and 2.20 respectively to 25.11, by narrowing the search space before fine-grained retrieval.

BioASQ retrieval quality on Qwen3-0.6B.
Method	E-Prec	E-Recall	E-F1	R@5	R@10	nDCG@10
RAG	4.60	1.65	2.29	1.48	1.65	20.19
A-mem	4.40	1.59	2.20	1.48	1.59	21.27
CLAG (Ours)	33.35	32.64	25.11	25.90	32.64	56.17

Ablation

Agentic clustering beats K-Means

On BioASQ, agent-driven clustering reaches 22.01 F1, ahead of K-Means (15.64) and cosine-based clustering (14.78), both geometric strategies that rely only on embedding proximity.

Clustering strategy ablation on BioASQ with Qwen3-0.6B.
Clustering Strategy	F1	BLEU-1
Cosine-based Clustering	14.78	12.53
K-Means Clustering	15.64	13.36
CLAG (Ours)	22.01	17.23

Takeaway

A practical memory layer for limited-capacity agents

CLAG treats memory organization as part of the agent loop rather than as offline preprocessing. Each cluster becomes a self-contained unit with a topic summary, descriptive tags, and representative memories.

The key idea is to delegate memory routing and organization directly to the agent, letting small language model agents build semantically coherent memory units and retrieve from the right local context. This agent-driven organization is the central reason CLAG improves answer quality without relying on a larger backbone model.

Case Study

Why cluster-aware retrieval matters

Appendix E analyzes a LoCoMo example where relevant memories are buried among unrelated dialogue notes. CLAG first selects the literature-related cluster, then retrieves the evidence needed to answer the question.

Query. Which two mystery novels does Tim particularly enjoy writing about?

Ground Truth. Harry Potter and Game of Thrones

Generated responses

Only CLAG identifies both titles mentioned in the dialogue.
Method	Prediction	Result
CLAG (Ours)	Harry Potter and Game of Thrones	Correct ✓
RAG	Not mentioned in the conversation.	Wrong ✕
A-mem	Tim's favorite two mystery novels are not mentioned.	Wrong ✕
GAM	Not mentioned in the conversation.	Wrong ✕
MemoryOS	Tim writes about both fantasy and mystery novels.	Wrong ✕

Semantic clustering and pruning

CLAG reduces the search space from 680 to 119 notes before fine-grained retrieval.
Cluster ID	Profile	Count	Status
0	Speaker discusses how books create new worlds	119	Selected ✓
1	Impact of basketball on community growth	231	Pruned ✕
2	Experience of meeting teammates after a trip	330	Pruned ✕
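The size of the reduction follows directly from the cluster counts in the table. The short sketch below reproduces the arithmetic; `pruning_summary` is a hypothetical helper written for this illustration.

```python
from typing import Dict, Set, Tuple


def pruning_summary(counts: Dict[int, int], selected: Set[int]) -> Tuple[int, int, float]:
    """Return (notes kept, total notes, fraction kept) after cluster selection."""
    total = sum(counts.values())
    kept = sum(n for cid, n in counts.items() if cid in selected)
    return kept, total, kept / total


# Cluster sizes from the table above; only cluster 0 is selected.
kept, total, fraction = pruning_summary({0: 119, 1: 231, 2: 330}, {0})
print(f"{kept} of {total} notes searched ({fraction:.1%})")
```

With these counts, fine-grained retrieval scans 119 of 680 notes, or 17.5% of the memory, so 82.5% of the notes never become retrieval candidates.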

Resources

Citation

@article{roh2026clag,
  title={CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents},
  author={Roh, Taeyun and Jang, Wonjune and Jung, Junha and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2603.15421},
  year={2026}
}