May 19, 2026 · architecture · 8 min read
Semantic Similarity Is Not Memory
A research-driven exploration of why semantic similarity alone is insufficient for AI memory systems, and how retrieval quality depends on ranking policies, recent context, hybrid retrieval, and adaptive query-aware strategies.
Overview
One of the easiest mistakes to make when building AI memory systems is assuming that semantic similarity equals memory.
At first, vector retrieval feels almost magical.
You embed previous conversations, compute similarity scores, and suddenly the system can "remember" related information from earlier interactions.
But after enough experiments, a more uncomfortable reality appears:
high similarity does not guarantee that the required knowledge is actually present.
This became one of the central findings during my Agent Memory Lab experiments.
The project started as a small semantic retrieval exploration and gradually evolved into a broader investigation of:
- retrieval drift
- ranking systems
- summary memory compression
- project-aware retrieval
- hybrid retrieval
- adaptive retrieval policies
The biggest lesson was simple:
semantic similarity is useful, but it is not memory.
The Initial Assumption
The first version of the retrieval system was straightforward:
query
↓
embedding
↓
cosine similarity
↓
top matching memories
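The pipeline above can be sketched in a few lines. This is a minimal illustration, not the lab's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, and `top_memories` and the sample memories are hypothetical names and data chosen for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_memories(query, memories, k=3):
    # Score every memory against the query and keep the k best.
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical memory store.
memories = [
    "Notes on memory ranking and semantic retrieval for AI systems",
    "Retrieval architecture and context selection ideas",
    "Grocery list: eggs, milk, bread",
]
results = top_memories("How can I improve AI memory retrieval systems?", memories, k=2)
for similarity, text in results:
    print(f"{similarity:.3f}  {text}")
```

With a real embedding model only `embed` would change; the ranking step is still a sort by cosine similarity.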
For direct conceptual queries, this worked surprisingly well.
Example:
How can I improve AI memory retrieval systems?
The retrieval system successfully returned memories related to:
- memory ranking
- semantic retrieval
- retrieval architecture
- context selection
At first glance, the system appeared to work.
But small failures started to reveal larger problems.
Retrieval Drift
One of the earliest observations was retrieval drift.
The retrieval system often returned conceptually adjacent memories instead of the intended ones.
For example:
How should I continue the system?
This query was intentionally ambiguous.
The retrieval system responded with a mixture of:
- MeOS planning memories
- career positioning memories
- writing ideas
- general systems-thinking memories
The system understood that the query was related to "systems", but it could not determine which system the conversation was referring to.
This exposed an important limitation:
semantic retrieval alone cannot maintain conversational continuity.
The embeddings captured conceptual proximity, but not active conversational state.
Why Recent Context Matters
To test this, I added recent conversational context before retrieval:
We have been discussing MeOS memory architecture and planning systems.
Then I retried the same query:
How should I continue the system?
The retrieval behavior changed dramatically.
Instead of drifting into unrelated memories, the system consistently returned:
- MeOS kernel planning
- memory architecture
- orchestration
- scope-control decisions
This revealed another important insight:
recent conversational state stabilizes retrieval.
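A minimal sketch of this effect, assuming the recent context is simply prepended to the query text before embedding (one possible design, and not necessarily the one used in the lab). The bag-of-words `embed`, the `retrieve` helper, and the three sample memories are all hypothetical; the toy embedding only shows the mechanism, not the quality of a real model.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, memories, recent_context="", k=2):
    # Assumption: recent context is concatenated in front of the query
    # before embedding; other designs embed and weight it separately.
    q = embed(f"{recent_context} {query}")
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

# Hypothetical memories, loosely modeled on the categories in this post.
memories = [
    "MeOS kernel planning and memory architecture decisions",
    "Career positioning: how to present systems thinking skills",
    "Writing ideas about how complex systems evolve",
]
query = "How should I continue the system?"
context = "We have been discussing MeOS memory architecture and planning systems."

without_context = retrieve(query, memories)
with_context = retrieve(query, memories, recent_context=context)
print(without_context)
print(with_context)
```

In this toy setup the ambiguous query alone never surfaces the MeOS memory in the top two results, while the same query with the context sentence ranks it first.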
Modern AI systems do not rely only on semantic memory.
They also rely on:
- short-term conversational state
- active project awareness
- ranking policies
- context prioritization
Without those layers, semantic retrieval becomes unstable.
Summary Memory Compression
The next experiment explored summary-based memory compression.
This is a common strategy in long-running AI systems.
Instead of storing full conversation history, the system compresses older interactions into summaries.
Example summary:
MeOS is an AI operating system project focused on planning, memory, and agent orchestration.
At first, the summary looked useful.
It preserved the high-level meaning of the project.
But then I tested a factual query:
What was the original name of MeOS?
The retrieval system still gave the summary a relatively high similarity score for this query.
However, the summary did not actually contain the answer.
The correct answer was:
YigitOS
This became one of the most important findings in the lab.
similarity creates the illusion of memory coverage.
The embedding space suggested that the memory was relevant, but the factual information itself had already been lost during compression.
This is a dangerous failure mode.
A retrieval system can appear intelligent while silently losing critical details.
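One way to catch this failure is to check compressed memories against a small set of known facts, independently of any similarity score. The sketch below is an assumption about how such a check could look: `lost_facts` and the probe list are hypothetical, and a literal substring match is a deliberately crude test of whether an answer survived.

```python
def lost_facts(summary, probes):
    """Return the (question, answer) probes whose answer is missing from the summary."""
    text = summary.lower()
    return [(question, answer) for question, answer in probes
            if answer.lower() not in text]

summary = ("MeOS is an AI operating system project focused on planning, "
           "memory, and agent orchestration.")

# Hypothetical probe set: facts the compressed memory is expected to retain.
probes = [
    ("What was the original name of MeOS?", "YigitOS"),
    ("What is MeOS focused on?", "planning"),
]

missing = lost_facts(summary, probes)
for question, answer in missing:
    print(f"Lost during compression: {question!r} (expected {answer!r})")
```

Running probes like these before discarding the original history makes the loss visible at compression time instead of at query time.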
Retrieval Is a Ranking Problem
As the experiments continued, another pattern became obvious.
The retrieval problem was not just about embeddings.
It was about ranking.
The system gradually evolved from:
semantic similarity only
to:
semantic similarity
+ importance
+ recency
+ project relevance
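A combined score of this kind could be sketched as a weighted sum. Everything specific here is an assumption for illustration: the `Memory` fields, the weights, the 30-day half-life for recency, and the sample values are hypothetical rather than the lab's actual settings.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    similarity: float  # cosine similarity to the current query, 0..1
    importance: float  # manually assigned importance, 0..1
    age_days: float    # time since the memory was written
    project: str       # project tag

# Hypothetical weights; in practice they would be tuned per system.
WEIGHTS = {"similarity": 0.5, "importance": 0.2, "recency": 0.2, "project": 0.1}

def recency(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def score(memory, active_project):
    project_match = 1.0 if memory.project == active_project else 0.0
    return (
        WEIGHTS["similarity"] * memory.similarity
        + WEIGHTS["importance"] * memory.importance
        + WEIGHTS["recency"] * recency(memory.age_days)
        + WEIGHTS["project"] * project_match
    )

def rank(memories, active_project):
    return sorted(memories, key=lambda m: score(m, active_project), reverse=True)

# Illustrative values only.
memories = [
    Memory("General systems-thinking notes", 0.62, 0.3, 120, "writing"),
    Memory("MeOS kernel planning decisions", 0.55, 0.8, 3, "MeOS"),
]
ranked = rank(memories, active_project="MeOS")
for m in ranked:
    print(f"{score(m, 'MeOS'):.3f}  {m.text}")
```

In this example the recent MeOS memory ranks first even though the older note has the higher raw similarity, because importance, recency, and the project match outweigh the similarity gap.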
This improved retrieval quality significantly.
Recent project memories became easier to retrieve.
Active project workflows became more stable.
But new problems emerged.
Large conceptual memories often suppressed smaller factual memories.
Broad architectural memories dominated the ranking space because they:
- contained more overlapping concepts
- had higher importance
- matched active project tags
- had stronger semantic density
This created a new failure case:
exact factual memories became difficult to retrieve.
Hybrid Retrieval
To address factual retrieval failures, I introduced a simple hybrid retrieval strategy.
Instead of relying only on semantic similarity, the system also used keyword overlap.
The ranking pipeline became:
semantic similarity
+ keyword overlap
+ importance
+ recency
+ project relevance
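The keyword component can be sketched as the fraction of query keywords found in a memory, blended with the semantic score. This is a simplified illustration under stated assumptions: the stopword list, the `keyword_weight` of 0.4, the similarity values, and the wording of the factual memory are all hypothetical, and the importance, recency, and project signals are left out to isolate the effect.

```python
import re

# Small illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"what", "was", "the", "of", "is", "an", "a", "and", "on", "how"}

def keywords(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def keyword_overlap(query, memory_text):
    """Fraction of query keywords that also appear in the memory."""
    q = keywords(query)
    if not q:
        return 0.0
    return len(q & keywords(memory_text)) / len(q)

def hybrid_score(semantic, query, memory_text, keyword_weight=0.4):
    # keyword_weight is a hypothetical setting, not a tuned value.
    overlap = keyword_overlap(query, memory_text)
    return (1 - keyword_weight) * semantic + keyword_weight * overlap

query = "What was the original name of MeOS?"
# (text, semantic similarity) pairs; the similarity values are made up.
candidates = [
    ("MeOS is an AI operating system project focused on planning, "
     "memory, and agent orchestration.", 0.60),
    ("The original name of MeOS was YigitOS.", 0.55),
]
by_semantic = sorted(candidates, key=lambda c: c[1], reverse=True)
by_hybrid = sorted(candidates,
                   key=lambda c: hybrid_score(c[1], query, c[0]),
                   reverse=True)
print(by_semantic[0][0])
print(by_hybrid[0][0])
```

With only these two signals the factual memory moves to the top. Adding importance, recency, and project relevance back into the score can still let a broader memory outrank it.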
This immediately improved factual retrieval.
The MeOS naming memory finally appeared among the top retrieval results.
But even then, broader conceptual memories still ranked above it.
Why?
Because conceptual memories carried stronger overall ranking signals.
This exposed another important idea:
factual memory and conceptual memory compete differently inside retrieval systems.
Semantic retrieval is naturally better at concepts than at exact lookup.
This is one reason why many production RAG systems combine:
- vector search
- keyword search
- metadata lookup
- entity indexing
instead of relying only on embeddings.
Query-Aware Retrieval Policies
The final major experiment introduced query-type-aware retrieval policies.
Instead of using a single universal ranking strategy, the system classified queries into:
- factual
- conceptual
- planning
Each query type used different ranking priorities.
Examples:
Factual queries
What was the original name of MeOS?
Ranking for these queries focused more heavily on:
- keyword overlap
- exact matches
- entity relevance
Planning queries
How should I continue MeOS?
Ranking for these queries focused more heavily on:
- recent context
- active project continuity
- recency weighting
Conceptual queries
Why is semantic similarity not enough?
Ranking for these queries focused more heavily on:
- semantic similarity
- conceptual proximity
- architecture discussions
This adaptive approach improved retrieval quality significantly.
The system was no longer treating every query as the same retrieval problem.
That turned out to be a major architectural shift.
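A query-aware policy of this kind could be sketched as a classifier that selects a weight profile. The regex rules, the reduced signal set (`semantic`, `keyword`, `recency`, `project`), and every weight below are hypothetical simplifications for illustration; a real classifier would need to be far more robust than two patterns.

```python
import re

# Hypothetical weight profiles, one per query type.
POLICIES = {
    "factual":    {"semantic": 0.2, "keyword": 0.6, "recency": 0.1, "project": 0.1},
    "planning":   {"semantic": 0.2, "keyword": 0.1, "recency": 0.4, "project": 0.3},
    "conceptual": {"semantic": 0.7, "keyword": 0.1, "recency": 0.1, "project": 0.1},
}

def classify(query):
    # Crude rule-based classifier; the patterns are illustrative only.
    q = query.lower().strip()
    if re.match(r"(what|who|when|where|which)\b", q):
        return "factual"
    if re.search(r"\b(continue|next|plan|should i)\b", q):
        return "planning"
    return "conceptual"

def score(signals, query):
    # Weighted sum of per-memory signals under the policy for this query type.
    weights = POLICIES[classify(query)]
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# Made-up signal values for two memories.
naming_fact = {"semantic": 0.55, "keyword": 1.0, "recency": 0.2, "project": 1.0}
broad_summary = {"semantic": 0.60, "keyword": 0.33, "recency": 0.9, "project": 1.0}

for query in ["What was the original name of MeOS?", "How should I continue MeOS?"]:
    winner = ("naming fact"
              if score(naming_fact, query) > score(broad_summary, query)
              else "broad summary")
    print(f"{classify(query):>10}: {winner}")
```

The same two memories swap places depending on the query type: the factual policy favors the naming fact, while the planning policy favors the recent, broader memory.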
The Real Lesson
At the beginning of the lab, I thought memory retrieval was mostly about embeddings.
After multiple iterations, the problem looked very different.
The real challenge was:
- ranking
- context selection
- retrieval policy design
- continuity management
- balancing factual and conceptual memory
Embeddings were only one layer.
The larger system behavior depended on how memories were selected, prioritized, compressed, and injected back into context.
This is why:
semantic similarity is not memory.
Memory systems are ultimately orchestration systems.
Closing Thoughts
The most interesting part of the lab was not building retrieval scripts.
It was watching small retrieval failures slowly evolve into architectural questions.
Questions like:
- Should all memories compete in the same ranking space?
- Should factual queries use different retrieval policies?
- How much should recency influence ranking?
- When does summary compression become dangerous?
- How much conversational state is required for continuity?
Those questions matter much more than simply computing embeddings.
The lab is still exploratory.
The goal is not to build a production memory engine.
The goal is to understand the tradeoffs behind the retrieval systems used in AI agents.