Method

Search was never about humans

Retrieval over typed personal graphs · Retrieval is a shape, sized to the reader. The reader is no longer just human.

Search has always been graph-traversal-with-ranking. The human reader was a contingency. The new reader — an agent — has a different attention budget.

Companion to Know Thyself. Scaffold: github.com/parrik/know-thyself-search.


The mirror, again

The first essay opened with Alex catching a model making a confident claim about her on six restated assertions — zero independent episodes. Schema fixed it: typed nodes, provenance triples. Eight months in, her graph has shape — a few thousand nodes, a spine that holds.

Then she pastes the whole graph in, as she has all year, and Claude pulls up short: three thousand nodes is too much to read at once. It worked at 300. At 600. At 1200 with friction. Then it stopped. Graph correct. Reader finite. Retrieval is the bridge.

Four scales of the same shape

Search has always been one shape: find relevant nodes by walking edges, ranked by some distance function.1 Salton’s 1968 Automatic Information Organization and Retrieval set it; the vector-space-model paper (Salton, Wong, Yang 1975) named the geometry; PageRank (Brin & Page 1998) added link-graph priors. The human at the SERP was a contingency — ten results because more was too much, first-three ranking because attention had a budget, page summaries because a page was the unit a person could absorb. The reader changed; the graph problem didn’t.

Scale 1 — Inverted index. Type two words; ten blue links come back. Underneath: a graph with terms on one side and documents on the other, the edges weighted by how often a term shows up where. The walk is short. Start at the term, follow edges to documents, sort by overlap. Ranking by overlap has names — TF-IDF (rare words count more) and its successor BM25 (same idea, with length normalization). Reader: human. Format: ten ranked links. Lucene at hundreds of millions of docs is still this shape.
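A minimal sketch of the Scale-1 shape, assuming a toy three-document corpus: not Lucene, just the walk itself, term node to posting list to overlap score.

```python
import math
from collections import Counter, defaultdict

docs = {
    "d1": "running routine stability chicago",
    "d2": "routine as regulation running load bearing",
    "d3": "isolation early warning signal",
}

# Build the term <-> document graph: each term node points at
# the documents it appears in, weighted by term frequency.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def tfidf_search(query):
    n = len(docs)
    scores = Counter()
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n / len(postings))  # rare terms count more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf     # short walk: term -> docs
    return scores.most_common()            # the ranked blue links

print(tfidf_search("running routine"))
```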

Scale 2 — Vector retrieval. The node is no longer a word; it’s an embedding — a point in high-dimensional space where meaning lives as direction. Two points are close if the angle between them is small (cosine distance — the standard similarity metric for embeddings). At scale, the index itself becomes a graph: a small-world layered so a greedy walk from the top hops down to the nearest neighbors in log time. Below ten thousand vectors, brute-force multiplication is faster than the index. Past that, the index earns its keep. Reader: still mostly human, but an LLM is increasingly at the other end. Format: page summaries, with chunks creeping in.
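Below the index threshold, the whole of Scale 2 is one matrix multiply. A sketch with random 4-dimensional vectors standing in for learned embeddings; the geometry is the same at any width.

```python
import numpy as np

# Toy corpus embeddings (one row per node), L2-normalized so
# the dot product is exactly cosine similarity.
vectors = np.random.default_rng(0).normal(size=(1000, 4))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def nearest(query_vec, k=3):
    q = query_vec / np.linalg.norm(query_vec)
    sims = vectors @ q            # brute force: one matmul over all nodes
    top = np.argsort(-sims)[:k]   # smallest angle = highest cosine
    return list(zip(top.tolist(), sims[top].tolist()))

print(nearest(np.array([1.0, 0.2, -0.3, 0.5])))
```

Past ten thousand rows, this scan is what an HNSW-style small-world index replaces with a greedy log-time walk.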

Scale 3 — Typed knowledge graph. A node is no longer a document. It’s a claim. Edges are no longer “links to” — they’re labelled. Grounds. Derives from. Evidences. Contradicts. The labels do retrieval work. They let the walk distinguish I said this five times from independently grounded twice. The query stops being a string and becomes a predicate — a structured request for a kind of node. Reader: a self, or an agent on behalf of one. Format: node plus provenance plus neighborhood.
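A sketch of what "the query becomes a predicate" means in practice. The edge labels (grounded_by, derives_from) are the essay's; the dataclass and field names are illustrative, not the scaffold's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str          # "observation" | "overlap" | "novel" ...
    claim: str
    edges: list = field(default_factory=list)  # (label, target_id)

graph = {
    "O01": Node("O01", "observation", "First three months in Chicago"),
    "P01": Node("P01", "overlap", "Routine is load-bearing",
                [("grounded_by", "O01"), ("grounded_by", "O02")]),
    "N01": Node("N01", "novel", "Isolation is an early warning",
                [("derives_from", "P01")]),
}

# The query is no longer a string; it is a structured request
# for a kind of node.
def query(node_type=None, min_groundings=0):
    for node in graph.values():
        groundings = [e for e in node.edges if e[0] == "grounded_by"]
        if node_type and node.type != node_type:
            continue
        if len(groundings) < min_groundings:
            continue
        yield node

# "Independently grounded at least twice" vs "said five times":
print([n.id for n in query(min_groundings=2)])   # -> ['P01']
```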

Scale 4 — AI-native search. The agent doesn’t type. It describes. The query is a sentence shaped like the answer — “Here is a great article about LLM evaluation:” outperforms “LLM evaluation” because the embedding was trained on the way documents get cited. Filtering separates from ranking and runs first; the index throws out the wrong types before scoring the rest. Ranking shifts from popularity to comprehensiveness, recency, type-correctness, provenance-strength. Reader: an agent with a token budget. Format: atomic chunks with provenance — {title, url, score, publishedDate, author, text, highlights[]} — every field stitches into the answer.
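Filter-before-rank, sketched against the chunk shape above. The scoring weights are placeholders, not Exa's; real rankers learn them.

```python
chunks = [
    {"title": "LLM evals survey", "url": "https://example.com/a",
     "type": "article", "publishedDate": "2026-01-10",
     "provenance_count": 4, "text": "..."},
    {"title": "Buy eval credits", "url": "https://example.com/b",
     "type": "ad", "publishedDate": "2026-03-01",
     "provenance_count": 0, "text": "..."},
]

def search(query_types, k=10):
    # 1. Filtering runs first: throw out wrong-typed chunks
    #    before scoring anything.
    pool = [c for c in chunks if c["type"] in query_types]
    # 2. Rank the survivors by provenance strength, then recency
    #    (placeholder ordering; comprehensiveness would join here).
    pool.sort(key=lambda c: (c["provenance_count"], c["publishedDate"]),
              reverse=True)
    return pool[:k]

print([c["title"] for c in search({"article"})])
```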

The retriever now spawns retrievers. Exa’s Feb–Mar 2026 ships make the shift legible: Exa Instant returns neural results in under 200ms — fast enough to sit inside a tool-call loop — while Exa Deep fans out parallel sub-agents per query, and exa-code maintains a code-example index aimed at hallucination-rate reduction.2

Search is no longer a URL. It’s a tool a model calls. In December 2025, Anthropic donated the Model Context Protocol — MCP, the open standard that lets a model invoke external tools and data sources — to the Linux Foundation; the substrate beneath agent retrieval is now governed as shared infrastructure, not a vendor API.3

All four find relevant nodes by walking edges. What changes is node spec, edge spec, query format, who’s at the other end.

There’s a third axis. Not what something looks like, not what it means — what it is. Turnbull names it: agents query by attribute, and metadata is the retrieval kind that lexical and embedding both miss.4

Bounded context turns the graph from a preference into a forcing function.

Why bounded context forces structured memory

Four claims, deep prior backing.5

  • Working memory is bounded — Miller’s 7±2; Cowan’s 4±1.
  • Institutional decision-making is bounded — Simon’s bounded rationality.
  • Lossless compression is bounded — Shannon’s floor.
  • The space-creating operation that doesn't lose information is factoring: pulling shared structure into named nodes with typed edges. Codd's relational model.

Stack them. When the body of knowledge |K| exceeds the reader's capacity C_n, discarding loses information and lossless compression bottoms out at Shannon's floor. Factoring is graph construction. The bounded reader needs the graph not as decoration but as the only architecture that lets retrieval scale without degrading.
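The stacked argument, written out. Notation is mine: K the claim set, C_n the reader's capacity, H(·) Shannon entropy, d_G graph distance.

```latex
\begin{aligned}
\textbf{discard:}\;& K' \subset K,\ |K'| \le C_n
  &&\Rightarrow\ K \setminus K' \text{ is lost} \\
\textbf{compress:}\;& \text{lossless coding needs} \ge H(K) \text{ bits}
  &&\Rightarrow\ \text{no fit once } H(K) > C_n \\
\textbf{factor:}\;& \text{store } G = (V, E) \text{ once; per query load }
  N(q) = \{\, v : d_G(v, q) \le r \,\}
  &&\Rightarrow\ |N(q)| \le C_n \text{ for any } |K|
\end{aligned}
```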

Three substrates share the constraint: biological working memory, institutional decision-making, transformer context windows — the span of text a model can hold in active attention; large now, but degrading as irrelevant content fills them. McCarthy proves it formally for the scientific case, with a corollary aimed at the frontier labs:

Growing C_n directly does not solve the retrieval problem… The efficient path is not to grow the context window but to grow the encoded knowledge accessible via stored adjacency: filling the graph, not the context window.

The race to longer windows — 200K, 1M, 10M — is real progress and a confession that the substrate hasn't been chosen. Self-attention's O(n²) is the cost of no stored structure. A graph stores dependencies once, walks them many times. The limit isn't tokens; it's untyped flatness.

The labs split. Anthropic ships MCP — first reference server is a knowledge-graph CRUD API. OpenAI ships memory — extracted typed claims, not turns. Google ships million-token contexts — the maximalist path McCarthy names inefficient. The academic line — memory streams,6 MemGPT,7 HippoRAG,8 A-Mem9 — converges: the graph isn’t a feature; it’s where memory lives.

What the publisher ships

The four scales describe the retriever. The retriever is half the system. Each retrieval generation pushed a corresponding publishing primitive upstream onto the writer.

| Retrieval scale | Reader needs | Publisher ships |
| --- | --- | --- |
| 1 — Inverted index | terms in documents | HTML with on-page text |
| 2 — Vector retrieval | embeddings of meaning | clean prose, semantic HTML, no marketing chrome |
| 3 — Typed knowledge graph | typed claims with provenance | JSON-LD, schema.org, structured citations |
| 4 — AI-native search | atomic chunks with provenance | llms.txt, per-essay .md, MCP, /graph.json |

TF-IDF needed publishers to write words. PageRank needed publishers to link. Vector retrieval needed publishers to write declaratively — the page had to look like the answer, because the embedding was trained on the way documents got cited. Each generation pushed cognitive work back upstream — from the search engine to the writer.

Agent-native search continues the trend, harder. Where Scale 2 wanted prose-shaped-like-an-answer, Scale 4 wants the answer typed: claim, evidence, provenance, neighbors, valid-when. The smallest unit of publishing has changed from “a page with words” to “a node in a typed graph.”10,11

Publishers who ship the primitive corresponding to the current retrieval generation get walked by agents. Publishers who don’t — most of the web in 2026 — get summarized into oblivion by a derivative agent that scraped them once and now serves a stale paraphrase. The choice is no longer whether to be indexed. It’s whether to be the source or the source-of-the-source.

What this essay extends

The personal-graph framing — bounded-context applied to a self rather than a science — is what this essay lays down. McCarthy's necessity arguments run through selection-under-competition: science prunes by what wins under evidence. Personal-memory graphs aren't under that pressure. No competitor's posterior, no replication, no external ground truth, fuzzy temporal validity. Three rewrites:

valid_at / confidence-decay. Propositions about persons aren't permanently valid; they don't die, they become less true over time. Every claim carries a validity window that decays unless re-grounded. A first-class field on every node.
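A sketch of what a first-class validity window might look like on a node. The field names (valid_at, half_life_days) and the exponential half-life are illustrative assumptions, not the scaffold's schema.

```python
from datetime import date

def effective_confidence(base, valid_at, today=None, half_life_days=365):
    """Confidence decays exponentially from the last grounding;
    re-grounding resets valid_at and restores full weight."""
    today = today or date.today()
    age_days = (today - valid_at).days
    return base * 0.5 ** (age_days / half_life_days)

# "Runs three times a week" -- grounded March 2025, read April 2026:
print(effective_confidence(0.9, date(2025, 3, 14), date(2026, 4, 24)))
# ~0.42: still retrievable, no longer asserted at full strength.
```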

Inverted edge-density. Paper 1 Corollary 3 predicts mature graphs become edge-dense. True for science. False for personal psychology — a forty-year-old’s graph is node-dense with sparse adjacency. “Right node, then walk its neighborhood” beats “connected clique.”

Un-clean action space. K/A inseparability (Paper 3 Corollary 4) presumes crisp β overlap. Personal action doesn’t share one. Schema tolerates K-without-A and A-without-K.

A demo, at the personal scale

The bet is testable. The bound is the reader, not the corpus. Small graph + finite agent = huge graph + finite agent: the same problem, scaled.

Runnable on a laptop, against Alex’s example graph:

git clone https://github.com/parrik/know-thyself-search
cd know-thyself-search
pip install pyyaml numpy

python embed.py examples/example-graph-extended.yaml
python compare.py "when did the running routine break down"

Three retrieval modes, side-by-side, on the same query:

Query: 'when did the running routine break down'
Index: 87 nodes · backend=tfidf

━━━ MODE A — pure cosine ━━━━━━━━━━━━━━━━━━
  1. 0.296  P01-routine-as-regulation       [overlap]
            Physical routine is load-bearing for Alex's stability
  2. 0.216  E01-child-stability-depends...  [emergent]
            Mira's stability depends on Alex's routine stability
  3. 0.190  N01-isolation-is-early-warning  [novel] [tentative]
            Isolation is an early-warning signal

━━━ MODE B — + type filter (observation) ━━
  1. 0.387  O01-first-three-months          [observation]
            First three months in Chicago — Sep–Nov 2024
  2. 0.121  O02-running-restart-mar2025     [observation]
            Running routine restarted — March 2025

━━━ MODE C — + provenance rerank ━━━━━━━━━━
  1. 0.325  P01-routine-as-regulation       [overlap]
  2. 0.227  E01-child-stability-depends...  [emergent]
  3. 0.171  N01-isolation-is-early-warning  [novel] [tentative]

A finds the theme. B finds the episode. Cosine grabs P01 on word overlap — right frame, wrong answer to when. Type filter retrieves dated episodes. Schema doing work pure embeddings cannot.

C demotes the tentative novel. Provenance reranking knows the novel is one-derivation and the overlap is two-grounded. Attribution ≠ confidence as a retrieval property.
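Roughly what the Mode C rerank does under the hood, as a sketch: the blend weight and the grounding-over-derivation ratio are illustrative; compare.py's actual numbers may differ.

```python
def provenance_rerank(hits, graph, blend=0.3):
    """hits: [(node_id, cosine_score)]; graph: node_id -> [(label, target)].
    Demote one-derivation tentatives, promote multiply-grounded claims."""
    def grounding_strength(node_id):
        edges = graph.get(node_id, [])
        grounded = sum(1 for label, _ in edges if label == "grounded_by")
        derived = sum(1 for label, _ in edges if label == "derives_from")
        # Independent grounding counts; derivation alone does not.
        return grounded / (1 + derived)

    reranked = [
        (nid, (1 - blend) * score + blend * grounding_strength(nid))
        for nid, score in hits
    ]
    return sorted(reranked, key=lambda pair: -pair[1])

graph = {
    "P01": [("grounded_by", "O01"), ("grounded_by", "O02")],
    "N01": [("derives_from", "P01")],
}
# The two-grounded overlap climbs; the one-derivation novel sinks.
print(provenance_rerank([("N01", 0.190), ("P01", 0.296)], graph))
```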

At 87 nodes, none of this needs HNSW — the small-world graph index that makes vector lookup log-time at scale. Brute-force matmul runs in about two milliseconds. HNSW kicks in when linear scan stops being free — well past where most personal graphs go.12 The algorithm scales with the problem.

What this opens

Same shape across scales, build once:

  • Schema: typed nodes carrying claim + attribution + derivation. The know-thyself scaffold adds temporal-validity.
  • Index: vectors plus typed metadata (know-thyself-search).
  • Walk: edge-aware traversal — walk_provenance(node_id) returns the typed-edge neighborhood (outbound grounded_by / related_to, inbound references) in one call; a sketch follows this list. Shipped Apr 2026.
  • Surface: MCP server — search_graph / get_node / walk_provenance / list_node_stats over stdio; any MCP client (Claude Code, Claude Desktop, Cursor) queries natively. Shipped Apr 2026.
  • Next bottleneck: sub-statement chunking. Long observation nodes accumulate dated sub-sections; whole-statement single-vector dilutes new content. Etude queued.
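The walk, sketched. This is the idea of a one-hop typed neighborhood, not the scaffold's exact implementation, whose return shape may differ.

```python
def walk_provenance(graph, node_id):
    """One-hop typed neighborhood: outbound labelled edges plus
    inbound references from every node that points here."""
    node = graph[node_id]
    outbound = list(node["edges"])
    inbound = [
        (label, other_id)
        for other_id, other in graph.items()
        for label, target in other["edges"]
        if target == node_id
    ]
    return {"node": node_id, "outbound": outbound, "inbound": inbound}

graph = {
    "P01": {"edges": [("grounded_by", "O01")]},
    "O01": {"edges": []},
    "N01": {"edges": [("derives_from", "P01")]},
}
print(walk_provenance(graph, "P01"))
# {'node': 'P01', 'outbound': [('grounded_by', 'O01')],
#  'inbound': [('derives_from', 'N01')]}
```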

Adjacent work names pieces without the cross-scale claim. Bryk’s Why Google Search Sucks for AI on Scale 4. Lù et al.’s Build the Web for Agents one level up. McCarthy’s open-knowledge-graph on Scale 3. Personal-memory siblings — Mem0, Graphiti, Letta, HippoRAG, A-Mem — converge on typed-node-with-provenance. Frameworks like LangChain / LlamaIndex treat memory as conversation-shaped (buffer, summary, vector-of-turns). Graph-shaped projects treat it as person-shaped. Conversation primitive falls under McCarthy’s Theorem 4 — flat substrates degrade as bounded readers scan them. Person-shaped survives.

Postscript — DeepSeek V4 (Apr 26 2026)

Two days after this essay shipped, DeepSeek released V4. Million-token context via hybrid attention — Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) — at 10% of V3.2’s KV cache for V4-Pro and 7% for V4-Flash (≈10× and ≈14× less memory respectively).

The four scales extend one further:

| Scale | Nodes | Edges | Walk strategy | Reader |
| --- | --- | --- | --- | --- |
| LLM working memory | tokens / latent codes | sparse + hierarchical attention | compression + retrieval-as-attention | the model itself |

Graph-traversal-with-ranking, internalized one scale further. The reader is the model.

Run it

The full scaffold is three retrieval modes, ~300 LOC, runnable on a laptop: github.com/parrik/know-thyself-search. Clone, embed Alex’s example graph, watch type-filter and provenance-rerank do work pure cosine cannot.

The loop closes

This is the loop the first essay opened and this one closes. Personal-memory and AI-search-for-agents are the same problem at different scales.

γνῶθι σεαυτόν. Know thyself. The Delphic maxim was offered to visitors before they consulted the oracle. Being legible to the oracle was the precondition for being understood. The oracle’s bandwidth was finite; the visitor’s wasn’t.

The retrieval problem hasn’t changed in two and a half millennia. The reader has.

Agents help when you know what you’re looking for. They don’t help when you don’t. Turnbull’s Apr 28 2026 post sharpens the limit: agents add value on entity-discovery — finding a thing whose shape is named — and add nothing on information-discovery, because if it knew what information was correct, it wouldn’t need search.13

The bet is testable.


Same shape, smaller scale — applied to a self. Part III — Security was never about response →

Footnotes

  1. Same shape across scales: inverted index (Lucene, BM25/TF-IDF — terms ↔ documents); vector retrieval (HNSW14, Pinecone, FAISS — embeddings as nodes, cosine as edges); typed knowledge graph (claims with grounds / derives_from / contradicts edges)15; AI-native search (Exa — clustered ANN over Matryoshka embeddings, link-prediction-trained, rejected HNSW for sharding/metadata reasons; Bryk: “It would kind of be insane if the same search engine that was optimal for humans would also be optimal for this very different creature.”). What changes: node spec, edge spec, query format, who’s reading. Agents want declarative queries ("Here is a great article about LLM evaluation:" outperforms "LLM evaluation"), atomic chunks with provenance, ranking by comprehensiveness/recency/type-correctness — filter first, then rank.

  2. Exa, Exa Deep and exa-code (2026); MarkTechPost, Exa AI Introduces Exa Instant (Feb 2026).

  3. The New Stack, Model Context Protocol Roadmap 2026 — MCP donated to the Linux Foundation, December 2025.

  4. Doug Turnbull, Metadata: the 3rd kind of retrieval (Apr 21 2026).

  5. Miller 1956; Cowan 2001; Simon’s bounded rationality; Shannon’s source coding theorem; Codd’s relational model.

  6. Park et al., Generative Agents (2023).

  7. Packer et al., MemGPT (2023).

  8. Gutiérrez et al., HippoRAG (2024).

  9. Xu et al., A-Mem (2025).

  10. llmstxt.org — H1 + blockquote + H2-link-list spec for an agent-readable site index. Cursor, Continue, Aider, and several RAG frameworks grep for it in 2026; no major model provider commits to reading it as first-class input as of mid-2026, but adoption is climbing — SE Ranking 2026 survey reports ~10% of 300k domains shipping it.

  11. Anthropic docs ships /llms-full.txt — the entire documentation site concatenated as one Markdown file, for one-paste loading into a model context. Vercel, Mintlify, and most major dev-tool docs converged on this primitive in 2026. The “give me the whole thing” surface for the agent reader.

  12. Pinecone, HNSW — when graph indexes earn their keep over linear scan.

  13. Doug Turnbull, Can agents replace the search stack? (Apr 28 2026).

  14. Malkov & Yashunin, Hierarchical Navigable Small World graphs (2018).

  15. RDF (W3C, 2004); PROV-O (W3C, 2013); Anthropic’s Claude citations API; McCarthy’s open-knowledge-graph on the scientific case.