
LLMs Are Databases - So Query Them

Channel: Chris Hay
Date: April 13, 2026
Duration: 29:02
Tags: LLM Internals, Graph Database, Model Editing, Mechanistic Interpretability, Developer Tools
TL;DR

Chris Hay argues that LLMs are not just metaphorically like databases: they are physically graph databases. Using Larql, his SQL-like query language for model weights, he queries Google Gemma 3 34B directly, inserts new knowledge (Poseidon as the capital of Atlantis) without any training, and compiles the result back to GGUF, demonstrating that the FFN is a queryable, writable graph.

Summary

The Core Claim: LLMs Are Graph Databases

The video opens with a provocative claim: every LLM is physically a graph database. Not as a metaphor, but literally. The feedforward network layers store knowledge as a graph with real nodes (entities), real edges (features), and real relation labels. That means you can query them, insert into them, and compile them — just like any other database.

Larql and the V-Index Format

Chris introduces Larql (Large Language Model Query Language), paired with a runtime called Larco, which connects to model weights stored in the V-index format. This format decomposes FFN matrices from each transformer layer into a graph structure — a gate vector (when a feature fires) and a down vector (what it contributes to output). He connects to Google Gemma 3 34B: 348,000 features across 34 layers, with 1,785 probe-confirmed features mapped to human-readable representations.
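The V-index file layout itself is not shown in the video, but the decomposition it describes can be sketched: read each row of the FFN's gate matrix and the matching column of its down matrix as one feature's (gate, down) vector pair, i.e. one node in the graph. A minimal numpy sketch with toy dimensions and hypothetical names:

```python
import numpy as np

# Toy dimensions; the video's Gemma model uses a 2,560-dim residual
# stream and 10,240 FFN features per layer.
D_MODEL, N_FEATURES = 64, 256

rng = np.random.default_rng(0)
W_gate = rng.standard_normal((N_FEATURES, D_MODEL)) * 0.02  # row i: when feature i fires
W_down = rng.standard_normal((D_MODEL, N_FEATURES)) * 0.02  # col i: what feature i contributes

def to_feature_graph(W_gate, W_down):
    """Re-read the dense FFN matrices as per-feature (gate, down) vector pairs."""
    return [(W_gate[i], W_down[:, i]) for i in range(W_gate.shape[0])]

features = to_feature_graph(W_gate, W_down)
gate_vec, down_vec = features[42]   # each pair is one queryable node in the graph
```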

Three-Band Architecture in Every Transformer

Running DESCRIBE France on the model reveals the three stages of every transformer's forward pass: an attention band that gathers context, an FFN edge band that stores the knowledge, and an output band that commits to the token.

As Chris puts it: "The syntax gathers the context, the edges store the knowledge, and the output commits to the token."

Querying the Graph

Larql supports SQL-like syntax directly against weights:

SELECT * FROM edges WHERE entity = 'France' AND relation = 'borders'
SELECT * FROM edges WHERE entity = 'France' AND relation = 'nationality' LIMIT 5
SELECT * FROM edges NEAREST TO 'France' AT LAYER 26 LIMIT 10
SELECT * FROM features WHERE layer = 25 AND feature = 5067
SELECT * FROM entities LIMIT 20
SHOW RELATIONS

The SHOW RELATIONS command reveals 1,489 probe-confirmed relation labels: manufacturer (76 features), league (60), genre (52), language (46). Nobody taught the model this schema; it invented the relational categories from raw text because that is how the world is structured: things have makers, places have capitals, people have occupations.
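Larql's internals are not shown, but its query semantics can be mimicked over an in-memory edge table. A sketch with toy data and hypothetical helper names, mirroring the SELECT and SHOW RELATIONS statements above:

```python
from collections import Counter

# Toy edge table; in Larco these rows would come from probe-confirmed FFN features.
edges = [
    {"entity": "France", "relation": "borders", "target": "Spain",   "layer": 26},
    {"entity": "France", "relation": "borders", "target": "Germany", "layer": 26},
    {"entity": "France", "relation": "capital", "target": "Paris",   "layer": 25},
    {"entity": "Toyota", "relation": "manufacturer", "target": "Corolla", "layer": 24},
]

def select_edges(edges, limit=None, **where):
    """Mimic SELECT * FROM edges WHERE ... [LIMIT n]."""
    rows = [e for e in edges if all(e.get(k) == v for k, v in where.items())]
    return rows[:limit] if limit else rows

def show_relations(edges):
    """Mimic SHOW RELATIONS: map each relation label to its edge count."""
    return Counter(e["relation"] for e in edges)

print(select_edges(edges, entity="France", relation="borders"))
print(show_relations(edges))
```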

Polysemanticity: A Feature, Not a Bug

Feature 9348 at layer 26 fires for Australia, Italy, Germany, and Spain; it is not an "Australia feature" but a "Western nations feature." The model compresses related concepts into shared slots because 10,240 features per layer must encode a far higher-dimensional concept space. The residual stream is 2,560-dimensional; every scalar activation is a shadow of the full representation.

Attention — operating in the full 2,560-dimensional residual stream — is what routes correctly through this polysemantic graph, suppressing irrelevant activations via cosine similarity matching at the query/key level.
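The routing role can be illustrated with a toy: a softmax over cosine similarities between a context query and candidate key directions concentrates weight on the matching reading of a shared slot. All vectors below are random stand-ins, not real model directions:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
key_australia = rng.standard_normal(d)   # stand-in key for one reading of the slot
key_italy     = rng.standard_normal(d)   # stand-in key for another reading

def cosine(a, b):
    return float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))

# A query built from "Australia-like" context is near one key, far from the other.
query = key_australia + 0.1 * rng.standard_normal(d)
sims = np.array([cosine(query, key_australia), cosine(query, key_italy)])
weights = np.exp(8 * sims) / np.exp(8 * sims).sum()   # sharpened softmax routing
```

The matching key captures nearly all of the routing weight, which is the suppression effect described above.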

Inserting Knowledge Without Training

The most striking demo: inserting Poseidon as the capital of Atlantis with no training:

INSERT INTO edges VALUES (entity='Atlantis', relation='capital', target='Poseidon')

The pipeline captures the model's residual stream at layer 26 for the canonical prompt, engineers a gate vector from that direction, synthesizes a down vector pointing toward "Poseidon," and installs the gate+down pair into a free feature slot. A balancer scales the vectors so the fact lands at exactly the right strength — strong enough to answer correctly (99.98% confidence) but not so strong it hijacks capital queries for Paris (still 81% correct).
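The exact vector engineering is not shown on screen, but the shape of the step can be sketched with random stand-ins for the measured residual direction and the "Poseidon" output direction (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
resid_atlantis = rng.standard_normal(d)   # stand-in: layer-26 residual for the prompt
poseidon_dir   = rng.standard_normal(d)   # stand-in: output direction for "Poseidon"

def engineer_fact(resid, target_dir, gate_scale=1.0, down_scale=1.0):
    """Build a (gate, down) pair: fire on the prompt's direction, write the target's.
    The scales play the balancer's role: strong enough to answer the new fact,
    weak enough not to hijack unrelated queries."""
    gate = gate_scale * resid / np.linalg.norm(resid)
    down = down_scale * target_dir / np.linalg.norm(target_dir)
    return gate, down

gate, down = engineer_fact(resid_atlantis, poseidon_dir, down_scale=2.0)

# The new feature fires for the Atlantis prompt, stays quiet for an unrelated one.
resid_other = rng.standard_normal(d)
act_atlantis = max(0.0, float(gate @ resid_atlantis))
act_other    = max(0.0, float(gate @ resid_other))
```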

Compiling Back to Weights

Changes live in a patch overlay at runtime, leaving the base V-index read-only. To make them permanent:

COMPILE CURRENT INTO V-INDEX temp_atlantis.vindex

This uses the MEMIT technique to bake gate+down vectors into canonical weight files. The result exports to safetensors or GGUF, making it compatible with standard model providers. A fresh Larco session connected to the new V-index confirms: capital of Atlantis is Poseidon, no training required.
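COMPILE's internals are not shown, and MEMIT itself is a published editing method with more machinery than this, but the basic bookkeeping (writing overlay features into free slots of the dense matrices, leaving the base copy untouched) can be sketched:

```python
import numpy as np

def compile_patch(W_gate, W_down, patches):
    """Bake runtime overlay features into copies of the dense FFN matrices.

    patches maps a free feature slot to its engineered (gate, down) pair;
    the base matrices are copied, so the original index stays read-only.
    """
    W_gate, W_down = W_gate.copy(), W_down.copy()
    for slot, (gate, down) in patches.items():
        W_gate[slot] = gate        # row: the new feature's firing condition
        W_down[:, slot] = down     # column: its contribution to the output
    return W_gate, W_down

# Toy example: install one fact into slot 7 of a small random model.
rng = np.random.default_rng(3)
W_gate, W_down = rng.standard_normal((16, 8)), rng.standard_normal((8, 16))
gate, down = np.ones(8), np.full(8, 2.0)
W_gate2, W_down2 = compile_patch(W_gate, W_down, {7: (gate, down)})
```

The returned matrices are ordinary dense weights again, which is why the result can be exported to standard formats.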

Inference as a KNN Graph Walk

When Larco runs inference, it's not doing dense matrix multiplications through the FFN — it's doing a KNN graph walk. At each layer: take current residual stream → KNN lookup against gate vectors → fire matching features → accumulate down vectors → next layer. Attention still does QKV projections (matrix math), but the FFN — the knowledge store — is a graph walk. The dense matrix was always just an inefficient encoding of this graph structure.
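Larco's implementation is not public in the video, but the walk it describes can be sketched in numpy. With k set to the full feature count, the sketch reduces exactly to a simplified (ReLU-gated) dense FFN update, which is the sense in which the matrix is "just an inefficient encoding" (names and dimensions hypothetical):

```python
import numpy as np

def ffn_knn_step(resid, gate_vecs, down_vecs, k=32):
    """One FFN layer as a KNN graph step: find the k gate vectors most aligned
    with the residual stream, fire those features, accumulate their down vectors."""
    scores = gate_vecs @ resid                 # similarity of resid to every gate
    top = np.argsort(scores)[-k:]              # k best-matching features
    acts = np.maximum(scores[top], 0.0)        # only positively firing features count
    return resid + acts @ down_vecs[top]       # the same residual update a dense FFN makes

# With k equal to the feature count, this matches the simplified dense FFN exactly.
rng = np.random.default_rng(4)
d, n = 32, 128
gate_vecs = rng.standard_normal((n, d))
down_vecs = rng.standard_normal((n, d)) / n
resid = rng.standard_normal(d)
sparse = ffn_knn_step(resid, gate_vecs, down_vecs, k=16)
dense  = resid + np.maximum(gate_vecs @ resid, 0.0) @ down_vecs
```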

Future Implications

Chris closes by outlining what this architecture could enable.

Notable Quotes

"The FFN was always a graph. The matrix was just an inefficient way of encoding it."

"Nobody taught the model the schema. The model learned these categories because that is how the world is structured. Things have makers, places have capitals, people have occupations. The FFN reinvented a relational schema from raw text."

"The model is a graph. I'm literally saying the model is a graph. The graph is messy, but the walk is precise, and attention is the thing that gets you through it."

"We've done select statements. We've inserted new knowledge. We've compiled it back down into weights. It's a database."

"One feature can't tell you what the capital of France is. 34 layers of attention weighted features can."
