
LLMs Are Databases - So Query Them

Channel: Chris Hay
Date: April 13, 2026
Duration: 29:02
Tags: LLM Internals, Graph Database, Model Editing, Mechanistic Interpretability, Developer Tools
TL;DR

Chris Hay argues that LLMs are not just metaphorically like databases: they are physically graph databases. Using Larql, his SQL-like query language for model weights, he queries Google Gemma 3 34B directly, inserts new knowledge (Poseidon as the capital of Atlantis) without any training, and compiles the result back to GGUF, demonstrating that the FFN is a queryable, writable graph.

Summary

The Core Claim: LLMs Are Graph Databases

The video opens with a provocative claim: every LLM is physically a graph database. Not as a metaphor, but literally. The feedforward network layers store knowledge as a graph with real nodes (entities), real edges (features), and real relation labels. That means you can query them, insert into them, and compile them — just like any other database.

Larql and the V-Index Format

Chris introduces Larql (Large Language Model Query Language), paired with a runtime called Larco, which connects to model weights stored in the V-index format. This format decomposes FFN matrices from each transformer layer into a graph structure — a gate vector (when a feature fires) and a down vector (what it contributes to output). He connects to Google Gemma 3 34B: 348,000 features across 34 layers, with 1,785 probe-confirmed features mapped to human-readable representations.
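The V-index file layout itself is not shown in the video, but the decomposition it describes can be sketched: read each row of the FFN's gate matrix and the matching column of its down matrix as one feature's (gate, down) vector pair, i.e. one node in the graph. A minimal numpy sketch with toy dimensions and hypothetical names:

```python
import numpy as np

# Toy dimensions; the video's Gemma model uses a 2,560-dim residual
# stream and 10,240 FFN features per layer.
D_MODEL, N_FEATURES = 64, 256

rng = np.random.default_rng(0)
W_gate = rng.standard_normal((N_FEATURES, D_MODEL)) * 0.02  # row i: when feature i fires
W_down = rng.standard_normal((D_MODEL, N_FEATURES)) * 0.02  # col i: what feature i contributes

def to_feature_graph(W_gate, W_down):
    """Re-read the dense FFN matrices as per-feature (gate, down) vector pairs."""
    return [(W_gate[i], W_down[:, i]) for i in range(W_gate.shape[0])]

features = to_feature_graph(W_gate, W_down)
gate_vec, down_vec = features[42]   # each pair is one queryable node in the graph
```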

Three-Band Architecture in Every Transformer

Running DESCRIBE France on the model reveals the three stages of every transformer's forward pass: an attention band that gathers context, an FFN edge band that stores the knowledge, and an output band that commits to the token.

As Chris puts it: "The syntax gathers the context, the edges store the knowledge, and the output commits to the token."

Querying the Graph

Larql supports SQL-like syntax directly against weights:

SELECT * FROM edges WHERE entity = 'France' AND relation = 'borders'
SELECT * FROM edges WHERE entity = 'France' AND relation = 'nationality' LIMIT 5
SELECT * FROM edges NEAREST TO 'France' AT LAYER 26 LIMIT 10
SELECT * FROM features WHERE layer = 25 AND feature = 5067
SELECT * FROM entities LIMIT 20
SHOW RELATIONS

The SHOW RELATIONS command reveals 1,489 probe-confirmed relation labels: manufacturer (76 features), league (60), genre (52), language (46). Nobody taught the model this schema; it invented the relational categories from raw text because that is how the world is structured: things have makers, places have capitals, people have occupations.
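Larql's internals are not shown, but its query semantics can be mimicked over an in-memory edge table. A sketch with toy data and hypothetical helper names, mirroring the SELECT and SHOW RELATIONS statements above:

```python
from collections import Counter

# Toy edge table; in Larco these rows would come from probe-confirmed FFN features.
edges = [
    {"entity": "France", "relation": "borders", "target": "Spain",   "layer": 26},
    {"entity": "France", "relation": "borders", "target": "Germany", "layer": 26},
    {"entity": "France", "relation": "capital", "target": "Paris",   "layer": 25},
    {"entity": "Toyota", "relation": "manufacturer", "target": "Corolla", "layer": 24},
]

def select_edges(edges, limit=None, **where):
    """Mimic SELECT * FROM edges WHERE ... [LIMIT n]."""
    rows = [e for e in edges if all(e.get(k) == v for k, v in where.items())]
    return rows[:limit] if limit else rows

def show_relations(edges):
    """Mimic SHOW RELATIONS: map each relation label to its edge count."""
    return Counter(e["relation"] for e in edges)

print(select_edges(edges, entity="France", relation="borders"))
print(show_relations(edges))
```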

Polysemanticity: A Feature, Not a Bug

Feature 9348 at layer 26 fires for Australia, Italy, Germany, and Spain; it is not an "Australia feature" but a "Western nations feature." The model compresses related concepts into shared slots because 10,240 features per layer must encode a far higher-dimensional concept space. The residual stream is 2,560-dimensional; every scalar activation is a shadow of the full representation.

Attention — operating in the full 2,560-dimensional residual stream — is what routes correctly through this polysemantic graph, suppressing irrelevant activations via cosine similarity matching at the query/key level.
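The routing role can be illustrated with a toy: a softmax over cosine similarities between a context query and candidate key directions concentrates weight on the matching reading of a shared slot. All vectors below are random stand-ins, not real model directions:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
key_australia = rng.standard_normal(d)   # stand-in key for one reading of the slot
key_italy     = rng.standard_normal(d)   # stand-in key for another reading

def cosine(a, b):
    return float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))

# A query built from "Australia-like" context is near one key, far from the other.
query = key_australia + 0.1 * rng.standard_normal(d)
sims = np.array([cosine(query, key_australia), cosine(query, key_italy)])
weights = np.exp(8 * sims) / np.exp(8 * sims).sum()   # sharpened softmax routing
```

The matching key captures nearly all of the routing weight, which is the suppression effect described above.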

Inserting Knowledge Without Training

The most striking demo: inserting Poseidon as the capital of Atlantis with no training:

INSERT INTO edges VALUES (entity='Atlantis', relation='capital', target='Poseidon')

The pipeline captures the model's residual stream at layer 26 for the canonical prompt, engineers a gate vector from that direction, synthesizes a down vector pointing toward "Poseidon," and installs the gate+down pair into a free feature slot. A balancer scales the vectors so the fact lands at exactly the right strength — strong enough to answer correctly (99.98% confidence) but not so strong it hijacks capital queries for Paris (still 81% correct).
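The exact vector engineering is not shown on screen, but the shape of the step can be sketched with random stand-ins for the measured residual direction and the "Poseidon" output direction (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
resid_atlantis = rng.standard_normal(d)   # stand-in: layer-26 residual for the prompt
poseidon_dir   = rng.standard_normal(d)   # stand-in: output direction for "Poseidon"

def engineer_fact(resid, target_dir, gate_scale=1.0, down_scale=1.0):
    """Build a (gate, down) pair: fire on the prompt's direction, write the target's.
    The scales play the balancer's role: strong enough to answer the new fact,
    weak enough not to hijack unrelated queries."""
    gate = gate_scale * resid / np.linalg.norm(resid)
    down = down_scale * target_dir / np.linalg.norm(target_dir)
    return gate, down

gate, down = engineer_fact(resid_atlantis, poseidon_dir, down_scale=2.0)

# The new feature fires for the Atlantis prompt, stays quiet for an unrelated one.
resid_other = rng.standard_normal(d)
act_atlantis = max(0.0, float(gate @ resid_atlantis))
act_other    = max(0.0, float(gate @ resid_other))
```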

Compiling Back to Weights

Changes live in a patch overlay at runtime, leaving the base V-index read-only. To make them permanent:

COMPILE CURRENT INTO V-INDEX temp_atlantis.vindex

This uses the MEMIT technique to bake gate+down vectors into canonical weight files. The result exports to safetensors or GGUF, making it compatible with standard model providers. A fresh Larco session connected to the new V-index confirms: capital of Atlantis is Poseidon, no training required.
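COMPILE's internals are not shown, and MEMIT itself is a published editing method with more machinery than this, but the basic bookkeeping (writing overlay features into free slots of the dense matrices, leaving the base copy untouched) can be sketched:

```python
import numpy as np

def compile_patch(W_gate, W_down, patches):
    """Bake runtime overlay features into copies of the dense FFN matrices.

    patches maps a free feature slot to its engineered (gate, down) pair;
    the base matrices are copied, so the original index stays read-only.
    """
    W_gate, W_down = W_gate.copy(), W_down.copy()
    for slot, (gate, down) in patches.items():
        W_gate[slot] = gate        # row: the new feature's firing condition
        W_down[:, slot] = down     # column: its contribution to the output
    return W_gate, W_down

# Toy example: install one fact into slot 7 of a small random model.
rng = np.random.default_rng(3)
W_gate, W_down = rng.standard_normal((16, 8)), rng.standard_normal((8, 16))
gate, down = np.ones(8), np.full(8, 2.0)
W_gate2, W_down2 = compile_patch(W_gate, W_down, {7: (gate, down)})
```

The returned matrices are ordinary dense weights again, which is why the result can be exported to standard formats.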

Inference as a KNN Graph Walk

When Larco runs inference, it's not doing dense matrix multiplications through the FFN — it's doing a KNN graph walk. At each layer: take current residual stream → KNN lookup against gate vectors → fire matching features → accumulate down vectors → next layer. Attention still does QKV projections (matrix math), but the FFN — the knowledge store — is a graph walk. The dense matrix was always just an inefficient encoding of this graph structure.
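Larco's implementation is not public in the video, but the walk it describes can be sketched in numpy. With k set to the full feature count, the sketch reduces exactly to a simplified (ReLU-gated) dense FFN update, which is the sense in which the matrix is "just an inefficient encoding" (names and dimensions hypothetical):

```python
import numpy as np

def ffn_knn_step(resid, gate_vecs, down_vecs, k=32):
    """One FFN layer as a KNN graph step: find the k gate vectors most aligned
    with the residual stream, fire those features, accumulate their down vectors."""
    scores = gate_vecs @ resid                 # similarity of resid to every gate
    top = np.argsort(scores)[-k:]              # k best-matching features
    acts = np.maximum(scores[top], 0.0)        # only positively firing features count
    return resid + acts @ down_vecs[top]       # the same residual update a dense FFN makes

# With k equal to the feature count, this matches the simplified dense FFN exactly.
rng = np.random.default_rng(4)
d, n = 32, 128
gate_vecs = rng.standard_normal((n, d))
down_vecs = rng.standard_normal((n, d)) / n
resid = rng.standard_normal(d)
sparse = ffn_knn_step(resid, gate_vecs, down_vecs, k=16)
dense  = resid + np.maximum(gate_vecs @ resid, 0.0) @ down_vecs
```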

Future Implications

Chris closes by outlining what this architecture could enable.

Notable Quotes

"The FFN was always a graph. The matrix was just an inefficient way of encoding it."

"Nobody taught the model the schema. The model learned these categories because that is how the world is structured. Things have makers, places have capitals, people have occupations. The FFN reinvented a relational schema from raw text."

"The model is a graph. I'm literally saying the model is a graph. The graph is messy, but the walk is precise, and attention is the thing that gets you through it."

"We've done select statements. We've inserted new knowledge. We've compiled it back down into weights. It's a database."

"One feature can't tell you what the capital of France is. 34 layers of attention weighted features can."
