Chris Hay argues that LLMs are not just metaphorically like databases: they are, physically, graph databases. Using Larql, his SQL-like query language for model weights, he queries Google Gemma 3 4B directly, inserts new knowledge (Poseidon as the capital of Atlantis) without any training, and compiles the result back to GGUF, demonstrating that the FFN is a queryable, writable graph.
SELECT, DESCRIBE, INSERT, SHOW, and graph-walk operations run directly against model weights. INSERT INTO edges injects new facts at runtime; a balancer prevents the new fact from hijacking related features.

The video opens with a provocative claim: every LLM is physically a graph database. Not as a metaphor, but literally. The feedforward network layers store knowledge as a graph with real nodes (entities), real edges (features), and real relation labels. That means you can query them, insert into them, and compile them, just like any other database.
Chris introduces Larql (Large Language Model Query Language), paired with a runtime called Larco, which connects to model weights stored in the V-index format. This format decomposes the FFN matrices of each transformer layer into a graph structure: a gate vector (when a feature fires) and a down vector (what it contributes to the output). He connects to Google Gemma 3 4B: 348,000 features across 34 layers, with 1,785 probe-confirmed features mapped to human-readable representations.
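The gate/down decomposition can be sketched in a few lines. This is a toy illustration with random stand-in weights, not the real V-index format; the dimensions match the ones quoted for the model (2,560-dimensional residual stream, 10,240 features per layer), and names like `W_gate` and `W_down` are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 2560, 10240  # residual width, FFN features per layer

# Dense FFN weights for one layer (random stand-ins for real tensors).
W_gate = rng.standard_normal((n_features, d_model)) * 0.02  # gate projection
W_down = rng.standard_normal((d_model, n_features)) * 0.02  # down projection

# Graph view: feature i is a (gate vector, down vector) pair.
features = [
    {"id": i, "gate": W_gate[i], "down": W_down[:, i]}
    for i in range(3)  # just the first few, for illustration
]

# The gate vector decides *when* feature i fires (dot with the residual
# stream); the down vector is *what* it writes back when it does.
x = rng.standard_normal(d_model)      # residual stream at this layer
act = float(features[0]["gate"] @ x)  # activation strength of feature 0
update = act * features[0]["down"]    # feature 0's contribution to the residual
```

Reading the matrices this way is what lets the rest of the tooling treat each hidden unit as a node with labeled edges rather than as an anonymous coordinate in a matmul.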
Running DESCRIBE France on the model surfaces the three stages of every transformer's forward pass: syntax (attention gathering context), knowledge (FFN edges firing), and output (committing to a token).
As Chris puts it: "The syntax gathers the context, the edges store the knowledge, and the output commits to the token."
Larql supports SQL-like syntax directly against weights:
SELECT * FROM edges WHERE entity = 'France' AND relation = 'borders'
SELECT * FROM edges WHERE entity = 'France' AND relation = 'nationality' LIMIT 5
SELECT * FROM edges NEAREST TO 'France' AT LAYER 26 LIMIT 10
SELECT * FROM features WHERE layer = 25 AND feature = 5067
SELECT * FROM entities LIMIT 20
SHOW RELATIONS
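As a rough mental model (not Larco's actual implementation), a SELECT over edges behaves like a filter over (entity, relation, target) triples. The edge data below is invented for illustration; real results come from the probe-confirmed features in the V-index.

```python
# Toy edge store standing in for probe-confirmed features in the V-index.
edges = [
    {"entity": "France", "relation": "borders", "target": "Spain",   "layer": 24},
    {"entity": "France", "relation": "borders", "target": "Germany", "layer": 26},
    {"entity": "France", "relation": "capital", "target": "Paris",   "layer": 26},
]

def select_edges(entity=None, relation=None, limit=None):
    """Rough analogue of: SELECT * FROM edges WHERE entity=? AND relation=? LIMIT ?"""
    rows = [e for e in edges
            if (entity is None or e["entity"] == entity)
            and (relation is None or e["relation"] == relation)]
    return rows[:limit] if limit is not None else rows

borders = select_edges(entity="France", relation="borders")
```

The NEAREST TO ... AT LAYER variant would swap the equality filter for a cosine-similarity ranking against that layer's gate vectors.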
The SHOW RELATIONS command reveals 1,489 probe-confirmed relation labels: manufacturer (76 features), league (60), genre (52), language (46). Nobody taught the model this schema; the FFN reinvented one from raw text because that is how the world is structured: things have makers, places have capitals, people have occupations.
Feature 9348 at layer 26 fires for Australia, Italy, Germany, and Spain — it's not an "Australia feature," it's a "Western nations feature." The model compresses related concepts into shared slots because 10,240 features per layer must encode a far higher-dimensional concept space. The residual stream is 2,560-dimensional; every scalar activation is a shadow of the full representation.
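The superposition argument can be made concrete with synthetic vectors: if several entities share a common direction, one gate vector tuned to that direction fires for all of them. The directions below are random stand-ins, not real Gemma activations.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2560  # residual stream width

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical entity directions: "Western nations" share a common component
# (weight 0.8) plus an entity-specific component (weight 0.6).
western = unit(rng.standard_normal(d))
countries = {name: unit(0.8 * western + 0.6 * unit(rng.standard_normal(d)))
             for name in ["Australia", "Italy", "Germany", "Spain"]}
unrelated = unit(rng.standard_normal(d))

gate = western  # a single feature slot tuned to the shared direction

# All four countries activate this one feature strongly; in high dimensions
# a random unrelated direction is nearly orthogonal and barely registers.
sims = {name: float(gate @ v) for name, v in countries.items()}
off = abs(float(gate @ unrelated))
```

One slot, four entities: that is the compression the polysemanticity paragraph describes, and why a single scalar activation underdetermines the concept.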
Attention — operating in the full 2,560-dimensional residual stream — is what routes correctly through this polysemantic graph, suppressing irrelevant activations via cosine similarity matching at the query/key level.
The most striking demo: inserting Poseidon as the capital of Atlantis with no training:
INSERT INTO edges VALUES (entity='Atlantis', relation='capital', target='Poseidon')
The pipeline captures the model's residual stream at layer 26 for the canonical prompt, engineers a gate vector from that direction, synthesizes a down vector pointing toward "Poseidon," and installs the gate+down pair into a free feature slot. A balancer scales the vectors so the fact lands at exactly the right strength: strong enough to answer correctly (99.98% confidence) but not so strong it hijacks capital queries for Paris (still 81% correct).
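The insert-and-balance step can be sketched as follows. Everything here is a stand-in: the residual direction, the "Poseidon" unembedding direction, the balancer's chosen scales, and the ReLU-style gating are all assumptions for illustration, not Larco's actual internals.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2560

def unit(v):
    return v / np.linalg.norm(v)

# Stand-ins for values the pipeline would capture from the real model.
resid_atlantis = unit(rng.standard_normal(d))  # residual for the Atlantis prompt
poseidon_dir = unit(rng.standard_normal(d))    # output direction for "Poseidon"

gate_scale, down_scale = 4.0, 2.5    # balancer output: assumed strengths
gate = gate_scale * resid_atlantis   # fires when the Atlantis context appears
down = down_scale * poseidon_dir     # pushes the residual toward "Poseidon"

def feature_update(x):
    """Contribution of the injected feature for residual stream x."""
    act = max(0.0, float(gate @ x))  # assumed ReLU-style gating
    return act * down

# On the Atlantis prompt the feature fires hard...
on_target = feature_update(resid_atlantis)
# ...but on an unrelated prompt (think "capital of France") it stays near silent,
# which is exactly the trade-off the balancer is tuning.
off_target = feature_update(unit(rng.standard_normal(d)))
```

The balancer's job, in this picture, is choosing `gate_scale` and `down_scale` so the on-target norm wins decisively without the off-target norm ever becoming competitive.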
Changes live in a patch overlay at runtime, leaving the base V-index read-only. To make them permanent:
COMPILE CURRENT INTO V-INDEX temp_atlantis.vindex
This uses the MEMIT technique to bake gate+down vectors into canonical weight files. The result exports to safetensors or GGUF, making it compatible with standard model providers. A fresh Larco session connected to the new V-index confirms: capital of Atlantis is Poseidon, no training required.
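Mechanically, baking the patch down amounts to writing the engineered vectors into a row and column of the dense matrices, after which the tensors serialize like any other checkpoint. MEMIT itself involves a more careful optimization over many facts at once; this sketch shows only the simplest slot-write version, with invented weights and an assumed free slot index.

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, n_features = 2560, 10240

# Dense FFN weights for the edited layer (random stand-ins for real tensors).
W_gate = rng.standard_normal((n_features, d_model)) * 0.02
W_down = rng.standard_normal((d_model, n_features)) * 0.02

# Patch overlay from the runtime INSERT: one engineered feature.
free_slot = 10239  # assumed free/low-use feature index
gate_vec = rng.standard_normal(d_model)
down_vec = rng.standard_normal(d_model)

# "Compile": bake the overlay into the canonical matrices. The edited
# tensors can then be exported to a standard format (safetensors, GGUF).
W_gate[free_slot, :] = gate_vec
W_down[:, free_slot] = down_vec
```

Because the write touches one slot, the base V-index stays intact everywhere else, which is why the patched model still answers Paris for France.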
When Larco runs inference, it's not doing dense matrix multiplications through the FFN — it's doing a KNN graph walk. At each layer: take current residual stream → KNN lookup against gate vectors → fire matching features → accumulate down vectors → next layer. Attention still does QKV projections (matrix math), but the FFN — the knowledge store — is a graph walk. The dense matrix was always just an inefficient encoding of this graph structure.
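The per-layer walk described above can be sketched with toy dimensions. The top-k lookup, the ReLU-style gating, and the random weights are all illustrative assumptions; the point is only the control flow: score against gate vectors, fire the strongest matches, accumulate their down vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_features, n_layers, k = 64, 256, 4, 8  # toy sizes, far smaller than Gemma's

# Each layer's knowledge store: gate vectors (when features fire) and
# down vectors (what they contribute), as random stand-ins.
layers = [{"gate": rng.standard_normal((n_features, d)),
           "down": rng.standard_normal((n_features, d)) * 0.01}
          for _ in range(n_layers)]

def ffn_graph_walk(x, layers, k):
    """Replace each layer's dense FFN matmul with a top-k feature lookup."""
    for layer in layers:
        scores = layer["gate"] @ x              # how strongly each feature matches x
        top = np.argpartition(scores, -k)[-k:]  # keep only the k strongest features
        for i in top:                           # fire those features alone
            x = x + max(0.0, scores[i]) * layer["down"][i]
    return x

x0 = rng.standard_normal(d)
out = ffn_graph_walk(x0, layers, k)
```

With k features firing instead of all 10,240, the FFN pass becomes a sparse lookup-and-accumulate over the graph rather than a dense multiply, which is the sense in which the matrix "was always just an inefficient encoding."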
Chris closes with a series of pointed claims:
"The FFN was always a graph. The matrix was just an inefficient way of encoding it."
"Nobody taught the model the schema. The model learned these categories because that is how the world is structured. Things have makers, places have capitals, people have occupations. The FFN reinvented a relational schema from raw text."
"The model is a graph. I'm literally saying the model is a graph. The graph is messy, but the walk is precise, and attention is the thing that gets you through it."
"We've done select statements. We've inserted new knowledge. We've compiled it back down into weights. It's a database."
"One feature can't tell you what the capital of France is. 34 layers of attention weighted features can."