Chris Hay demonstrates a novel approach to improving LLM math accuracy by injecting "virtual experts" (Python functions) directly into the Mixture of Experts (MoE) routing architecture. Instead of external tool calling, the model's router intercepts arithmetic tasks in early layers and routes them to a Python expert that executes actual code - making the model natively accurate at math while potentially running at half the size.
Chris opens with a striking demo: asking GPT-OSS-20B (a 20 billion parameter model) to calculate 127 x 89. The model confidently answers 11263 - wrong. The correct answer is 11303. This isn't a confused model; it's confident and incorrect.
Modern models like GPT-OSS-20B are Mixture of Experts (MoE) models, unlike older dense models (LLaMA 2, GPT-2, GPT-3) where every parameter is active during inference. In MoE:

- Each layer contains many small expert networks instead of one large feed-forward block.
- A learned router scores the experts for each token and activates only the top few.
- Only a fraction of the total parameters run per token, so inference is cheaper than in a dense model of the same size.
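A minimal sketch of top-k MoE routing, in plain Python. The function names and the specific numbers (32 experts, top-4) are illustrative, not GPT-OSS internals:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=4):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 32 experts in a layer, top-4 active: only those 4 experts run for this token.
logits = [0.1] * 32
logits[8] = 3.0  # imagine the router strongly prefers expert 8 for "2 plus 3"
chosen = route_top_k(logits, k=4)
```

The renormalised weights are used to mix the chosen experts' outputs; the other 28 experts never execute for this token.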
The key insight: attention mechanisms already do routing based on context. If attention can route "2 plus 3" to expert 8, why does expert 8 need to be neural? Chris's answer: it doesn't.
"Who says we need a neural expert in the first place? Why can't we just have a tool built into the model?"
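A minimal sketch of that idea, with hypothetical names (the real implementation lives inside the MoE routing layer, not in a wrapper function like this): the router tries a Python "expert" first and falls through to a neural expert only when the input isn't plain arithmetic.

```python
import ast
import operator

# Whitelisted operators so the "Python expert" can't execute arbitrary code.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def python_expert(expr: str):
    """A virtual expert: evaluates arithmetic with real code, not neural weights."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval"))

def route(token_span: str, neural_expert):
    """Router shim: intercept arithmetic early, else fall through to a neural expert."""
    try:
        return python_expert(token_span)  # exact: 127 * 89 -> 11303, every time
    except (ValueError, SyntaxError):
        return neural_expert(token_span)
```

Because the interception happens at the routing step, nothing downstream needs to know the answer came from code rather than weights.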
Using his Lazarus introspection tool, Chris shows which experts the router activates for each token, making the model's routing decisions visible layer by layer.
Two implementation methods exist for wiring a virtual expert into the model.
In a teaser for an upcoming video, Chris demonstrates expert pruning: removing experts shrinks GPT-OSS-20B to roughly half its size, but the pruned model's math gets worse. The solution is to add the virtual Python expert back. The half-size model now gets math right while running in significantly less memory.
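A toy sketch of that combination (everything here is illustrative, not the actual pruning procedure): drop experts from a model, then register one virtual expert that computes with real Python instead of weights.

```python
def prune_experts(experts: dict, keep: set) -> dict:
    """Drop experts not in `keep` — a toy stand-in for halving the model."""
    return {name: fn for name, fn in experts.items() if name in keep}

# Toy "model": four experts; the neural math expert gets pruned away.
experts = {
    "neural_math": lambda q: 11263,  # confidently wrong, like the demo
    "neural_lang": lambda q: "...",
    "neural_code": lambda q: "...",
    "neural_misc": lambda q: "...",
}
halved = prune_experts(experts, keep={"neural_lang", "neural_code"})

# Re-add one virtual expert: real arithmetic instead of weights.
halved["virtual_python"] = lambda q: eval(q, {"__builtins__": {}})  # demo only
answer = halved["virtual_python"]("127 * 89")
```

The pruned model carries half the experts but, for arithmetic, is now exact rather than approximate.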
| Traditional Tool Calling | Virtual Experts |
|---|---|
| External to model | Native to architecture |
| Token generation -> parsing -> tool call -> inject result | Happens within routing layer |
| Must run through all layers | Can bypass remaining layers |
| Extra latency | No additional overhead |
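The two pipelines in the table can be contrasted in a sketch (hypothetical function names; `CALC(...)` is an invented tool-call format, and `eval` stands in for a sandboxed tool): traditional tool calling needs a full generation pass, a parse, a tool run, and a second generation pass, while a virtual expert resolves the answer inside routing.

```python
import re

def tool_calling(model_generate, prompt):
    """External tool calling: generate -> parse -> run tool -> inject -> regenerate."""
    draft = model_generate(prompt)                            # full forward pass #1
    m = re.search(r"CALC\((.*?)\)", draft)
    if m:
        result = str(eval(m.group(1), {"__builtins__": {}}))  # demo only
        return model_generate(prompt + " " + result)          # full forward pass #2
    return draft

def virtual_expert(prompt):
    """Virtual expert: arithmetic is intercepted at the routing layer, so there
    is no tool-call round trip and remaining layers can be bypassed."""
    m = re.fullmatch(r"\s*([\d\s+\-*/().]+)\s*", prompt)
    if m:
        return str(eval(m.group(1), {"__builtins__": {}}))    # demo only
    return None  # fall through to normal decoding
```

The structural difference is what the table summarises: one path pays for two trips through the whole model plus host-side parsing, the other resolves in place.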
Virtual experts aren't limited to calculations; the same routing mechanism could dispatch to other kinds of deterministic tools.
"127 x 89. Let's ask a 20 billion parameter model. Wrong. The answer is 11303. It's not confused. It's confident. And it's wrong."
"This isn't a hack for fixing math. It's a new primitive."
"768 experts, 20 billion parameters, and the math is wrong. One virtual expert with Python and it gets it right. And the best thing is the model doesn't know."
| Time | Topic |
|---|---|
| 00:00 | Expert Math Failure |
| 00:13 | Virtual Math Expert Demo |
| 01:12 | Understanding Virtual Experts |
| 02:00 | Math Classification and Routing |
| 03:30 | Math Demos |
| 04:20 | Different Expert Types |
| 05:04 | Halving GPT-OSS-20B |
| 08:23 | Tool Calling vs Virtual Experts |
| 09:20 | Future Thoughts |