ATOMELM

Model Architecture

ATOME-60K

Engine demo

ATOME-944K

Trained model

Parameters

Temperature: 0.70
Max Tokens: 4096
Top-p: 0.50

Telemetry

Tokens: 142.8
Time: 0.33 sec
Tokens/sec: 121.9
Per-Layer Router Entropy
L0 μ=1.070
L1 μ=0.638
L2 μ=0.544
L3 μ=1.023
L4 μ=0.486
L5 μ=0.735
L6 μ=0.046
L7 μ=0.067
io/atome/cli/inference.sh
Active Stream

Imagine you have two identical coins. Normally, flipping one coin has no effect on the other. They are independent.

Quantum entanglement is like having two magical coins. If you flip one and it lands on Heads, you instantly know the other coin will also be Heads, even if it is on the other side of the universe.

They are no longer separate objects; they share a single "state." Measuring one instantly determines the state of the other, defying our everyday understanding of space and time.

What you're seeing: Both models run on the same C-based inference engine and share the same architecture. The 944K-parameter model has enough capacity to generate coherent children's-story text with recognizable structure and meaning. The 60K-parameter demo model is intentionally tiny, so it produces text that resembles English but lacks consistent coherence. The difference is not the engine itself, but the amount of information encoded in the trained weights.

Per-layer router entropy summarizes the distribution of routing weights in each block, showing how strongly that block prefers one expert sub-network over the others. Lower entropy means the router concentrates its weight on a single expert; higher entropy means the weight is spread more evenly, so expert selection is less certain.
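To make the entropy reading concrete, here is a minimal sketch of how such a per-layer figure could be computed. It rests on assumptions that the page does not confirm: that routing weights come from a softmax over router logits, that entropy is Shannon entropy measured in nats, that μ is the mean over generated tokens, and that there are four experts. The function names are hypothetical and are not taken from the ATOME engine.

```python
import math


def softmax(logits):
    """Turn raw router logits into routing weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_entropy(weights):
    """Shannon entropy (in nats) of one routing-weight distribution."""
    # Zero-weight experts contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in weights if p > 0.0)


def mean_layer_entropy(per_token_logits):
    """Average the routing entropy over every token seen by one layer."""
    entropies = [router_entropy(softmax(row)) for row in per_token_logits]
    return sum(entropies) / len(entropies)


# Illustrative distributions over four hypothetical experts.
uniform = router_entropy([0.25, 0.25, 0.25, 0.25])  # no preference
peaked = router_entropy([0.97, 0.01, 0.01, 0.01])   # strong preference

print(f"uniform: {uniform:.3f}  peaked: {peaked:.3f}")
```

Under these assumptions, a router with no preference among four experts reaches the maximum entropy of ln 4 (about 1.386), while a router that always picks the same expert reaches 0. On that reading, the values near zero reported for L6 and L7 would indicate almost deterministic routing in the last two layers.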