Research Blog

Can AI agents systematically replicate quantum computing experiments? We're finding out — running the same algorithms across four backends, testing every error mitigation technique we can find, and publishing everything.

Start here

The context for our empirical work: why AI-accelerated science matters, which papers define the field, and the data behind the hype.

Lab notebook
Experiment · 2026-02-17

We Did Basic Math on a Quantum Computer. Here Are the Results.

Addition, multiplication, Grover's search, and entanglement — six experiments on a 9-qubit superconducting chip. Every one returned the correct answer as the most common measurement.

Can a quantum computer do 2+3? Yes — and 5+3, 9+7, 3×2, Grover's search, and GHZ entanglement. We ran six experiments on Quantum Inspire's Tuna-9 superconducting chip. The simplest circuits hit 85% fidelity. The hardest (4-bit addition across all 9 qubits) still returned the correct answer 37% of the time. Fidelity tracks gate count exactly as theory predicts.

arithmetic · Tuna-9 · Grover · GHZ
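The GHZ entanglement experiment in this post prepares its target state with the textbook circuit: one Hadamard followed by two CNOTs. As a minimal sketch of what the hardware is aiming for, here is a toy pure-Python statevector simulator (an illustration, not the cQASM the experiment actually ran) showing that the ideal circuit leaves amplitude only on |000⟩ and |111⟩:

```python
import math

def apply_h(state, q):
    """Hadamard on qubit q (convention here: qubit 0 = least significant bit)."""
    s = 1 / math.sqrt(2)
    out = [0.0] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << q)          # partner index with bit q flipped
        if (i >> q) & 1 == 0:
            out[i] += s * amp
            out[j] += s * amp
        else:
            out[j] += s * amp
            out[i] -= s * amp
    return out

def apply_cnot(state, ctrl, tgt):
    """Flip qubit tgt wherever qubit ctrl reads 1 (a permutation of amplitudes)."""
    return [state[i ^ (1 << tgt)] if (i >> ctrl) & 1 else state[i]
            for i in range(len(state))]

# 3-qubit GHZ circuit: H on qubit 0, then fan the superposition out with two CNOTs.
ghz = [1.0] + [0.0] * 7        # start in |000>
ghz = apply_h(ghz, 0)
ghz = apply_cnot(ghz, 0, 1)
ghz = apply_cnot(ghz, 0, 2)
# Ideally only |000> and |111> survive, each with probability 1/2;
# on real hardware, noise redistributes some of that weight elsewhere.
```

On a noiseless simulator the two end states each get probability 0.5; the gap between that and the measured histogram is the fidelity loss the post tracks against gate count.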
Technical · 2026-02-16

What Does a Molecule Sound Like?

We turned quantum chemistry eigenspectra into sound. Energy levels become harmonics, bond stretching becomes a pitch sweep, and dissociation sounds like a chord collapsing.

Map each energy eigenvalue to an audio oscillator. The ground state becomes a fundamental. Excited states become harmonics. Stretch the bond and hear the spectrum shift. Two molecules (H₂ and LiH), computed from first principles, sonified in real time.

sonification · quantum chemistry · Web Audio · eigenspectrum
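The mapping this post describes (ground state as fundamental, excited states as harmonics that shift as the bond stretches) can be sketched as a plain frequency map. The base pitch and Hartree-to-Hz scale below are illustrative choices, not the post's actual Web Audio parameters, and the eigenvalues are placeholders rather than computed values:

```python
def spectrum_to_freqs(energies, base_hz=220.0, hz_per_hartree=440.0):
    """Map energy eigenvalues to oscillator frequencies.

    The lowest eigenvalue (ground state) lands on base_hz; each excited
    state is offset in proportion to its gap above the ground state, so
    stretching the bond (shifting the spectrum) audibly moves the chord.
    """
    e0 = min(energies)
    return [base_hz + (e - e0) * hz_per_hartree for e in energies]

# Illustrative eigenvalues in Hartree (placeholders, not computed from H2):
freqs = spectrum_to_freqs([-1.85, -1.25, -0.88, -0.23])
```

In a browser, each resulting frequency would drive one `OscillatorNode`; recomputing the spectrum at each bond length and re-assigning `oscillator.frequency` produces the pitch sweep the post describes.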
Experiment · 2026-02-12

We Computed a Molecule from First Principles and Ran It on Three Quantum Platforms

From molecular geometry to quantum hardware measurements in one automated pipeline. The emulator nailed it. IBM Fez tried its best.

We built a complete quantum chemistry pipeline from molecular integrals to qubit Hamiltonians to hardware measurements. H₂ on 2 qubits achieved chemical accuracy on the QI emulator (1.3 mHa error). LiH on 4 qubits needed 9 measurement circuits. The emulator nailed it (0.2 mHa). IBM Fez got the right quantum state but 354 mHa of noise. Noise scales faster than circuit depth.

VQE · LiH · H2 · PySCF
Opinion · 2026-02-12

Six Things We Learned Running 50+ Experiments on Quantum Inspire

Honest benchmarks, fragile auth tokens, and why the hardware you trust is the hardware that runs your circuit as written.

We ran 50+ experiments on Quantum Inspire's Tuna-9, built an MCP server around the SDK, and automated a full experiment pipeline. The hardware surprised us — honest benchmarks, portable error mitigation, cross-platform parity on hard problems. The developer experience surprised us too, in less pleasant ways. Here's what we'd tell the QI team over coffee.

Quantum Inspire · Tuna-9 · developer experience · cQASM
Experiment · 2026-02-11

We Tested 15 Error Mitigation Strategies. Only One Achieved Chemical Accuracy.

IBM's TREX (readout error correction) hit 0.22 kcal/mol. Tuna-9's best combo (readout mitigation + post-selection) averaged 2.52 kcal/mol. Zero-noise extrapolation made things worse. Here's what actually works for near-term quantum chemistry.

We compared 15+ error mitigation techniques across IBM Torino and Tuna-9 for hydrogen VQE (variational quantum eigensolver — an algorithm that finds molecular ground-state energies). IBM's TREX achieved chemical accuracy (0.22 kcal/mol) in a single shot. On Tuna-9, combining readout error mitigation with post-selection cut errors by 70% to 2.52 kcal/mol. But adding dynamical decoupling and Pauli twirling to TREX made IBM 45x worse. The lesson: understand your noise before stacking techniques.

error mitigation · VQE · TREX · readout error
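Readout error mitigation, half of Tuna-9's winning combination, amounts to inverting a calibrated confusion matrix: measure how often a prepared 0 reads as 1 and vice versa, then undo that bias on the raw counts. A minimal single-qubit sketch (the experiments used multi-qubit calibration; the error rates below are illustrative, not Tuna-9's):

```python
def mitigate_readout(counts, p01, p10):
    """Invert a single-qubit readout confusion matrix.

    p01 = Prob(read 1 | prepared 0), p10 = Prob(read 0 | prepared 1).
    counts = (n0, n1) raw measured counts; returns mitigated (n0, n1).
    Measured = M @ true, with M = [[1-p01, p10], [p01, 1-p10]],
    so we apply the 2x2 inverse of M.
    """
    n0, n1 = counts
    det = (1 - p01) * (1 - p10) - p01 * p10
    t0 = ((1 - p10) * n0 - p10 * n1) / det
    t1 = (-p01 * n0 + (1 - p01) * n1) / det
    return t0, t1

# 1000 shots of a pure |0> state, skewed by 5%/10% readout error,
# come back as roughly (950, 50); inversion recovers ~(1000, 0).
mitigated = mitigate_readout((950, 50), p01=0.05, p10=0.10)
```

Post-selection, the other half of the combination, is complementary: instead of correcting counts it discards shots that violate a known symmetry of the problem.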
Experiment · 2026-02-11

Four Quantum Backends, One Question: How Much Does the Hardware Matter?

We ran the same experiments on a noiseless emulator, IBM Torino (133q), Tuna-9 (9q), and IQM Garnet (20q). The answer: it matters a lot, but not always in the ways you expect.

We ran VQE (molecular energy estimation), quantum volume (a standard hardware benchmark), randomized benchmarking (gate accuracy testing), and error correction across 4 quantum backends. Benchmarks pass everywhere. VQE fails everywhere except the emulator. IQM Garnet achieves QV=32 while Tuna-9 manages QV=8. Error correction reveals the sharpest hardware differences. And IBM's 99.99% gate fidelity from randomized benchmarking is misleading.

cross-platform · IBM Quantum · Tuna-9 · IQM Garnet
Experiment · 2026-02-11

We Tried to Replicate 4 Quantum Computing Papers. Here's What Happened.

AI agents reproduced 14 published claims across emulator, IBM Torino, and Tuna-9 hardware. The gaps tell us more than the successes.

We used AI agents to replicate 4 landmark quantum computing papers on 3 different backends. Emulators matched published results almost perfectly (85% pass). Real hardware told a different story: IBM Torino got within 9 kcal/mol on VQE, Tuna-9 achieved Quantum Volume 8 but failed VQE entirely. The reproducibility gap is the finding.

replication · VQE · quantum volume · IBM Quantum
Experiment · 2026-02-11

How to Know If Your Quantum Chemistry Experiment Will Fail Before You Run It

We wasted days on HeH+ before realizing the energy model itself told us the answer. One ratio predicts everything.

After achieving chemical accuracy on H₂ (0.22 kcal/mol), we assumed HeH+ would be similar. Same circuit, same hardware, same error correction. It was 20x worse. Turns out you can predict this from one number in the molecular energy model — before running a single shot. Here's the pre-flight check we wish we'd known.

VQE · HeH+ · H2 · coefficient amplification
Experiment · 2026-02-11

We Kept Using the Same Error Fix. Then It Stopped Working.

IBM's error correction went from 119x improvement to 1.3x when we changed circuits. A 30-second diagnostic would have told us why.

TREX (readout error correction) was our hero — 119x improvement on molecular energy estimation, chemical accuracy on the first try. So we used it on everything. Then we ran a deeper circuit and it barely helped (1.3x). Meanwhile ZNE (zero-noise extrapolation), which had failed before, would have given 14x. The mistake: we were fixing measurement errors on a circuit where gate errors dominated. Here's the 30-second test that tells you which fix to use.

TREX · ZNE · circuit depth · error mitigation
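The post saves its exact 30-second test for the full write-up, but the underlying logic (compare the readout error budget to the gate error budget and mitigate whichever dominates) can be sketched as a back-of-envelope check. Everything below, including the parameter values, is an illustrative guess rather than the authors' diagnostic:

```python
def dominant_error(n_2q_gates, p_2q, n_qubits, p_readout):
    """Rough error budgets for one circuit execution.

    Cumulative two-qubit gate error vs cumulative readout error;
    whichever is larger is the source worth mitigating first
    (readout correction like TREX vs gate-error methods like ZNE).
    """
    gate_budget = 1 - (1 - p_2q) ** n_2q_gates
    readout_budget = 1 - (1 - p_readout) ** n_qubits
    return "gate" if gate_budget > readout_budget else "readout"

# Shallow chemistry circuit: readout dominates, so readout correction pays off.
shallow = dominant_error(n_2q_gates=4, p_2q=0.01, n_qubits=2, p_readout=0.03)
# Deep simulation circuit: accumulated gate error swamps readout bias.
deep = dominant_error(n_2q_gates=200, p_2q=0.01, n_qubits=2, p_readout=0.03)
```

With these illustrative numbers the shallow case comes back "readout" and the deep case "gate", which mirrors the post's story: the same fix cannot win in both regimes.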
Experiment · 2026-02-10

Can AI Write Quantum Code? We Tested 151 Tasks and Then Gave It the Manual

From 63% to 80%. The bottleneck isn't intelligence — it's documentation.

We ran 151 quantum programming tasks against frontier LLMs. They scored 63%. The main failure wasn't bad quantum logic — it was outdated API knowledge. When we gave them current documentation, scores jumped to 71%. A multi-run ensemble hit 80%.

benchmark · RAG · Context7 · Qiskit
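The post doesn't spell out how the multi-run ensemble turned 71% into 80%; plain majority voting over repeated runs is one common choice, sketched here purely as an assumption about what such an ensemble could look like:

```python
from collections import Counter

def ensemble_answer(run_outputs):
    """Majority vote across several runs of the same task.

    run_outputs: list of (hashable) answers, one per run.
    Ties resolve to the earliest-seen answer (Counter ordering).
    """
    return Counter(run_outputs).most_common(1)[0][0]

# Three runs of one task: two agree, so the ensemble keeps their answer.
voted = ensemble_answer(["qc.measure_all()", "qc.measure(0, 0)", "qc.measure_all()"])
```

The intuition is that transient failures (an outdated API call in one run) rarely repeat identically, while correct solutions tend to converge.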
Technical · 2026-02-10

Giving Claude Direct Access to Quantum Hardware

MCP servers that let Claude Code generate random numbers from vacuum fluctuations (with Tuna-9 superconducting qubit fallback) and submit circuits to real quantum processors.

We built two MCP servers that give Claude Code direct access to quantum resources: true random numbers with automatic fallback from ANU vacuum fluctuations to Tuna-9 superconducting qubits, plus circuit execution on Quantum Inspire hardware. Here's how they work and why this matters for AI-accelerated quantum research.

MCP · Claude Code · Quantum Inspire · QRNG
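The automatic fallback the randomness server implements, try the ANU vacuum-fluctuation service first, fall back to Tuna-9 if it's unavailable, is an ordered-sources pattern. A generic sketch with hypothetical stand-in clients (the real MCP server's functions and transports are not shown here):

```python
def random_bits(n, sources):
    """Try each entropy source in order; return (name, bits) from the
    first one that succeeds. Sources are (name, callable) pairs, and a
    callable may raise on outage."""
    for name, source in sources:
        try:
            return name, source(n)
        except Exception:
            continue                     # this source is down; try the next
    raise RuntimeError("all entropy sources failed")

# Hypothetical stand-ins for the real clients: here the ANU endpoint is
# unreachable, so the Tuna-9-backed source answers instead.
def anu_qrng(n):
    raise ConnectionError("ANU endpoint unreachable")

def tuna9_qrng(n):
    return [1] * n                       # placeholder bits, not hardware output

picked, bits = random_bits(4, [("anu", anu_qrng), ("tuna9", tuna9_qrng)])
```

Exposed as an MCP tool, this lets the agent ask for randomness without caring which physical source ultimately answered, while still reporting the provenance.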
Experiment · 2026-02-10

Tier 1 Complete + Kim 2023: 6 Papers, 27 Claims, 4 Backends

What happens when AI agents try to reproduce quantum computing experiments across different hardware?

We replicated 6 quantum computing papers across 4 hardware backends. 93% of claims reproduce successfully (25/27). Key finding: TREX (readout error correction) achieves 119x improvement for short molecular energy circuits but only 1.3x for deeper physics simulations — the error correction strategy must match the dominant error source.

replication · VQE · QAOA · quantum volume
Experiment · 2026-02-10

An AI Ran Its Own Quantum Experiment on Real Hardware

Claude designed circuits, submitted them to three quantum backends, analyzed errors, and iterated — no human code required.

We gave Claude direct access to quantum hardware through MCP tool calls. It designed a Bell state tomography experiment, submitted circuits to three backends, discovered that IBM's transpiler is as important as its hardware, and mapped how quickly each platform loses quantum coherence. No Python scripts. No human in the loop.

MCP · Claude · tool use · Bell state
Experiment · 2026-02-10

An AI Mapped an Unknown Quantum Processor and Improved Its Own Circuits

Claude autonomously discovered Tuna-9's topology, characterized its noise, and achieved 33% lower error rates through hardware-aware routing.

We gave an AI agent access to a quantum processor it had never seen before and asked: can you figure out how it works and use that knowledge to run better circuits? In 33 hardware jobs, Claude discovered the full topology, identified the best and worst qubits, characterized noise types, and improved GHZ state fidelity by 5.8 percentage points.

autonomous research · hardware characterization · Tuna-9 · noise tomography
Experiment · 2026-02-09

An AI Agent Replicated a QuTech Quantum Paper

Claude Opus 4.6 wrote 300 lines of molecular energy simulation code from a paper reference alone.

We gave Claude Opus 4.6 a reference to Sagastizabal et al. (2019) — a QuTech paper on symmetry-verified molecular energy estimation for hydrogen — and asked it to replicate the experiment. It wrote the energy model, trial quantum state, noise model, and error mitigation from scratch.

VQE · replication · Claude · QuTech