A self-improving prediction oracle where every AI agent's influence is weighted by its historical Brier score. Python engine, Solidity mirror on Base Sepolia, every probability and every weight independently verifiable.
One question, three parallel agents, one calibrated answer. The weight
formula is two lines of math; you can read it in `swarm_oracle/weights.py`
and find the same math, bit-for-bit, in `contracts/src/CalibrationRegistry.sol`.
```
========================================================================
  SWARM ORACLE | Calibration-Weighted Consensus
========================================================================
Question : Did BTC close above 100K on May 5, 2026?
Agents   : 3
Elapsed  : 3.30s

Individual votes:
  agent-oracle   | strategy=api        | P(YES)=0.030 | conf=0.90 | weight= 10.00 ( 60.1%) ████████████········
  agent-reliable | strategy=web_search | P(YES)=0.050 | conf=0.80 | weight=  5.56 ( 33.5%) ███████·············
  agent-novice   | strategy=knowledge  | P(YES)=0.500 | conf=0.00 | weight=  1.07 (  6.4%) █···················

Consensus:
  Weighted P(YES) = 0.0303
  Variance        = 0.0143
  Decision        = NO
========================================================================
```
Any binary (YES/NO) prediction question. CLI, FastAPI endpoint, or direct Python.
Each agent runs a different research strategy — API lookup, web search, knowledge-only — and reasons independently.
`weight = 1 / (brier + ε)`, scaled by a confidence ramp for new agents. Well-calibrated agents get a larger share of the vote.
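As a rough illustration of the weighting rule above, the sketch below computes an inverse-Brier weight and multiplies it by a linear ramp. The value of `EPSILON`, the ramp length `RAMP_RESOLUTIONS`, and the linear shape of the ramp are assumptions made for this example; the actual constants live in `swarm_oracle/weights.py` and may differ.

```python
# Illustrative sketch only. EPSILON, RAMP_RESOLUTIONS and the linear ramp
# are assumed for this example and are not taken from weights.py.
EPSILON = 0.01          # assumed smoothing term that keeps the weight finite
RAMP_RESOLUTIONS = 10   # assumed number of resolutions needed for full weight


def confidence_ramp(resolutions: int) -> float:
    """Grow linearly from 0.0 to 1.0 over the first RAMP_RESOLUTIONS resolutions."""
    return min(max(resolutions, 0) / RAMP_RESOLUTIONS, 1.0)


def calibration_weight(brier: float, resolutions: int) -> float:
    """Inverse-Brier weight, scaled down while the agent's history is short."""
    if not 0.0 <= brier <= 1.0:
        raise ValueError("a binary Brier score lies in [0, 1]")
    raw = 1.0 / (brier + EPSILON)
    return raw * confidence_ramp(resolutions)


if __name__ == "__main__":
    # A seasoned, well-calibrated agent versus a newcomer with the same score.
    print(calibration_weight(0.09, resolutions=50))
    print(calibration_weight(0.09, resolutions=2))
```

Under these assumptions a brand-new agent starts with zero influence and earns its full weight only after enough resolved questions.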
Linear opinion pool produces a single P(YES). If weighted variance crosses a threshold, the result is flagged DISPUTE rather than forced.
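The pooling step described above can be sketched as follows. The function name `pool` and the threshold `DISPUTE_VARIANCE` are hypothetical; the real threshold is not stated in this document, so the value here is only a placeholder.

```python
from typing import Sequence, Tuple

# Assumed placeholder; the project's actual dispute threshold is not given here.
DISPUTE_VARIANCE = 0.05


def pool(probs: Sequence[float], weights: Sequence[float]) -> Tuple[float, float, str]:
    """Return (weighted P(YES), weighted variance, decision) for one question."""
    if not probs or len(probs) != len(weights):
        raise ValueError("need exactly one weight per probability")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Linear opinion pool: a weighted average of the agents' probabilities.
    norm = [w / total for w in weights]
    p_yes = sum(w * p for w, p in zip(norm, probs))

    # Weighted variance around the pooled estimate measures disagreement.
    variance = sum(w * (p - p_yes) ** 2 for w, p in zip(norm, probs))

    if variance > DISPUTE_VARIANCE:
        decision = "DISPUTE"
    else:
        decision = "YES" if p_yes >= 0.5 else "NO"
    return p_yes, variance, decision


if __name__ == "__main__":
    print(pool([0.1, 0.2], [3.0, 1.0]))   # agents roughly agree
    print(pool([0.0, 1.0], [1.0, 1.0]))   # agents flatly disagree
```

Flagging on variance rather than on the pooled probability alone means two confident agents who contradict each other produce an abstention instead of a coin flip.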
CalibrationRegistry.sol and SwarmConsensus.sol mirror the math in WAD (18-decimal) fixed-point. Anyone can recompute weights from public Brier scores.
Every resolution becomes training data. Brier scores update, weights re-derive, future predictions sharpen. The protocol gets smarter without a re-deploy.
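One way to picture the feedback loop above is a per-agent record that accumulates squared error and a resolution count, from which the mean Brier score is re-derived after every resolved question. The class name `AgentRecord` and the 0.25 score reported for agents with no history are assumptions for this sketch, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """Hypothetical per-agent history: cumulative squared error plus a count."""
    cumulative_brier: float = 0.0
    resolutions: int = 0

    def record(self, p_yes: float, outcome_yes: bool) -> None:
        """Fold one resolved question into the running totals."""
        outcome = 1.0 if outcome_yes else 0.0
        self.cumulative_brier += (p_yes - outcome) ** 2
        self.resolutions += 1

    @property
    def brier(self) -> float:
        """Mean Brier score; 0.25 (always answering 0.5) is assumed when empty."""
        if self.resolutions == 0:
            return 0.25
        return self.cumulative_brier / self.resolutions


if __name__ == "__main__":
    agent = AgentRecord()
    agent.record(0.03, outcome_yes=False)  # confident and right
    agent.record(0.90, outcome_yes=True)   # confident and right again
    print(agent.brier, agent.resolutions)
```

Because the weight is a function of this mean, recomputing it after each resolution is all that is needed for future votes to shift toward the better-calibrated agents.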
A 50-case deterministic benchmark (seed=42), balanced YES/NO, with agents
designed to fail on different subsets. A DISPUTE counts as a correct
abstention when agents genuinely disagree, so it is scored as accuracy
rather than failure. Reproduce with `make benchmark`.
| Method | Accuracy | Brier ↓ | Disputes | Notes |
|---|---|---|---|---|
| swarm | 100% | 0.0724 | 18/50 | Best Brier of all methods |
| majority vote | 92.0% | 0.0785 | 0 | |
| average | 98.0% | 0.0935 | 0 | |
| agent-oracle | 84.0% | 0.1029 | 0 | Best single agent |
| agent-reliable | 80.0% | 0.1332 | 0 | |
| agent-novice | 68.0% | 0.2009 | 0 | |
The swarm protocol beats every single agent on Brier score, including
agent-oracle's 0.1029, and also edges out majority vote (0.0785) and
plain averaging (0.0935). The variance gate converts genuine
disagreement into honest DISPUTE outcomes rather than forcing
a wrong answer.
Six Python modules feed four Solidity contracts. The Python ↔ Solidity boundary is checked by a 14-test parity suite that compares outputs bit-for-bit on a frozen corpus.
Four contracts, written in Solidity 0.8.24 and targeting Base Sepolia.
Weight math is pure WAD (18-decimal fixed-point): no external libraries,
no on-chain sqrt, no approximations.
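To make the WAD convention concrete, here is a small Python sketch that imitates 18-decimal fixed-point arithmetic with plain integers, the way Solidity's truncating integer division would behave for non-negative values. The helper names and `EPSILON_WAD` are assumptions for illustration; they are not copied from the contracts.

```python
WAD = 10**18           # 1.0 in 18-decimal fixed-point
EPSILON_WAD = 10**16   # assumed epsilon of 0.01, for illustration only


def wad_mul(a: int, b: int) -> int:
    """Fixed-point multiply: (a * b) / 1e18, truncated."""
    return a * b // WAD


def wad_div(a: int, b: int) -> int:
    """Fixed-point divide: (a * 1e18) / b, truncated."""
    if b == 0:
        raise ZeroDivisionError("division by zero in wad_div")
    return a * WAD // b


def weight_wad(brier_wad: int) -> int:
    """1 / (brier + epsilon), with every quantity expressed in WAD."""
    return wad_div(WAD, brier_wad + EPSILON_WAD)


def weighted_p_yes_wad(probs_wad: list, weights_wad: list) -> int:
    """Weighted average of probabilities, all in WAD, using integers only."""
    total = sum(weights_wad)
    numerator = sum(wad_mul(w, p) for w, p in zip(weights_wad, probs_wad))
    return wad_div(numerator, total)


if __name__ == "__main__":
    # Brier 0.09 plus epsilon 0.01 gives a weight of 10.0, i.e. 10 * WAD.
    print(weight_wad(9 * 10**16))
    # Probabilities 0.1 and 0.2 with weights 3.0 and 1.0.
    print(weighted_p_yes_wad([10**17, 2 * 10**17], [3 * WAD, 1 * WAD]))
```

Keeping every step in integers is what makes a bit-for-bit comparison between two implementations meaningful, since floating-point rounding never enters the computation.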
Per-agent Brier-score storage. computeWeight(agent) reproduces the Python formula bit-for-bit on a 14-case parity corpus.
Vote aggregation. Reads weights from CalibrationRegistry, computes weighted P(YES) and squared-variance dispute threshold, emits Resolution(YES|NO|DISPUTE).
Per-question reward pools. 70/30 split between correctness payouts and calibration improvement. Pull-payment pattern, no re-entrancy surface.
Soulbound ERC-721 per agent node. Transfers blocked. Stores cumulative Brier and resolution-count on-chain for transparent reputation.
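The 70/30 reward split and pull-payment pattern mentioned above can be sketched in a few lines. This is a simplified model, not the contract's code: the names `split_pool` and `PullLedger` are hypothetical, and how each share is divided among individual agents is left out because the document does not specify it.

```python
from typing import Dict, Tuple

BPS = 10_000             # basis-point denominator
CORRECTNESS_BPS = 7_000  # 70% of the pool, per the split described above


def split_pool(pool_wei: int) -> Tuple[int, int]:
    """Split a reward pool into (correctness share, calibration share)."""
    correctness = pool_wei * CORRECTNESS_BPS // BPS
    calibration = pool_wei - correctness  # remainder, so no wei is lost to rounding
    return correctness, calibration


class PullLedger:
    """Agents claim what they are owed; nothing is pushed to them."""

    def __init__(self) -> None:
        self._owed: Dict[str, int] = {}

    def credit(self, agent: str, amount: int) -> None:
        self._owed[agent] = self._owed.get(agent, 0) + amount

    def claim(self, agent: str) -> int:
        # Clear the balance before handing back the amount, mirroring the
        # checks-effects-interactions ordering used to avoid re-entrancy.
        return self._owed.pop(agent, 0)


if __name__ == "__main__":
    correctness, calibration = split_pool(1_001)
    ledger = PullLedger()
    ledger.credit("agent-oracle", correctness)
    print(correctness, calibration, ledger.claim("agent-oracle"))
```

Zeroing the balance before the payout is the key design choice: a second claim in the same flow finds nothing left to withdraw.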
Two minutes from git clone to a calibration-weighted answer.
No API keys. No paid services. Local LLM optional — demo mode runs
with zero network calls.
```bash
git clone https://github.com/solmonger/swarm-oracle.git
cd swarm-oracle

# Demo mode: no LLM required, deterministic, 3 seconds
python swarm_verify.py --demo "Did BTC close above 100K on May 5, 2026?"

# Or: full pipeline with a local llama.cpp / Ollama server
export LLM_API_URL="http://localhost:8080/v1/chat/completions"
# Single quotes stop the shell from expanding $3 inside the question
python swarm_verify.py 'Will ETH close above $3,000 on June 1, 2026?'

# Or: one-shot Docker
docker compose up                 # API at http://localhost:8000/docs
docker compose run oracle demo    # CLI demo, no LLM needed
```
```bash
make test                  # 742 Python tests
make test-solidity         # 55 Foundry tests
make test-integration      # End-to-end pipeline
make benchmark             # Reproduce the comparison table above (100% accuracy, 0.0724 Brier)
make adversarial-compare   # Sybil vs bribery attack cost comparison
make economic-model-mvp    # Minimum viable pool by market size
```