Swarm Oracle

How it works

One question, three parallel agents, one calibrated answer. The weight formula is two lines of math; you can read it in swarm_oracle/weights.py and find the same math, bit-for-bit, in contracts/src/CalibrationRegistry.sol.

$ python swarm_verify.py "Did BTC close above 100K on May 5, 2026?"

========================================================================
  SWARM ORACLE  |  Calibration-Weighted Consensus
========================================================================
Question : Did BTC close above 100K on May 5, 2026?
Agents   : 3
Elapsed  : 3.30s

Individual votes:
  agent-oracle     | strategy=api         | P(YES)=0.030 | conf=0.90 | weight= 10.00 ( 60.1%) ████████████········
  agent-reliable   | strategy=web_search  | P(YES)=0.050 | conf=0.80 | weight=  5.56 ( 33.5%) ███████·············
  agent-novice     | strategy=knowledge   | P(YES)=0.500 | conf=0.00 | weight=  1.07 (  6.4%) █···················

Consensus:
  Weighted P(YES) = 0.0303
  Variance        = 0.0143
  Decision        = NO
========================================================================

1. Question in

Any binary (YES/NO) prediction question. CLI, FastAPI endpoint, or direct Python.

2. Parallel research

Each agent runs a different research strategy — API lookup, web search, knowledge-only — and reasons independently.

3. Calibration weighting

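weight = 1 / (brier + ε), scaled by a confidence ramp for new agents. Here brier is the agent's historical Brier score (lower is better) and ε is a small constant that keeps the weight finite for a perfect score. Well-calibrated agents carry more of the vote.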
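As a rough illustration of the formula above, here is a minimal sketch. The value of epsilon, the linear shape of the ramp, and the names calibration_weight, n_resolved and ramp_length are assumptions made for this example; the real definitions live in swarm_oracle/weights.py.

```python
def calibration_weight(brier: float, n_resolved: int,
                       epsilon: float = 1e-3, ramp_length: int = 20) -> float:
    """Inverse-Brier weight scaled by a ramp for new agents.

    ASSUMPTIONS (not taken from the project): epsilon=1e-3, and a linear
    ramp that reaches full weight after `ramp_length` resolved questions.
    """
    if not 0.0 <= brier <= 1.0:
        raise ValueError("Brier score of a binary forecast lies in [0, 1]")
    base = 1.0 / (brier + epsilon)               # weight = 1 / (brier + eps)
    ramp = min(1.0, n_resolved / ramp_length)    # hypothetical confidence ramp
    return base * ramp


if __name__ == "__main__":
    # A seasoned, well-calibrated agent outweighs a seasoned, poorly calibrated one.
    print(calibration_weight(0.099, n_resolved=50))
    print(calibration_weight(0.249, n_resolved=50))
    # The same track record counts for less while the agent is still new.
    print(calibration_weight(0.099, n_resolved=10))
```

Under these assumptions a lower Brier score always yields a higher weight, and a short history only ever scales the weight down.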

4. Weighted consensus

A linear opinion pool (the weight-normalized average of the agents' probabilities) produces a single P(YES). If the weighted variance crosses a threshold, the result is flagged DISPUTE rather than forced to YES or NO.
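The pooling and dispute step can be sketched as follows. The function name weighted_consensus, the threshold value 0.05 and the 0.5 cutoff between YES and NO are assumptions for this example, and the inputs are made-up numbers rather than real agent votes.

```python
from typing import Sequence, Tuple


def weighted_consensus(probs: Sequence[float], weights: Sequence[float],
                       variance_threshold: float = 0.05) -> Tuple[float, float, str]:
    """Linear opinion pool with a variance gate.

    ASSUMPTIONS: variance_threshold=0.05 and the 0.5 YES/NO cutoff are
    illustrative values, not the project's configuration.
    """
    if len(probs) != len(weights) or not probs:
        raise ValueError("need one weight per probability")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the agents' P(YES).
    p_yes = sum(w * p for w, p in zip(weights, probs)) / total
    # Weighted variance of the votes around that mean.
    variance = sum(w * (p - p_yes) ** 2 for w, p in zip(weights, probs)) / total
    if variance > variance_threshold:
        decision = "DISPUTE"
    else:
        decision = "YES" if p_yes >= 0.5 else "NO"
    return p_yes, variance, decision


if __name__ == "__main__":
    # Two agents agree on NO, a low-weight agent disagrees strongly.
    print(weighted_consensus([0.1, 0.2, 0.9], [5.0, 4.0, 1.0]))
    # Two agents in close agreement.
    print(weighted_consensus([0.1, 0.2], [1.0, 1.0]))
```

Because the pool is a weighted mean, a single low-weight outlier moves P(YES) only slightly, but it can still raise the variance enough to trigger a DISPUTE.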

5. On-chain verification

CalibrationRegistry.sol and SwarmConsensus.sol mirror the math in WAD (18-decimal) fixed-point. Anyone can recompute weights from public Brier scores.
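To see why integer fixed-point makes the result reproducible, here is a minimal Python sketch of the base weight in WAD arithmetic. The helper names to_wad and weight_wad and the epsilon value are assumptions for this example; the contract's actual function is computeWeight(agent), and this sketch leaves out the confidence ramp.

```python
from decimal import Decimal

WAD = 10 ** 18  # 1.0 in 18-decimal fixed-point


def to_wad(value: str) -> int:
    """Convert a decimal string such as "0.099" to a WAD integer."""
    return int(Decimal(value) * WAD)


def weight_wad(brier_wad: int, epsilon_wad: int) -> int:
    """1 / (brier + eps) in WAD, using only integer operations.

    Dividing WAD * WAD by a WAD-scaled denominator keeps the result in WAD.
    Floor division matches Solidity's truncating division for unsigned values.
    """
    denominator = brier_wad + epsilon_wad
    if denominator <= 0:
        raise ValueError("brier + epsilon must be positive")
    return WAD * WAD // denominator


if __name__ == "__main__":
    eps = to_wad("0.001")  # ASSUMED epsilon, for illustration only
    print(weight_wad(to_wad("0.099"), eps))  # exactly 10 * WAD
    print(weight_wad(to_wad("0.179"), eps))  # 5.555... truncated to 18 decimals
```

Every step is an integer operation with a defined rounding direction, so any party running the same inputs gets the same digits.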

6. Self-improving

Every resolution becomes training data. Brier scores update, weights re-derive, future predictions sharpen. The protocol gets smarter without a re-deploy.
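The update loop can be pictured with a small sketch that keeps a cumulative Brier sum and a resolution count per agent. The class name AgentRecord and the 0.25 default for an agent with no history are assumptions for this example, not the project's data model.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """Running calibration record for one agent (hypothetical structure)."""
    brier_sum: float = 0.0   # sum of squared errors over resolved questions
    resolutions: int = 0     # number of resolved questions

    def record(self, p_yes: float, outcome: bool) -> None:
        """Fold one resolved question into the running totals."""
        y = 1.0 if outcome else 0.0
        self.brier_sum += (p_yes - y) ** 2
        self.resolutions += 1

    @property
    def brier(self) -> float:
        """Mean squared error so far."""
        if self.resolutions == 0:
            # ASSUMED prior: the Brier score of always answering 0.5.
            return 0.25
        return self.brier_sum / self.resolutions


if __name__ == "__main__":
    agent = AgentRecord()
    agent.record(0.9, outcome=True)    # squared error 0.01
    agent.record(0.2, outcome=False)   # squared error 0.04
    print(agent.brier)
```

Since weights are derived from the stored score, each call to record changes the agent's weight on the next question without any change to the code.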

The benchmark

A 50-case deterministic benchmark (seed=42), balanced between YES and NO, with agents designed to fail on different subsets. A DISPUTE counts as a correct abstention when agents genuinely disagree, so it is scored as accurate rather than as a failure. Reproduce with make benchmark.

Method	Accuracy	Brier ↓	Disputes	Notes
swarm	100%	0.0724	18/50	Best Brier of all methods
majority vote	92.0%	0.0785	0
average	98.0%	0.0935	0
agent-oracle	84.0%	0.1029	0	Best single agent
agent-reliable	80.0%	0.1332	0
agent-novice	68.0%	0.2009	0

The swarm protocol has the lowest Brier score of every method in the table (0.0724), ahead of majority vote (0.0785), simple averaging (0.0935), and the best single agent, agent-oracle (0.1029). In 18 of the 50 cases the variance gate turns genuine disagreement into an honest DISPUTE outcome rather than forcing a wrong answer.

The on-chain layer

Four contracts, written in Solidity 0.8.24, optimised for Base Sepolia. Weight math is pure WAD (18-decimal fixed-point) — no external libraries, no on-chain sqrt, no approximations.

CalibrationRegistry.sol

Per-agent Brier-score storage. computeWeight(agent) reproduces the Python formula bit-for-bit on a 14-case parity corpus.

SwarmConsensus.sol

Vote aggregation. Reads weights from CalibrationRegistry, computes the weighted P(YES), applies the squared-variance dispute threshold, and emits Resolution(YES|NO|DISPUTE).
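One plausible reading of the squared-variance threshold is that the contract compares the variance against the square of a standard-deviation threshold, which avoids taking a square root on-chain. The sketch below illustrates that idea in WAD integers. The function name is_dispute and the threshold of 0.2 are assumptions for this example, not values read from SwarmConsensus.sol.

```python
WAD = 10 ** 18  # 1.0 in 18-decimal fixed-point


def is_dispute(variance_wad: int, std_threshold_wad: int) -> bool:
    """Return True when the standard deviation exceeds the threshold.

    For non-negative values, sqrt(var) > t is equivalent to var > t * t,
    so the square root never has to be computed.
    """
    if variance_wad < 0 or std_threshold_wad < 0:
        raise ValueError("variance and threshold must be non-negative")
    # t * t carries two WAD factors; divide one out to stay in WAD.
    threshold_sq_wad = std_threshold_wad * std_threshold_wad // WAD
    return variance_wad > threshold_sq_wad


if __name__ == "__main__":
    threshold = 2 * 10 ** 17                   # ASSUMED threshold of 0.2
    print(is_dispute(143 * 10 ** 14, threshold))  # variance 0.0143 vs 0.04
    print(is_dispute(536 * 10 ** 14, threshold))  # variance 0.0536 vs 0.04
```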

RewardDistribution.sol

Per-question reward pools. 70/30 split between correctness payouts and calibration improvement. Pull-payment pattern, no re-entrancy surface.

AgentIdentity.sol

Soulbound ERC-721 per agent node. Transfers blocked. Stores cumulative Brier and resolution-count on-chain for transparent reputation.

Try it locally

Two minutes from git clone to a calibration-weighted answer. No API keys. No paid services. Local LLM optional — demo mode runs with zero network calls.

Quickstart

git clone https://github.com/solmonger/swarm-oracle.git
cd swarm-oracle

# Demo mode — no LLM required, deterministic, 3 seconds
python swarm_verify.py --demo "Did BTC close above 100K on May 5, 2026?"

# Or: full pipeline with a local llama.cpp / Ollama server
export LLM_API_URL="http://localhost:8080/v1/chat/completions"
# Single quotes stop the shell from expanding $3 inside the question
python swarm_verify.py 'Will ETH close above $3,000 on June 1, 2026?'

# Or: one-shot Docker
docker compose up                         # API at http://localhost:8000/docs
docker compose run oracle demo            # CLI demo, no LLM needed

Verify the math

Test & verify

make test               # 742 Python tests
make test-solidity      # 55 Foundry tests
make test-integration   # End-to-end pipeline
make benchmark          # Reproduce the comparison table above (100% accuracy, 0.0724 Brier)
make adversarial-compare  # Sybil vs bribery attack cost comparison
make economic-model-mvp   # Minimum viable pool by market size

Calibration-weighted truth.