Swarm Oracle — Benchmark Report

Generated 2026-05-13 06:13:28 · 12 cases · methods: 6

Method comparison

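| Method | Accuracy | Brier ↓ | Log loss ↓ | Correct | Disputed |
|---|---|---|---|---|---|
| swarm | 91.7% | 0.0859 | 0.2950 | 11/12 | 7/12 |
| agent-oracle | 83.3% | 0.0870 | 0.2803 | 10/12 | 0/12 |
| agent-reliable | 83.3% | 0.1112 | 0.3447 | 10/12 | 0/12 |
| average | 91.7% | 0.1184 | 0.3854 | 11/12 | 0/12 |
| majority | 83.3% | 0.1667 | 3.4539 | 10/12 | 0/12 |
| agent-novice | 66.7% | 0.2340 | 0.6612 | 8/12 | 0/12 |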
↓ lower is better. swarm = calibration-weighted consensus; majority = each agent's P(YES) thresholded at 0.5, then a majority vote; average = unweighted mean of the agents' P(YES). When weighted variance is high, the protocol flags DISPUTE rather than committing; those count as misses.
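The aggregation rules described in the note above can be sketched in Python. This is a minimal sketch under stated assumptions, not the benchmark's implementation: the function names, the `weights`, and the `dispute_threshold` are hypothetical, and the report does not say how calibration weights or the variance cutoff are derived. Only `average` and `majority` follow definitions given in the note. The inputs are the three agents' P(YES) for the ETH question in the per-question breakdown.

```python
from typing import Sequence, Tuple


def average(probs: Sequence[float]) -> float:
    """Unweighted mean of the agents' P(YES)."""
    return sum(probs) / len(probs)


def majority(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Threshold each P(YES), then return 1.0 if most agents vote YES, else 0.0.

    Assumptions: a probability exactly at the threshold counts as YES
    (the report labels a 0.500 forecast as YES), and a tied vote resolves to NO.
    """
    yes_votes = sum(1 for p in probs if p >= threshold)
    return 1.0 if yes_votes > len(probs) / 2 else 0.0


def weighted_consensus(
    probs: Sequence[float],
    weights: Sequence[float],
    dispute_threshold: float = 0.03,  # hypothetical cutoff, not from the report
) -> Tuple[float, float, str]:
    """Weighted mean of P(YES), its weighted variance, and a label.

    The label is DISPUTE when the weighted variance exceeds the cutoff,
    otherwise YES or NO by thresholding the weighted mean at 0.5.
    """
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, probs)) / total
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, probs)) / total
    if variance > dispute_threshold:
        label = "DISPUTE"
    else:
        label = "YES" if mean >= 0.5 else "NO"
    return mean, variance, label


# agent-oracle, agent-reliable, agent-novice on the ETH question
eth = [0.200, 0.550, 0.650]
hypothetical_weights = [0.6, 0.3, 0.1]  # illustrative only

print(round(average(eth), 3))  # 0.467, matching the "average" column
print(majority(eth))           # 1.0, matching the "majority" column
print(weighted_consensus(eth, hypothetical_weights))
```

With these illustrative weights the consensus value will not match the swarm column, since the report's actual weights are not published here.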

Brier score by method (lower is better)

| Method | Brier score |
|---|---|
| swarm | 0.0859 |
| agent-oracle | 0.0870 |
| agent-reliable | 0.1112 |
| average | 0.1184 |
| majority | 0.1667 |
| agent-novice | 0.2340 |
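For reference, the Brier score and log loss can be recomputed from the per-question breakdown. The sketch below uses the standard definitions (mean squared error and mean negative log-likelihood for binary outcomes) with the agent-oracle and majority forecasts copied from that breakdown. The clipping constant `eps = 1e-9` is an assumption: the report does not state it, but that value is consistent with the majority method's reported log loss of 3.4539.

```python
import math
from typing import Sequence

# Outcomes in per-question order (1 = YES, 0 = NO)
TRUTH = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# P(YES) per question, copied from the per-question breakdown
AGENT_ORACLE = [0.030, 0.200, 0.970, 0.720, 0.350, 0.780,
                0.020, 0.990, 0.820, 0.010, 0.400, 0.600]
MAJORITY = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
            0.0, 1.0, 1.0, 0.0, 1.0, 0.0]


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between P(YES) and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int],
             eps: float = 1e-9) -> float:
    """Mean negative log-likelihood, with P(YES) clipped to [eps, 1 - eps].

    Clipping keeps hard 0/1 forecasts (such as majority votes) finite.
    The value of eps is an assumption, not taken from the report.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(probs)


print(f"agent-oracle Brier    {brier_score(AGENT_ORACLE, TRUTH):.4f}")
print(f"agent-oracle log loss {log_loss(AGENT_ORACLE, TRUTH):.4f}")
print(f"majority Brier        {brier_score(MAJORITY, TRUTH):.4f}")
print(f"majority log loss     {log_loss(MAJORITY, TRUTH):.4f}")
```

The majority method's large log loss comes from its two confident misses: a hard 0 or 1 forecast that turns out wrong is penalized by roughly `-ln(eps)`, while the Brier score caps each miss at 1.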

Accuracy by method

| Method | Accuracy |
|---|---|
| swarm | 91.7% |
| average | 91.7% |
| agent-oracle | 83.3% |
| agent-reliable | 83.3% |
| majority | 83.3% |
| agent-novice | 66.7% |
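The accuracy figures for the non-disputing methods are consistent with labelling each forecast YES when P(YES) is at least 0.5 and comparing against the truth. The sketch below illustrates this with the agent-novice and average forecasts copied from the per-question breakdown. The helper names are hypothetical, and treating exactly 0.500 as YES is inferred from the breakdown rather than stated in the report.

```python
from typing import Sequence, Tuple

# Ground truth in per-question order
TRUTH = ["NO", "NO", "YES", "YES", "NO", "YES",
         "NO", "YES", "YES", "NO", "YES", "NO"]

# P(YES) per question, copied from the per-question breakdown
AGENT_NOVICE = [0.500, 0.650, 0.800, 0.550, 0.750, 0.600,
                0.400, 0.850, 0.200, 0.150, 0.550, 0.350]
AVERAGE = [0.193, 0.467, 0.900, 0.640, 0.627, 0.743,
           0.157, 0.930, 0.523, 0.063, 0.577, 0.357]


def to_label(p_yes: float, threshold: float = 0.5) -> str:
    """Map P(YES) to a YES/NO label; a value at the threshold counts as YES."""
    return "YES" if p_yes >= threshold else "NO"


def accuracy(probs: Sequence[float], truth: Sequence[str]) -> Tuple[int, int]:
    """Return (number correct, number of cases)."""
    correct = sum(1 for p, t in zip(probs, truth) if to_label(p) == t)
    return correct, len(truth)


for name, probs in [("agent-novice", AGENT_NOVICE), ("average", AVERAGE)]:
    correct, total = accuracy(probs, TRUTH)
    print(f"{name}: {correct}/{total} = {100 * correct / total:.1f}%")
```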

Per-question breakdown

| Question | Category | Truth | swarm | agent-oracle | agent-reliable | average | majority | agent-novice |
|---|---|---|---|---|---|---|---|---|
| Did BTC close above $100K on May 5, 2026? | crypto | NO | 0.065 NO | 0.030 NO | 0.050 NO | 0.193 NO | 0.000 NO | 0.500 YES |
| Will ETH be above $5,000 on June 1, 2026? | crypto | NO | 0.345 DISPUTE | 0.200 NO | 0.550 YES | 0.467 NO | 1.000 YES | 0.650 YES |
| Did BTC reach an all-time high in 2024? | crypto | YES | 0.946 YES | 0.970 YES | 0.930 YES | 0.900 YES | 1.000 YES | 0.800 YES |
| Will the home team win the Saturday derby? | sports | YES | 0.686 DISPUTE | 0.720 YES | 0.650 YES | 0.640 YES | 1.000 YES | 0.550 YES |
| Will the #1 seed win their first-round playoff game? | sports | NO | 0.519 DISPUTE | 0.350 NO | 0.780 YES | 0.627 YES | 1.000 YES | 0.750 YES |
| Will France finish in the top 4 of EURO 2024? | sports | YES | 0.793 DISPUTE | 0.780 YES | 0.850 YES | 0.743 YES | 1.000 YES | 0.600 YES |
| Will global average temperature anomaly drop below pre-industrial baseline by 2030? | general | NO | 0.053 NO | 0.020 NO | 0.050 NO | 0.157 NO | 0.000 NO | 0.400 NO |
| Was Paris the host of the 2024 Summer Olympics? | general | YES | 0.968 YES | 0.990 YES | 0.950 YES | 0.930 YES | 1.000 YES | 0.850 YES |
| Will a major LLM lab release a model exceeding GPT-4o on MMLU by end of 2026? | general | YES | 0.692 DISPUTE | 0.820 YES | 0.550 YES | 0.523 YES | 1.000 YES | 0.200 NO |
| Will the world adopt a single global currency by 2027? | general | NO | 0.025 NO | 0.010 NO | 0.030 NO | 0.063 NO | 0.000 NO | 0.150 NO |
| Will the underdog cover the +7 spread in tonight's game? | sports | YES | 0.537 DISPUTE | 0.400 NO | 0.780 YES | 0.577 YES | 1.000 YES | 0.550 YES |
| Did SOL close above $300 yesterday? | crypto | NO | 0.423 DISPUTE | 0.600 YES | 0.120 NO | 0.357 NO | 0.000 NO | 0.350 NO |