forecasterarena
Forecaster Arena – Testing LLMs on real events with prediction markets

Hey HN! I'm Mert.

I built this because I was frustrated that LLM benchmarks may be contaminated by training data. When a model scores 99.9% on MMLU-Pro-Max, we can't tell whether that's genuine reasoning or memorization.

Forecaster Arena tries to solve this by testing models...
OpenAI
Anthropic