Benchmarks in Leipzig
Published as an arXiv preprint, 2026
This preprint presents the Leipzig Benchmarks, a collection of 100 research-level mathematics questions with known answers. The dataset was compiled between April 1 and May 15, 2026, with much of the work taking place during the three-day workshop Benchmarks in Leipzig at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany.
The questions are used to evaluate the mathematical reasoning abilities of several state-of-the-art large language models. The evaluation proceeds in multiple stages: single attempts, then repeated runs, then tests with heavy-thinking models. The results show that many benchmark questions can be solved by current systems, while a small number remain unsolved after the full evaluation pipeline.
Citation: A. Balakin et al. (2026). "Benchmarks in Leipzig." arXiv preprint. arXiv:2606.05818.
