"Calibeating": Beating Forecasters at Their Own Game

Dean P. Foster and Sergiu Hart



(*) Corrections

Page 1450, in the 7th line after formula (6), replace \( \phi : B \rightarrow B \) with \[\phi : B \rightarrow \Delta(A) \]
Page 1472, in the displayed formula in Remark (b), replace \(\bar{a}^n_{t-1}\) with \[\bar{a}^n_{i,t-1}\] (the subscript i is missing)

Multi-Calibeating and Stronger Experts

Since the refinement score is the minimal Brier score over all relabelings of the bins (see (2) and the last paragraph of Section 2),

\(\mathbf{c}\) is multi-calibeating \(\mathbf{b}_{1},...,\mathbf{b}_{N}\) if \[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{\phi}\mathcal{B}_{t}^{\phi(\mathbf{b}_{1},...,\mathbf{b}_{N})}+o(1)\] (as \(t\rightarrow\infty\); we ignore the uniformity on \(\mathbf{a}\)), where the minimum is taken over all functions \(\phi:\prod_{n=1}^{N}B^{n}\rightarrow\Delta(A).\)

By comparison, when all forecasts are probability distributions on \(A,\) i.e., all \(B^{n}\) are subsets of \(\Delta(A)\), Foster (1991) defines (see (#) below):

\(\mathbf{c}\) is as strong as \(\mathbf{b}_{1},...,\mathbf{b}_{N}\) if \[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{1\leq n\leq N}\mathcal{B}_{t}^{\mathbf{b}_{n}}+o(1)\]
\(\mathbf{c}\) is as strong as the convex hull of \(\mathbf{b}_{1},...,\mathbf{b}_{N}\) if \[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{w}\mathcal{B}_{t}^{w_{1}\mathbf{b}_{1}+...+w_{N}\mathbf{b}_{N}}+o(1),\] where the minimum is taken over all \(w=(w_{1},...,w_{N})\) with \(w_{n}\geq0\) and \(\sum_{n=1}^{N}w_{n}=1.\)

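Thus: calibeating is stronger than being the "stronger expert." Indeed, when all \(B^{n}\) are subsets of \(\Delta(A)\), each projection \(\phi(b_{1},...,b_{N})=b_{n}\) and each convex combination \(\phi(b_{1},...,b_{N})=w_{1}b_{1}+...+w_{N}b_{N}\) is one of the functions \(\phi\) over which the minimum is taken, so the multi-calibeating bound is at most each of the two bounds of Foster (1991).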
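The ordering of the three benchmarks can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper: it assumes binary outcomes, two experts with finitely many forecast values, and a finite grid of weights as a stand-in for the minimum over \(w\). The function names and the toy data are hypothetical. The minimum over \(\phi\) is computed by relabeling each joint bin \((b_{1},b_{2})\) with the average outcome in that bin, which is the minimizing relabeling.

```python
from collections import defaultdict


def brier(actions, forecasts):
    """Average squared distance between outcomes and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actions, forecasts)) / len(actions)


def best_relabeling_brier(actions, b1, b2):
    """Minimum over phi of the Brier score of phi(b1, b2).

    The minimizing phi maps each joint bin (b1, b2) to the average
    outcome observed in that bin.
    """
    bins = defaultdict(list)
    for a, x, y in zip(actions, b1, b2):
        bins[(x, y)].append(a)
    mean = {key: sum(v) / len(v) for key, v in bins.items()}
    relabeled = [mean[(x, y)] for x, y in zip(b1, b2)]
    return brier(actions, relabeled)


def best_convex_brier(actions, b1, b2, grid=101):
    """Approximate minimum over convex weights, using a finite grid (assumption)."""
    weights = [i / (grid - 1) for i in range(grid)]
    return min(
        brier(actions, [w * x + (1 - w) * y for x, y in zip(b1, b2)])
        for w in weights
    )


# Hypothetical toy data: eight periods, two experts.
actions = [1, 1, 0, 0, 1, 0, 1, 0]
b1 = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1]
b2 = [0.7, 0.3, 0.7, 0.3, 0.7, 0.3, 0.7, 0.3]

multi = best_relabeling_brier(actions, b1, b2)
hull = best_convex_brier(actions, b1, b2)
single = min(brier(actions, b1), brier(actions, b2))

print(multi, hull, single)
```

On any such data the first number is at most the second, and the second is at most the third, since the weight grid includes the endpoints \(w=(1,0)\) and \(w=(0,1)\).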

(#) In the expansive literature on experts, these notions are referred to as "prediction with no regret"; see Appendix A.9 of the full paper for the parallel results with the logarithmic scoring rule.

Abstract

In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
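The decomposition of the Brier score mentioned in the abstract can be illustrated with a short computation. This is a minimal sketch under stated assumptions: outcomes are binary (0 or 1), forecasts are numbers in [0, 1], and each distinct forecast value is treated as one bin. The function name and the example data are hypothetical.

```python
from collections import defaultdict


def brier_decomposition(actions, forecasts):
    """Return (brier, calibration, refinement) for one forecast sequence.

    Each distinct forecast value is a bin; bin_mean[c] is the average
    outcome over the periods in which the forecast was c.
    """
    t = len(actions)
    bins = defaultdict(list)
    for a, c in zip(actions, forecasts):
        bins[c].append(a)
    bin_mean = {c: sum(v) / len(v) for c, v in bins.items()}

    # Brier score: distance of forecasts from realized outcomes.
    brier = sum((a - c) ** 2 for a, c in zip(actions, forecasts)) / t
    # Calibration score: distance of forecasts from their bin averages.
    calibration = sum((c - bin_mean[c]) ** 2 for c in forecasts) / t
    # Refinement score: variance of outcomes inside the bins.
    refinement = sum((a - bin_mean[c]) ** 2 for a, c in zip(actions, forecasts)) / t
    return brier, calibration, refinement


# Hypothetical example: two bins, each slightly miscalibrated.
actions = [1, 0, 1, 1, 0, 0, 1, 0]
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
b, k, r = brier_decomposition(actions, forecasts)
print(b, k, r)
```

Here the Brier score equals the sum of the calibration and refinement scores up to floating-point error. Replacing each forecast by its bin average would drive the calibration score to zero while leaving the refinement score unchanged.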

First version: February 2020
The Hebrew University of Jerusalem, Center for Rationality DP-743, October 2021
Revised, May 2022
Revised, October 2022: arXiv http://arxiv.org/abs/2209.04892v2
Theoretical Economics 18 (2023), 4, 1441-1474
doi.org/10.3982/TE5330