"Calibeating": Beating Forecasters at Their Own Game
Dean P. Foster and Sergiu Hart
(*) Corrections

Page 1450, in the 7th line after formula (6), replace \( \phi : B \rightarrow B \) with
\[\phi : B \rightarrow \Delta(A) \]

Page 1472, in the displayed formula in Remark (b), replace
\(\bar{a}^n_{t1}\)
with
\[\bar{a}^n_{i,t1}\]
(the subscript \(i\) is missing)
MultiCalibeating and Stronger Experts
Since the refinement score is the minimal Brier score over all relabelings of
the bins (see (2) and the last paragraph of Section 2),
\(\mathbf{c}\) is multicalibeating \(\mathbf{b}_{1},\dots,\mathbf{b}_{N}\) if
\[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{\phi}\mathcal{B}_{t}^{\phi(\mathbf{b}_{1},\dots,\mathbf{b}_{N})}+o(1)\]
(as \(t\rightarrow\infty\); we ignore the uniformity on \(\mathbf{a}\)), where the minimum is taken over all functions
\(\phi:\prod_{n=1}^{N}B^{n}\rightarrow\Delta(A).\)
By comparison, when all forecasts are probability distributions on \(A\), i.e., all
\(B^{n}\) are subsets of \(\Delta(A)\), Foster (1991) defines (see (#) below):

\(\mathbf{c}\) is as strong as \(\mathbf{b}_{1},\dots,\mathbf{b}_{N}\) if
\[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{1\leq n\leq N}\mathcal{B}_{t}^{\mathbf{b}_{n}}+o(1)\]
\(\mathbf{c}\) is as strong as the convex hull of \(\mathbf{b}_{1},\dots,\mathbf{b}_{N}\) if
\[\mathcal{B}_{t}^{\mathbf{c}}\leq\min_{w}\mathcal{B}_{t}^{w_{1}\mathbf{b}_{1}+\dots+w_{N}\mathbf{b}_{N}}+o(1),\]
where the minimum is taken over all
\(w=(w_{1},\dots,w_{N})\) with \(w_{n}\geq0\) and \(\sum_{n=1}^{N}w_{n}=1.\)
Thus: calibeating is stronger than being the "stronger expert."
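To spell out the comparison behind this conclusion: when all \(B^{n}\) are subsets of \(\Delta(A)\), each single forecast \(\mathbf{b}_{n}\) is a convex combination (put weight 1 on \(n\)), and each convex combination \(w_{1}\mathbf{b}_{1}+\dots+w_{N}\mathbf{b}_{N}\) is one particular function \(\phi\) of \((\mathbf{b}_{1},\dots,\mathbf{b}_{N})\) with values in the convex set \(\Delta(A)\). Minimizing over a larger class can only lower the minimum, so
\[\min_{\phi}\mathcal{B}_{t}^{\phi(\mathbf{b}_{1},\dots,\mathbf{b}_{N})}\leq\min_{w}\mathcal{B}_{t}^{w_{1}\mathbf{b}_{1}+\dots+w_{N}\mathbf{b}_{N}}\leq\min_{1\leq n\leq N}\mathcal{B}_{t}^{\mathbf{b}_{n}},\]
and the multicalibeating bound implies both of the bounds defined by Foster (1991).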
(#) In the expansive literature on experts, these notions
are referred to as "prediction with no regret"; see Appendix A.9 of the
full paper for the parallel results with the logarithmic
scoring rule.
Abstract
In order to identify expertise,
forecasters should not be tested by their calibration score,
which can always be made arbitrarily small, but rather by their Brier
score.
The Brier score is the sum of the calibration score and the refinement
score; the latter measures how good the sorting into bins with
the same forecast is, and thus attests to "expertise." This raises the
question of whether one can gain calibration without losing expertise,
which we refer to as "calibeating." We provide an easy way to
calibeat any forecast, by a deterministic online procedure. We moreover
show that calibeating can be achieved by a stochastic procedure that
is itself calibrated, and then extend the results to simultaneously
calibeating multiple procedures, and to deterministic procedures that are
continuously calibrated.
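The decomposition mentioned in the abstract can be checked numerically. The following is a minimal sketch, assuming scalar actions in {0, 1} and scalar forecasts in [0, 1], with scores averaged over periods; the function names `brier_decomposition` and `bin_average_forecasts` and the `default` first forecast are hypothetical choices made for this illustration, not taken from the paper. The second function forecasts the running average of past actions in the bin of the current forecast, which we take (as an assumption) to be in the spirit of the simple deterministic online procedure the abstract alludes to.

```python
from collections import defaultdict


def brier_decomposition(actions, forecasts):
    """Return (brier, calibration, refinement), each averaged over periods.

    A "bin" is the set of periods that share the same forecast value.
    """
    t = len(actions)
    bins = defaultdict(list)
    for a, c in zip(actions, forecasts):
        bins[c].append(a)
    # Average action in each bin.
    bin_avg = {c: sum(v) / len(v) for c, v in bins.items()}

    brier = sum((a - c) ** 2 for a, c in zip(actions, forecasts)) / t
    # Calibration: distance between each forecast and its bin average.
    calibration = sum((c - bin_avg[c]) ** 2 for c in forecasts) / t
    # Refinement: variance of actions within the bins.
    refinement = sum(
        (a - bin_avg[c]) ** 2 for a, c in zip(actions, forecasts)
    ) / t
    return brier, calibration, refinement


def bin_average_forecasts(actions, forecasts, default=0.5):
    """Online relabeling (illustrative assumption): in each period, forecast
    the average of past actions in the bin of the current forecast b_t.
    `default` is used the first time a bin is seen.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    out = []
    for a, b in zip(actions, forecasts):
        # The forecast uses only actions observed before this period.
        out.append(sums[b] / counts[b] if counts[b] else default)
        sums[b] += a
        counts[b] += 1
    return out


# Small illustration with made-up data.
actions = [1, 0, 1, 1, 0, 1, 0, 1]
b = [0.9, 0.9, 0.2, 0.9, 0.2, 0.9, 0.2, 0.2]
brier, calibration, refinement = brier_decomposition(actions, b)
c = bin_average_forecasts(actions, b)
print(brier, calibration, refinement)
print(brier_decomposition(actions, c)[0])
```

On any input the first return value equals the sum of the other two up to floating-point error, which is the identity "Brier = calibration + refinement" from the abstract. The data above are invented for illustration only and carry no empirical meaning.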
© Sergiu Hart