Calibration: improving our model so that the predicted probability distribution matches the frequencies actually observed in the training data.

Calibration plot (a.k.a. reliability diagram; analogous to a q-q plot)

  • 2 classes $\{-1, 1\}$
  • sort by predicted probability $\hat p_i = \hat P(y = 1 \mid X_i)$
  • define bins $B_i$ between $0$ and $1$ and compute $p_i = \frac{\sum_{k \,:\, \hat p_k \in B_i} \mathbb{I}_{y_k = 1}}{\lvert B_i \rvert}$
  • plot the empirical frequency $p_i$ against the mean predicted probability in bin $B_i$.
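The binning procedure above can be sketched as follows (a minimal illustration with NumPy; the function name and signature are made up, not from any library):

```python
import numpy as np

def calibration_curve(y_true, p_hat, n_bins=10):
    """Empirical frequency p_i vs. mean predicted probability per bin.

    y_true: labels in {-1, 1}; p_hat: predicted P(y = 1 | X).
    """
    y_true = np.asarray(y_true)
    p_hat = np.asarray(p_hat)
    # Bin edges partition [0, 1] into the bins B_i.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin (interior edges only, so p = 1.0 lands in the last bin).
    bin_ids = np.clip(np.digitize(p_hat, edges[1:-1]), 0, n_bins - 1)
    mean_pred, freq_pos = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            mean_pred.append(p_hat[mask].mean())         # x-axis: mean predicted prob in bin
            freq_pos.append((y_true[mask] == 1).mean())  # y-axis: fraction of y = 1 in bin
    return np.array(mean_pred), np.array(freq_pos)
```

Plotting `freq_pos` against `mean_pred` gives the calibration curve; empty bins are simply skipped.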

For a perfectly calibrated model the plot is the identity line $y = x$.


Sigmoid / Platt calibration

Logistic regression on our model output:

$$\hat P_\text{new}(y \mid X) = \frac{1}{1 + \exp\left(-\left[\alpha \hat P(y \mid X) + \beta\right]\right)}$$

The parameters $\alpha$ and $\beta$ are optimized by maximum likelihood.
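A minimal sketch of this fit using scikit-learn's `LogisticRegression`, treating the model's predicted probability as a single input feature (the synthetic scores and labels below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic uncalibrated scores \hat P(y=1|X) and {-1, 1} labels whose true
# positive rate is p_raw**2, i.e. the raw scores are miscalibrated.
rng = np.random.default_rng(0)
p_raw = rng.uniform(0.0, 1.0, size=200)
y = np.where(rng.uniform(size=200) < p_raw**2, 1, -1)

# Logistic regression on the model output: fits sigma(alpha * p + beta)
# by (regularized) maximum likelihood.
lr = LogisticRegression()
lr.fit(p_raw.reshape(-1, 1), y)
alpha, beta = lr.coef_[0, 0], lr.intercept_[0]

# Calibrated probabilities; classes_ is [-1, 1], so column 1 is P(y = 1).
p_new = lr.predict_proba(p_raw.reshape(-1, 1))[:, 1]
```

In practice the fit should be done on a held-out calibration set rather than the data used to train the original model.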

Isotonic regression

Let $(\hat p_1, p_1), \dots, (\hat p_n, p_n)$ be pairs of predicted probabilities and empirical frequencies. Isotonic regression seeks a weighted least-squares fit $\hat p_i^\text{new} \approx p_i$ such that $\hat p_i^\text{new} \leq \hat p_j^\text{new}$ whenever $\hat p_i \leq \hat p_j$.

The objective is $\min \sum_{i=1}^n w_i (\hat p_i^\text{new} - p_i)^2 \text{ s.t. } \hat p_1^\text{new} \leq \dots \leq \hat p_n^\text{new}$, assuming the $\hat p_i$'s are sorted in increasing order.

This yields a piecewise-constant non-decreasing function. To solve it we use the pool adjacent violators algorithm (PAVA). See these notes.
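A short sketch of the pool adjacent violators algorithm (an illustrative implementation, assuming the inputs are already sorted by predicted probability):

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: weighted least-squares isotonic fit.

    Returns the non-decreasing sequence minimizing sum w_i (f_i - y_i)^2.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block stores (weighted mean, total weight, number of points pooled).
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); counts.append(1)
        # Pool adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            wt = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wt)
            weights.append(wt); counts.append(c1 + c2)
    # Expand pooled blocks back to per-point fitted values.
    return np.repeat(means, counts)
```

Each violating pair of blocks is merged into one block whose value is the weighted mean of its members, which is exactly the least-squares optimum on that block; the result is the piecewise-constant non-decreasing fit described above.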