2.3.4 Computing calibration scores
Module 2. Calibration and Information score
The calibration score
The calibration score measures how much an expert's empirical probability vector s differs from the theoretical probability vector p. It is computed using the formula
Cal(e)=1-F(2*m*I(s,p)),
where m is the number of calibration questions and F is the cumulative distribution function (cdf) of a chi-squared distribution with 3 degrees of freedom (the four inter-quantile bins minus one). Here I(s,p) is the Kullback-Leibler divergence of s and p, that is, the relative information of s with respect to p:
I(s,p)=s_1*ln(s_1/p_1)+s_2*ln(s_2/p_2)+s_3*ln(s_3/p_3)+s_4*ln(s_4/p_4)
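As a sketch, the relative information I(s,p) takes only a few lines of Python. The names `relative_information` and `P_THEORETICAL` are hypothetical. The code assumes the usual convention 0*ln(0)=0 for empty bins, and the theoretical vector p=(0.05,0.45,0.45,0.05) implied by 5%, 50% and 95% quantiles.

```python
import math

# Hypothetical constant: theoretical bin probabilities implied by 5%, 50% and
# 95% quantiles (below 5%, 5-50%, 50-95%, above 95%).
P_THEORETICAL = (0.05, 0.45, 0.45, 0.05)


def relative_information(s, p=P_THEORETICAL):
    """Kullback-Leibler divergence I(s, p) = sum of s_i * ln(s_i / p_i).

    Empty bins (s_i == 0) are skipped, i.e. the convention 0 * ln(0) = 0.
    """
    return sum(s_i * math.log(s_i / p_i) for s_i, p_i in zip(s, p) if s_i > 0)


print(round(relative_information((0.1, 0.2, 0.5, 0.2)), 7))  # 0.2370678
```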
Chi-squared distribution
The chi-squared distribution is a parametric distribution commonly used in hypothesis testing.
You can compute values of the cumulative distribution function of a chi-squared distribution with an online calculator. The calculator rounds some results to two decimal places, but in what follows we use results rounded to three decimal places.
To compute calibration scores, you need a chi-squared distribution with 3 degrees of freedom. In the calculator, the "chi-squared critical value" is the point at which the cumulative distribution function is evaluated.
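Instead of the online calculator, the cdf can also be evaluated directly. For 3 degrees of freedom it has the closed form F(x)=erf(sqrt(x/2))-sqrt(2x/pi)*exp(-x/2) for x>=0, so Python's standard `math` module is enough. The function name `chi2_cdf_3df` is hypothetical, and this is a minimal sketch rather than the calculator's own method.

```python
import math


def chi2_cdf_3df(x):
    """CDF of a chi-squared distribution with 3 degrees of freedom.

    Closed form: F(x) = erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2) for x >= 0.
    """
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(round(chi2_cdf_3df(4.741356), 3))  # 0.808 (the calculator shows 0.81)
```

If SciPy is available, `scipy.stats.chi2.cdf(x, df=3)` should give the same values.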
Here is an example of how to compute the calibration score.
Imagine an expert answers 10 calibration questions, and the resulting empirical probability vector is
s=(1/10,2/10,5/10,2/10).
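With the theoretical vector p=(0.05,0.45,0.45,0.05), which follows from the 5%, 50% and 95% quantiles, the Kullback-Leibler divergence is I(s,p)=0.2370678.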
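The value 0.2370678 follows from plugging s=(0.1,0.2,0.5,0.2) and p=(0.05,0.45,0.45,0.05) into the formula for I(s,p), term by term:

```latex
\begin{aligned}
I(s,p) &= 0.1\ln\frac{0.1}{0.05} + 0.2\ln\frac{0.2}{0.45} + 0.5\ln\frac{0.5}{0.45} + 0.2\ln\frac{0.2}{0.05} \\
       &\approx 0.06931472 - 0.16218604 + 0.05268026 + 0.27725887 \\
       &\approx 0.2370678
\end{aligned}
```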
Then 2*m*I(s,p)=2*10*0.2370678=4.741356. This is the critical value (CV) to enter in the calculator mentioned above.
We need to evaluate the cumulative distribution function of a chi-squared random variable, with 3 degrees of freedom, at the point 4.741356. That is, compute F(2*m*I(s,p))=P(X<=2*m*I(s,p))=P(X<=4.741356).
The calculator gives us 0.81.
So,
Cal(e)=1-F(2*m*I(s,p))=1-0.81=0.19.
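A minimal end-to-end sketch of Cal(e)=1-F(2*m*I(s,p)) reproduces this example. All function names are hypothetical. The chi-squared cdf uses the closed form for 3 degrees of freedom instead of the online calculator, so the result is not rounded to two decimals.

```python
import math

# Assumed theoretical bin probabilities for 5%, 50% and 95% quantiles
P_THEORETICAL = (0.05, 0.45, 0.45, 0.05)


def relative_information(s, p=P_THEORETICAL):
    # I(s, p) with the convention 0 * ln(0) = 0 for empty bins
    return sum(s_i * math.log(s_i / p_i) for s_i, p_i in zip(s, p) if s_i > 0)


def chi2_cdf_3df(x):
    # Closed-form cdf of a chi-squared variable with 3 degrees of freedom
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def calibration_score(s, m):
    """Cal(e) = 1 - F(2 * m * I(s, p)) for empirical vector s over m questions."""
    critical_value = 2 * m * relative_information(s)
    return 1 - chi2_cdf_3df(critical_value)


cal = calibration_score((1 / 10, 2 / 10, 5 / 10, 2 / 10), m=10)
print(round(cal, 3))  # 0.192, which rounds to the 0.19 obtained above
```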
Let's return to the Dutch eating habits example and compute the calibration scores of the three experts.
Consider the table below, in which three experts have given their 5%, 50% and 95% quantiles for five different questions.
A Dutch supermarket is interested in eating habits among Dutch adults.
For this purpose, three experts have been consulted. First, these experts need to be evaluated based on five calibration questions.
1) What percentage of Dutch adults eats fruit on a daily basis?
2) What percentage of Dutch adults eats fast food less than once a month?
3) Consider the caloric consumption of Dutch adults ten years ago. What is the caloric consumption today, compared to ten years ago? (here, 100% means there was no change)
4) How many liters of milk are consumed on a yearly basis by the average Dutch adult?
5) How many kilos of meat does the average adult consume in six months' time?
The answers of the experts are summarized in the table below. For example, for question 1, expert 1 estimates the percentage to be 46 (his median), and he believes there is a 90% chance that the percentage lies between 44 and 49 (his 5% and 95% quantiles). The realization, based on actual research, turned out to be 50 (note that the data are purely fictional).
Question | Realization | Expert 1 (5% 50% 95%) | Expert 2 (5% 50% 95%) | Expert 3 (5% 50% 95%)
---|---|---|---|---
1 | 50 | 44 46 49 | 30 40 55 | 38 47 55
2 | 7 | 9 12 15 | 1 15 20 | 2 8 17
3 | 108 | 102 106 110 | 60 80 95 | 91 99 106
4 | 66 | 55 59 64 | 53 70 80 | 58 68 75
5 | 24 | 28 31 35 | 10 19 30 | 26 35 43
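As a sketch of how the table feeds into the formula, each realization can be assigned to one of the four inter-quantile bins of each expert, which gives the empirical vectors s; Cal(e) then follows with m=5. The names are hypothetical, and the code assumes that a realization exactly equal to a quantile falls in the upper bin (no such ties occur in this table).

```python
import bisect
import math

P_THEORETICAL = (0.05, 0.45, 0.45, 0.05)  # assumed bins for 5%, 50%, 95% quantiles

REALIZATIONS = [50, 7, 108, 66, 24]
# (5%, 50%, 95%) quantiles per question, copied from the table
QUANTILES = {
    "Expert 1": [(44, 46, 49), (9, 12, 15), (102, 106, 110), (55, 59, 64), (28, 31, 35)],
    "Expert 2": [(30, 40, 55), (1, 15, 20), (60, 80, 95), (53, 70, 80), (10, 19, 30)],
    "Expert 3": [(38, 47, 55), (2, 8, 17), (91, 99, 106), (58, 68, 75), (26, 35, 43)],
}


def empirical_vector(quantiles, realizations):
    """Fraction of realizations in each inter-quantile bin."""
    counts = [0, 0, 0, 0]
    for q, x in zip(quantiles, realizations):
        # bisect_right gives 0 if x < q5, 1 if q5 <= x < q50,
        # 2 if q50 <= x < q95 and 3 if x >= q95 (ties go to the upper bin)
        counts[bisect.bisect_right(q, x)] += 1
    return [c / len(realizations) for c in counts]


def relative_information(s, p=P_THEORETICAL):
    # I(s, p) with the convention 0 * ln(0) = 0 for empty bins
    return sum(s_i * math.log(s_i / p_i) for s_i, p_i in zip(s, p) if s_i > 0)


def chi2_cdf_3df(x):
    # Closed-form cdf of a chi-squared variable with 3 degrees of freedom
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def calibration_score(s, m):
    return 1 - chi2_cdf_3df(2 * m * relative_information(s))


for name, quantiles in QUANTILES.items():
    s = empirical_vector(quantiles, REALIZATIONS)
    print(name, "s =", s, "Cal =", round(calibration_score(s, len(REALIZATIONS)), 3))
```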
Decision Making Under Uncertainty: Introduction to Structured Expert Judgment by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/decision-making-under-uncertainty-introduction-to-structured-expert-judgment//.