Frequently Asked Questions

The leaderboard ranks forecasters based on their “Brier score.” What is a Brier score?
A Brier score is a measure of forecasting skill. It can be applied to a single forecast or to a set of forecasts from either a single forecaster or many forecasters. When multiple forecasts are scored, researchers usually report the average of the Brier scores as an overall measure of forecast quality. There are in fact two types of Brier scores: one ranges from 0 to 1 and the other ranges from 0 to 2. We use the 0-1 method, and we note this only because comparing Brier scores here with studies that use the 0-2 method (e.g., Tetlock and Gardner’s “Superforecasting” work) could be misleading. Brier scores are “deviation measures”: the greater the deviation between a forecast and the actual outcome, the larger the score. Therefore a score of zero is best. It means the forecast was correct and issued with complete certainty (or confidence). A score of 1 is sometimes described as what you’d expect from a perverse clairvoyant: the forecasts are always wrong, but the forecaster is nevertheless perfectly certain. A Brier score of .25 is what you’d get if you simply said there was a fifty-fifty chance of the event occurring (or not occurring) every time you made a forecast.
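To make the arithmetic concrete, here is a minimal Python sketch. It assumes the 0-1 score for a yes/no event is the squared difference between the stated probability and the outcome (coded 1 if the event occurred, 0 if not), and that the 0-2 variant sums that squared difference over both possible outcomes. The function names are illustrative and not taken from our study.

```python
def brier_score(forecast, outcome):
    """0-1 Brier score for one yes/no forecast.

    forecast: stated probability (0 to 1) that the event occurs.
    outcome: 1 if the event occurred, 0 if it did not.
    """
    return (forecast - outcome) ** 2


def mean_brier_score(forecasts, outcomes):
    """Average of the single-forecast scores over a set of forecasts."""
    scores = [brier_score(f, o) for f, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


def brier_score_0_2(forecast, outcome):
    """0-2 variant: squared error summed over both outcomes (occurs, does not occur)."""
    return (forecast - outcome) ** 2 + ((1 - forecast) - (1 - outcome)) ** 2


print(brier_score(1.0, 1))  # certain and correct -> 0.0
print(brier_score(1.0, 0))  # certain and wrong   -> 1.0
print(brier_score(0.5, 1))  # fifty-fifty         -> 0.25
```

Under these assumptions the 0-2 score for a yes/no event is exactly twice the 0-1 score, which is why scores reported on the two scales should not be compared directly.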

Why are Brier scores used to measure forecast skill?
Brier scores have two primary features: they quantify accuracy (see “What is a Brier score?” for details) and they reward ‘truthful’ (as opposed to strategic) forecasters with better scores. Consider a weather forecaster who thinks there is a slight chance (say 30%) of rain. The forecaster could strategically say close to 0% to try to improve their score or to achieve some other objective, but they would be penalized whenever it rains (both by a higher Brier score and by an angry audience whose plans were rained out; see this clip from Curb Your Enthusiasm). Averaged over many events, reporting the forecaster’s true belief gives the best (lowest) expected Brier score: it keeps the score low when the forecast turns out right and limits the penalty when it turns out wrong.
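The weather example can be checked with a short Python sketch. It assumes the 0-1 score for a yes/no event is the squared difference between the stated probability and the outcome, and that rain really does occur 30% of the time when the forecaster believes 30%. The function name and the grid of candidate reports are illustrative.

```python
def expected_brier(reported, believed):
    """Expected 0-1 Brier score when the event occurs with probability
    `believed` but the forecaster states `reported`."""
    score_if_event = (reported - 1) ** 2      # penalty when it rains
    score_if_no_event = (reported - 0) ** 2   # penalty when it stays dry
    return believed * score_if_event + (1 - believed) * score_if_no_event


believed = 0.3  # the forecaster's honest belief in the example above

# Try every report from 0% to 100% in 1% steps and keep the one with the
# lowest expected score.
candidates = [i / 100 for i in range(101)]
best_report = min(candidates, key=lambda q: expected_brier(q, believed))

print(best_report)                              # the honest 0.3
print(round(expected_brier(0.3, believed), 4))  # honest report
print(round(expected_brier(0.0, believed), 4))  # strategic "no rain" report
```

In this sketch the honest 30% report has an expected score of about 0.21, while always saying 0% has an expected score of 0.30, so shading the forecast makes the average score worse.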

Does a low Brier score mean that you are a great forecaster?
No, not necessarily. All other things being equal, lower Brier scores are indicative of better forecasting skill. However, Brier scores are also influenced by the amount of variance in the outcomes being forecasted, a variable that is often regarded as a measure of the difficulty of the forecasting environment. Naturally, the difficulty of the forecasting questions is not an attribute of the forecaster, which is why one shouldn’t jump to conclusions about what a forecaster’s Brier score “implies”. However, when everyone is forecasting the same events, the difficulty aspect is constant across forecasters. Therefore, under such conditions, differences among forecasters’ Brier scores are an indicator of differences in skill.
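A small Python sketch shows how the forecasting environment alone can move the score. It assumes the 0-1 score (squared difference between stated probability and outcome) and a forecaster with no special insight who always states the base rate, i.e. the long-run frequency of the event. The base rates below are hypothetical and are not taken from our data.

```python
def expected_brier_at_base_rate(base_rate):
    """Expected 0-1 Brier score of a forecaster who always states the
    base rate of the event, for an event that occurs at that rate."""
    p = base_rate
    return p * (p - 1) ** 2 + (1 - p) * (p - 0) ** 2


# Hypothetical environments: coin-flip events, less common events, rare events.
for base_rate in (0.5, 0.2, 0.05):
    print(base_rate, round(expected_brier_at_base_rate(base_rate), 4))
```

The same no-insight strategy scores about 0.25 when outcomes are a coin flip, 0.16 when the event occurs 20% of the time, and 0.0475 when it occurs 5% of the time. The lower scores reflect lower outcome variance, not greater skill.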

Are the people on this leaderboard “super-forecasters”?
According to Penn professor Phil Tetlock, who coined the term, “super-forecasters” are individuals who consistently perform better than their peers over a large number of events. Our data would not allow us to say whether our top forecasters would qualify as super-forecasters because we only asked each forecaster to make six forecasts. It would be premature to claim super-forecaster status on the basis of so few forecasts. For instance, in our study, pessimistic forecasts paid off since none of the results we asked about were reproducible. Had they all reproduced, the best forecasters would have become the worst!
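The reversal described above can be illustrated with a short Python sketch. It assumes the 0-1 score (squared difference between stated probability and outcome) and two hypothetical forecasters, one who says 20% for every study and one who says 80%. These probabilities are invented for illustration and are not forecasts from our participants.

```python
def mean_brier(forecasts, outcomes):
    """Average 0-1 Brier score over a set of yes/no forecasts."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


N_FORECASTS = 6  # each participant made six forecasts

pessimist = [0.2] * N_FORECASTS  # hypothetical: 20% chance of reproducing
optimist = [0.8] * N_FORECASTS   # hypothetical: 80% chance of reproducing

none_reproduced = [0] * N_FORECASTS  # what happened under our criteria
all_reproduced = [1] * N_FORECASTS   # the counterfactual

for label, outcomes in (("none reproduced", none_reproduced),
                        ("all reproduced", all_reproduced)):
    print(label,
          "pessimist:", round(mean_brier(pessimist, outcomes), 2),
          "optimist:", round(mean_brier(optimist, outcomes), 2))
```

With these made-up numbers the pessimist scores about 0.04 and the optimist about 0.64 when nothing reproduces, and the two scores swap if everything reproduces. Six outcomes that all point the same way cannot separate skill from a lucky disposition.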

If you are good at forecasting experimental results, will you be good at forecasting other things – like Academy Award winners, sports, weather, or political events?
That’s hard to say. According to many experts, there are some basic cognitive traits that make people good forecasters. These include open-mindedness, a growth mindset (a belief that one can become better at something by practicing), and a tendency to parse difficult problems into smaller ones. In principle, these traits can carry over from one field of forecasting to another. However, there is also evidence that forecasters do better in certain domains. In other words, some people are simply good forecasters, and others know a lot about certain topics.

Why study forecasting skill in science?
Scientific experiments are performed in order to change beliefs about underlying theory or claims. Judgments about scientific claims (often called scientific inference) frequently happen in the heads of scientists, or they are articulated in vague ways. Forecasting allows us to capture judgments in a very precise way and to quantify the strength of conviction around those judgments. That means that by performing forecasting studies, we are peering at the very processes through which scientists come to assert and defend scientific claims. We believe understanding this process is important for many reasons. These include helping scientists to form and adjust their judgments more effectively, and also figuring out which scientists (or fields) are more effective at separating truth from fiction. Ultimately, we contend that improving the judgment of scientists will lead to a more efficient scientific process.

I heard that several of the first 6 studies in the Reproducibility Project: Cancer Biology were deemed to have reproduced. How come you say none of the studies reproduced in your study?
Our study looked only at the mouse experiments embedded in those 6 studies, and within that, we looked only at a single comparison in each of these experiments. Further, we used two pre-specified criteria (statistical significance and effect size) that are consistent with criteria used in other replication projects. By our criteria and based on the mouse studies alone, none of the first six experiments reproduced effects observed in the original study. The claims that several of these studies partially reproduced results refer to a subset of experiments and only one criterion within each study.

Why didn’t cancer biologists perform better with forecasting?
It’s hard to know, and our paper in PLoS Biology looks at several possible explanations. One possible reason is that scientists really overestimate the veracity of original study claims, while another is that they underestimate the complexities of repeating original studies. A third possibility is that our participants were new to forecasting (and might not have invested enough time or effort on these forecasts), so they might do substantially better with practice. Indeed, one of the advantages of forecasting and getting feedback is that you can learn much more about where you went wrong and, perhaps even more importantly, how you went wrong (e.g., by being persistently overconfident or underconfident). One thing is for sure: more research along these lines is needed!

Your study suggests that cancer experts were no better at predicting replications than guessing 50%. Why should we trust anything scientists say?
The simple reason is: science is self-correcting. That means that scientists will often be temporarily wrong before they discover the truth. Scientific judgment is constructed by building bodies of evidence, and one study is one piece of evidence. Predicting individual scientific results speaks to short-term judgment, not long-term inference. Good scientists learn when they are wrong, and update their beliefs. Sometimes it can take decades for this process to unfold. Science – for all its flaws – is the best way of discovering the truth and generating predictions. Our study is not designed to debunk science. While we think there are problems with the way some science is done, our work is aimed at refining the way we do science.

This stuff is really cool. How can I learn more about forecasting?
If you have any questions, write to us at forecasting@translationalethics.com. There are also two excellent books, written for laypersons, that deal extensively with forecasting skill. One is by Nate Silver (The Signal and the Noise: Why So Many Predictions Fail, but Some Don’t). Another is by Phil Tetlock and Daniel Gardner (Superforecasting: The Art and Science of Prediction). For a more academic account of experts and forecasting, see Tetlock’s book Expert Political Judgment: How Good Is It? How Can We Know? (Note: this book focuses on forecasting current events and political intelligence.)

BibTeX

@Manual{stream2017-1410,
    title = {Frequently Asked Questions},
    journal = {STREAM research},
    author = {STREAM admin},
    address = {Montreal, Canada},
    date = 2017,
    month = jul,
    day = 5,
    url = {http://www.translationalethics.com/faq/}
}

MLA

STREAM admin. "Frequently Asked Questions" Web blog post. STREAM research. 05 Jul 2017. Web. 11 Dec 2018. <http://www.translationalethics.com/faq/>

APA

STREAM admin. (2017, Jul 05). Frequently Asked Questions [Web log post]. Retrieved from http://www.translationalethics.com/faq/

