Superforecasting - Notes

Sherman Kent was a formative thinker in the early development of intelligence analysis, first in the OSS and then the CIA. The key word in his work is 'estimate': what is the probability of an event occurring? National Intelligence Estimates in the US (e.g. NIE 29-51, discussing the chances of Yugoslavia breaking with the USSR) showed that the use of language is problematic because it is highly open to interpretation. What do words like 'probable' or 'likely' mean? They are open to wide interpretation and therefore misunderstanding.

An attempt was made to link language to agreed percentages, i.e. certain = 100%, but many dislike attaching numbers to what are subjective judgements. Yet numbers, just like words, can be used to express estimates. Numbers can make phrases like 'highly probable' more explicit and help improve shared interpretation and understanding.

The Intelligence Advanced Research Projects Activity (IARPA) ran a forecasting R&D trial: forecasting tournaments where the objective was to beat a control group that was, in effect, the wisdom of the crowd (see below). It was an experiment to see how forecasting works, or indeed whether it works at all. Participants were asked lots of hard-to-predict questions, such as: will North Korea conduct a ballistic missile test in the next 3 months? Who will win the presidential election in Chile this year?

Contaminating influences on clear thinking: misinterpreting randomness. We're hard-wired to look for patterns, but as a result we often trip ourselves up by finding meaning and connection in things which are not connected – the bias is the illusion of control. Understanding regression to the mean helps to separate luck from skill in forecasting. If you're lucky you can beat the odds, e.g. flip a coin and get 10 heads in a row, but regression to the mean will in time pull that back to the reality of a 50:50 act of chance. (Superforecasters identified in this study were, year on year, broadly immune to regression to the mean, suggesting more skill than luck was involved in what they were doing.)

The Greek warrior-poet Archilochus – 'The fox knows many things but the hedgehog knows one big thing' – is a metaphor for how pundits and analysts think. Some have a philosophy or world view, an organizing idea to which all information either conforms or disagrees, e.g. communism or any religious, political or economic ideology. Like the hedgehog they only know one thing; imagine wearing green-tinted glasses – everything looks green, not because it is but because of the glasses. Other thinkers are more open to competing ideas, ambiguity and the messy reconciliation or synthesis of alternative points of view. Like foxes they know many things. Needless to say the latter make better forecasters and analysts.

Mathematically, good judgement can be thought of as consisting of both calibration and resolution. Combined, they can be used to create a Brier score that tells us how good a forecaster is (p60-66). A perfect Brier score is 0; being completely wrong scores 2.0; pure guessing, i.e. 50:50, will statistically produce a Brier score of 0.5.
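A minimal sketch of how the Brier score works in the two-term form the book uses (range 0 to 2). The example forecasts are made up purely to reproduce the reference values above.

```python
# Brier score (original 1950 form): for each forecast, sum the squared
# error over both outcomes, then average across forecasts.
# 0 = perfect, 2 = always certain and always wrong, 0.5 = pure guessing.

def brier_score(forecasts, outcomes):
    """forecasts: probabilities assigned to 'yes'; outcomes: 1 if 'yes' happened, else 0."""
    total = 0.0
    for f, o in zip(forecasts, outcomes):
        total += (f - o) ** 2 + ((1 - f) - (1 - o)) ** 2
    return total / len(forecasts)

print(brier_score([1.0, 1.0], [0, 0]))   # completely wrong -> 2.0
print(brier_score([0.5, 0.5], [1, 0]))   # 50:50 guessing   -> 0.5
print(brier_score([0.9, 0.2], [1, 0]))   # well calibrated  -> 0.05
```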

In 1906 the British scientist Sir Francis Galton went to a country fair and watched as hundreds of people guessed the weight of a live ox after it had been 'slaughtered and dressed' – their collective guess of 1,197 lbs was 1 lb short of the correct answer. An early demonstration of the theme of James Surowiecki's bestseller The Wisdom of Crowds: many guesses aggregated tend to be quite accurate. In a large group of people there will be lots of useful, informative information, but it tends to be dispersed. Added together, the useful information pulls the answer in the correct direction. Across the whole group there will also be lots of misleading and wrong information, but because this tends to pull in many different directions rather than one, the wrong guesses effectively cancel out. It works if those contributing know at least a bit about the issue: aggregating lots of nothing gets you nothing, but aggregating many small insights is useful. (This is at the heart of systems used by statisticians like Nate Silver, who use polls of polls and add in other sources of information to improve the aggregate.)
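A tiny simulation of the same effect, assuming each guess is the true value plus independent error; the crowd size and noise scale below are illustrative, not from the book.

```python
# Wisdom-of-crowds toy model: individual errors are large but point in
# different directions, so averaging largely cancels them out.
import random

random.seed(42)
true_weight = 1198                                   # actual dressed weight (lbs)
guesses = [true_weight + random.gauss(0, 100) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
print(round(crowd_estimate))                         # typically within a few lbs of 1198
```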

A nuanced approach to prediction has a lot of ifs, buts and maybes – many alternative outcomes are possible. This does not make for good TV or punditry, which likes certainty and confidence. So lots of public predictions are crap because the forecasters are selected on the wrong criteria – you're looking at an entertainer, not someone with the mindset of a great forecaster.

P110 Enrico Fermi, the Italian-American physicist who had a big role in the Manhattan Project. Fermi problems: working out estimates for seemingly impossible questions using sensible estimated deductions, e.g. how many piano tuners are there in Chicago?
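A sketch of the classic piano-tuner estimate; every number below is a rough, order-of-magnitude assumption chosen for illustration, not a researched figure.

```python
# Fermi estimate: chain together rough but defensible assumptions.
population           = 2_700_000   # people in Chicago (rough)
people_per_household = 2.5
households           = population / people_per_household

share_with_piano     = 1 / 20      # assume ~5% of households own a piano
tunings_per_year     = 1           # assume each piano is tuned about once a year
tunings_needed       = households * share_with_piano * tunings_per_year

tunings_per_day      = 4           # one tuner's daily workload, including travel
working_days         = 250
tunings_per_tuner    = tunings_per_day * working_days

print(round(tunings_needed / tunings_per_tuner))     # ~50 tuners: the right order of magnitude
```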

Outside view. To establish a baseline answer to a question, look at comparative or comparable events. To answer 'will the US President be the victim of an assassination attempt this year?' you could look at how many attempts there have been over the last 50 years, how many were successful, their causes, impacts etc. This helps you get a baseline, e.g. in an average 5-year period, all things being equal, there will be 2 attempts. That gives you a baseline from which to refine your analysis.
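A small worked example of turning a comparison class into a baseline probability. The counts and the independence (Poisson) assumption are hypothetical, purely to show the arithmetic.

```python
import math

# Outside view: historical rate from a comparison class (hypothetical counts).
years_observed    = 50
attempts          = 20                               # attempts in those 50 years
attempts_per_year = attempts / years_observed        # 0.4 per year, i.e. ~2 per 5 years

# Baseline chance of at least one attempt in a given year, assuming
# attempts arrive independently at the historical rate (Poisson model).
p_at_least_one = 1 - math.exp(-attempts_per_year)
print(round(p_at_least_one, 2))                      # ~0.33: the starting point to refine
```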

Dragonfly view. Dragonflies have incredibly complex compound eyes which in effect give them vision all around themselves; the eye works by synthesising many separate contributing points of view.

An 80% chance of something happening equally means a 20% chance it won't.

P153 summary of the basic superforecaster approach. Break the problem down; distinguish between the known and unknown. Check all assumptions. Take the outside view first and remove the uniqueness of the specific problem, i.e. what could it be compared to? Then take the inside view and examine what is unique about this problem. Take comparative points of view from others as counterpoint to your own analysis. Extract any value from prediction markets / the wisdom of crowds. Synthesise all of the above in order to apply your own judgement as precisely as possible – specificity.

To learn from failure we must know when we fail; the feedback loop is critical. Police officers, especially experienced ones, are notoriously overconfident in their judgement, for example in the ability to detect when a suspect is lying. They don't get accurate feedback (post-case follow-up etc.), so there is no feedback loop. They therefore tend to be overconfident and poorly calibrated.

The Bay of Pigs fiasco and the Cuban missile crisis were the nadir and zenith of Kennedy administration foreign policy decision making. The same team of advisors was in place for both events. Learning from the former led to the capability to deliver a safe outcome in the latter. P195

The review ordered by Kennedy after the Bay of Pigs identified cosy groupthink as one major failing of the decision-making process. Special Counsel Sorensen and Bobby Kennedy were appointed intellectual watchdogs; they broke down specialist silos and had everyone apply their best judgement as generalists to problems. Everything got questioned and scepticism became the watchword.

The project developed forecasting teams – by the end of year 1 these were 23% more accurate than individuals. Teams encouraged creative tension, effective questioning and feedback processes.

Superteams need both ability and diversity. The one without the other is suboptimal. Aggregation of forecasts is very useful / powerful but only if it includes genuine diversity of views and options.

Intellectual humility is a key characteristic – recognize that the more you know, the more aware you become of how little you really know. We can be, and often are, wrong. This makes us more open to alternative interpretations and wider influences, and allows us to treat ideas in an emotionally detached and intellectually agnostic manner. Dispassionate.

The agenda of many (public) forecasts is not accuracy or reliability; there are other goals and objectives – an agenda particular to certain pundits and institutions.

Appendix p277: 10 commandments for forecasting

Triage – where to focus effort for maximum gain; you can't read and process everything

Break large questions into small questions, deal with constituent parts

Strike the right balance between outside and inside views, nothing is 100% unique in all of its particulars, find an external point of reference

Balance between over- and under-reacting to evidence – the best forecasters tend to make many small adjustments, e.g. 75% moves to 78% to reflect new evidence (see the Bayes-update sketch after this list).

Look for clashing causal factors. There is no paint-by-numbers solution; thesis, antithesis, synthesis is too straightforward – the dragonfly eye incorporates multiple points of view

Strive to distinguish as many degrees of doubt as the problem permits, but no more

Strike the right balance between under- and over-confidence, between prudence and decisiveness

Review your mistakes and look at the causal roots – own your failures

Bring out the best in others and let them do the same in you. Learn to disagree without being disagreeable. Tommy Lasorda, former manager of the LA Dodgers: 'Managing is like holding a dove in your hand. If you hold it too tightly you kill it, but if you hold it too loosely you lose it.'

Master the error balancing cycle. Use and incorporate feedback

(Don't treat commandments as commandments!)
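On the 'many small adjustments' commandment above: a minimal sketch of an incremental update in odds form. The likelihood ratio is invented purely to reproduce the 75% to 78% move used as the example.

```python
# Bayes' rule in odds form: a weak piece of evidence (likelihood ratio
# close to 1) should move the forecast only slightly.
def update(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

print(round(update(0.75, 1.2), 2))   # 0.78 -- a small, proportionate adjustment
```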