On the experiment to COLLAPSE COGNITIVE BIAS (in support of decision making)

Guy Loftus (K2V Ltd.) and Marc Bond (Rose & Associates), 26th March, 2019

 

For the last two years, K2V Ltd has been experimenting with crowd-sourced opinion to establish whether the “wisdom of the crowd” can help support investment decisions for the oil & gas sector of the 21st century.  The experiment employs a technique called “Knowledge Stacking”, launched in 2017 [1] using a simulated data room of a real prospect made available online and presented in plenary sessions around UK universities. The premise was simple: does crowd-sourced opinion of an opportunity converge on the actual value of that opportunity?  The hope was that crowd-sourced independent opinion would collapse cognitive bias when freed from the distortion created by shared opinion (or groupthink [2]).  What has emerged is a gap between participants’ automatic responses and their evidence-based opinions. This article lists some preliminary results, withholding detail that could distort ongoing participation.

 

BACKGROUND


Estimating the number of almonds in a bottle (a singularity) is a task that is hard for individuals to get right [3] but very easy, and normally correct, if you average crowd-sourced opinion using some basic ground rules [2].  The reason it is hard for individuals is that a single deterministic estimate is only part of a much broader forecasted reality, which the crowd is very adept at scoping.  The singularity lies within an uncertainty range that can only be properly scoped when a large enough sample of independent guesses has been made that are equally informed (or ill-informed).  In the absence of any hard evidence, we are free to make a quick guess using our gut-feeling, unencumbered by the need to over-engineer the answer; with nothing to lose, we do it for fun in the hope that we have defied the odds by being correct.  Reflexive, gut-feel responses make quick work of single estimates, but we switch to slower, more demanding (reflective) ways of thinking when presented with complex uncertainty [4].  We may use a rule of thumb, or heuristic, to simplify and guide our thinking (“the smaller the grain size, the more grains can fit into the same space”), which is also reflexive because all we have to do is guess a larger number of chickpeas than almonds. Having a relative guide to grain count, however, does little to reduce uncertainty if it is uncalibrated.
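To make the point concrete, here is a toy simulation in Python (illustrative numbers only, not data from the experiment): the mean of many independent, noisy guesses lands far closer to the true count than the typical individual guess does.

    import random

    random.seed(42)
    TRUE_COUNT = 230                 # the "actual" number of almonds (assumed)
    # 500 independent, equally ill-informed guesses scattered around the truth
    guesses = [random.gauss(TRUE_COUNT, 80) for _ in range(500)]

    crowd_estimate = sum(guesses) / len(guesses)
    errors = sorted(abs(g - TRUE_COUNT) for g in guesses)
    median_individual_error = errors[len(errors) // 2]

    print(f"Crowd estimate: {crowd_estimate:.0f} "
          f"(error {abs(crowd_estimate - TRUE_COUNT):.1f})")
    print(f"Median individual error: {median_individual_error:.1f}")

Under these assumptions the crowd’s error shrinks roughly with the square root of the number of guesses, which is the statistical heart of the “wisdom of the crowd”.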

There are just three attributes (simplified here) that contribute to uncertainty when counting almonds in a bottle: the internal dimensions of the bottle, the average almond size and the pore space.  This creates a multivariate distribution of uncertainty which, even in this simplified form, is difficult to estimate without hard data:

 Number of almonds = total bottle volume / (volume occupied by an average-sized almond + associated pore volume)

When tasked with guessing the number of almonds in the bottle, our reflexive thinking instantly collapses all the attributes into a single question to deliver an estimate, which is inherently inaccurate [4]. But what if we were to de-construct the estimate based on its component attributes, ask separate questions for each attribute (1 - what are the internal dimensions of the bottle? 2 - how big is the average almond? 3 - what porosity can we expect?), “stack” the deconstructed attributes using the formula above to calculate the number of almonds, and then compare that with the original guess? Would stacking make a difference to the outcome? It is tempting to suggest that you might be comparing a reflexive response to a complex problem (a quick, possibly unconscious response rooted in instinct, experience, mood and intuition) with a reflective response (a slower, more controlled response rooted in rational, logical, critical and deductive thinking). But that may not be what is happening here.
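As a minimal sketch of that deconstruction (all component guesses below are hypothetical, and a cylindrical bottle is assumed purely for illustration), the stacked estimate might be assembled like this:

    import math

    # 1 - internal dimensions of the bottle (hypothetical guesses, in cm)
    internal_diameter = 8.0
    internal_height = 20.0
    bottle_volume = math.pi * (internal_diameter / 2) ** 2 * internal_height

    # 2 - size of the average almond (hypothetical guess, in cm^3)
    almond_volume = 1.1

    # 3 - expected porosity: the fraction of the bottle that is pore space
    porosity = 0.35
    pore_volume_per_almond = almond_volume * porosity / (1 - porosity)

    # Stack the components using the formula above
    stacked_estimate = bottle_volume / (almond_volume + pore_volume_per_almond)
    print(f"Stacked estimate: {stacked_estimate:.0f} almonds")

Each component question is simple enough to be answered reflexively, yet the stacked result is constrained by the formula in a way the single gut-feel guess is not.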

METHOD

A mock data room was prepared in 2017, giving participants all the information they need to form an opinion on a simulated investment decision. Those conducting the simulation online are given investment proposals, while those in plenary sessions are invited to role-play as exploration managers. In common with all data rooms, some of the displays are "sub-optimal", and the information supplied reveals both gaps and an oversupply of detail in some areas. As in any data room, respondents have to make rapid judgments using their capacity to analyse the data, build linkages, connect implications and distill the key issues in support of decision making.

A questionnaire containing fifteen de-constructed, scorable metrics is provided to aggregate the evidence; each metric is evaluated on a scale of 1 to 5, where 1 is the lowest score and 3 is neutral. A guide description accompanies each metric to promote consistent scoring. Having scored each item of evidence, respondents are asked to indicate whether scoring the opportunity left them feeling critical, cautious, neutral, opportunistic, bullish or reckless about it. Once the total scores have been submitted, they are passed to a database which aggregates them using a calibrated index (stacking). Plenary participants can see their bubbles appearing on the "optimism" plot live (below, with identities concealed), which forms the basis of a discussion around risk-taking. Despite the complexity of the informational challenges presented in live sessions, completing the questionnaire takes participants no more than 10-15 minutes.
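The calibrated index itself is not published, but a hedged sketch of what stacking fifteen 1-5 scores into the two plotted axes might look like follows; the metric names, their grouping and the rescaling are all assumptions made for illustration.

    from statistics import mean

    # Hypothetical grouping of the fifteen scorable metrics (names assumed;
    # the actual metrics and calibrated index are not published here)
    GEO_TECHNICAL = ["source", "migration", "reservoir", "trap", "seal",
                     "data_quality", "analogue_support", "depth_conversion"]
    GEO_COMMERCIAL = ["recoverable_volume", "monetary_value", "well_cost",
                      "infrastructure", "fiscal_terms", "timing", "operability"]

    def stack(scores):
        """Collapse 1-5 evidential scores (3 = neutral) into two stacked
        indices, rescaled so neutral maps to 0 and the range is -1 to +1."""
        def index(metrics):
            return (mean(scores[m] for m in metrics) - 3) / 2
        return index(GEO_TECHNICAL), index(GEO_COMMERCIAL)

Each respondent’s pair of indices would then position one bubble on the optimism plot.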

INITIAL RESULTS

[Figure: Simulation_total.jpeg – all participants to date, plotted geo-commercial versus geo-technical]

The chart above (standard fare to some of you) shows all participants to date plotted on a geo-commercial versus geo-technical scale, each bubble representing an individual’s stacked scores based on the evidence presented.  The bubble colours denote current roles, while the colour of each “halo” describes the respondent’s gut-feeling for the prospect.  The total population is distributed about a mean (the white bubble) in what is likely to be a non-uniform way, but with no obvious clustering.  Note that the gut-feeling choices are the same as the evidential class limits of optimism.  Despite that, the “halos” frequently do not match the position of the bubble on the same scale of optimism, e.g. some claimed that they felt “bullish” when doing the survey but their evidential scores demonstrated that their analysis was actually “neutral”.  That disparity represents a gap between deconstructed thinking (stacked from 15 evidential metrics) and reflexive thinking (gut-feeling).  Those whose reflexive estimates fall into the same class as their stacked estimates are called realists here.  Those whose stacked estimates are more optimistic or less optimistic than their reflexive estimates are referred to as optimists and pessimists respectively*.
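Following the definitions just given (and assuming, as noted above, that the gut-feeling choices and the evidential class limits share the same six-class scale), the classification can be expressed in a few lines of Python:

    # The six optimism classes, ordered least to most optimistic
    CLASSES = ["critical", "cautious", "neutral",
               "opportunistic", "bullish", "reckless"]

    def classify(gut_feeling, stacked_class):
        """Label a respondent by comparing the class of their stacked
        (evidential) score with their reflexive gut-feeling."""
        gap = CLASSES.index(stacked_class) - CLASSES.index(gut_feeling)
        if gap == 0:
            return "realist"
        return "optimist" if gap > 0 else "pessimist"

    # The example above: felt "bullish" but the evidence scored "neutral"
    print(classify("bullish", "neutral"))   # -> pessimist

As the footnote explains, these labels are provisional and may be redefined.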

As results are still coming in, we cannot reveal much without distorting the outcome, but some preliminary observations can be shared:

  • The gut-feeling of two-thirds of respondents disagrees with their stacked, evidence-based opinions

    • The percentages of realists, optimists and pessimists are roughly even, at around 33% each; the split changes with the level of experience of respondents

  • Approximately half of the participants plot in the neutral "safe" zone

    • The other half are more or less evenly split between "opportunistic" and "cautious", with a slight emphasis on cautious

    • The results probably conform to non-uniform distributions

  • There is no discernible difference between early responders in plenary sessions (who took less than 5 minutes to complete their scores) and those who took 15 minutes or more

  • Different disciplines and different demographics exhibit different biases but, on the whole, the range of evidential outcomes is strikingly consistent

    • It is estimated that a minimum of fifty responses for each peer group is required to converge on a stable aggregate outcome (a toy sketch of such convergence follows this list)

  • Although the 7,353 opinions offered by 387 geoscientists from around the world are sufficient in number to test whether crowd-sourced opinion converges on the truth, the experiment continues to expand the diversity of opinion to examine the role of demographics in those outcomes
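As a toy illustration of the convergence point above (with fabricated noise, not the experiment’s data), one can track the running mean of independent estimates and ask when it stops wandering:

    import random

    random.seed(1)
    TRUE_VALUE = 100.0
    # 400 independent, noisy estimates from one hypothetical peer group
    estimates = [random.gauss(TRUE_VALUE, 30) for _ in range(400)]

    # Running mean after each new response
    running = [sum(estimates[:n]) / n for n in range(1, len(estimates) + 1)]
    final = running[-1]
    tolerance = 0.05 * final    # "stable" = within 5% of the final aggregate

    stable_from = next(n for n in range(len(running))
                       if all(abs(r - final) <= tolerance for r in running[n:]))
    print(f"Aggregate stable (within 5%) from response #{stable_from + 1}")

Where that threshold falls depends on the spread of opinion within each peer group, which is exactly what the experiment is still measuring.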

DISCUSSION

The experiment on knowledge stacking has evolved somewhat over the last two years, but the data collection method and responses have remained consistent.  Potential errors stem as much from inconsistencies in how the opportunity is pitched as from errors in thinking, such as bias.  Reflexive thinking is nothing more or less than recognition.  The belief is that experts have a better chance of recognising value because they are equipped with trained heuristics to aid judgment, which the rest of us probably lack.  The problem with reflexive answers to complex (non-numeric) questions is that our reflexive thinking always goes for the easy option, which may involve substituting a heuristic that does not conform to the original question [4].  Increasing the granularity of the question, to generate less complex component questions that are easier to answer, limits the tendency to make heuristic substitutions. Stacking then ensures that the component parts that do contribute to expert opinion are consistently tied to the same expression of value, rather than being substituted by the individual, as inevitably happens with gut-feeling.  That is an error removed.  Whereas true experts are guided by all the deconstructed attributes, those with less expertise will be familiar with only “some” of them.  Stacking itself (in effect, applying a value formula based on a combination of recoverable volume and estimated monetary value) is performed by the computer in a consistent way, removing the need to know how value is gauged, which is different for every individual enterprise. But does that make stacking reflective? Probably not, because if the deconstructed questions are independent, they are responded to reflexively. All the logical arguments that lead to deductive outcomes are performed by the computer, which (without recursive learning) is also not reflective.

It is clear, however, that more responses are required to draw firm conclusions about the effects of demographics (industry sector, discipline, spread, breadth, depth and years of experience…) on decision making. We have known for over 100 years that the crowd has access to the truth. What is not yet known is the effect different levels of experience have on the degree of analysis applied to each attribute, and how closely those analyses match reflexive gut-feeling. The experiment changes the granularity of analysis but not the mode of thinking. Once we have sufficient participation to separate different demographic peer groups, we can establish whether experience (superior knowledge) speeds up the convergence of the crowd to the final solution, or whether the stacking method itself allows a smaller group to achieve the same outcome with fewer contributions. Both may be important for enterprises seeking to aggregate collective wisdom. What could come out of this work is a tool and a method to access that wisdom and potentially improve the investment decisions of the future.

CONCLUSIONS

The oil & gas sector relies heavily on expert knowledge for new business development.  But as geoscientists, we are only too aware that the right answer to complex problems is not simply a case of "ask the experts", because that assumes the experts agree on the answers.  To consult the “wisdom of the crowd” effectively, we need to preserve all views (however extreme), ensure that they are independent (no one influences another’s opinion) and decentralized (we don’t all have the same experience), and have the means to aggregate them.  If you ask a large enough group of diverse, independent people to make a prediction or estimate a probability, and then average those estimates, the errors each of them makes in coming up with an answer will cancel each other out [2].  The method of aggregation previewed here demonstrates that crowd-sourced opinion delivers results that are both rational and measurable.  The question underpinning this investigation is what role expertise plays (if any) in making the right decision for any enterprise.


REFERENCES:

[1] Knowledge Stacking [www.k2vltd.com/article-6]

[2] Surowiecki, J., 2005, The Wisdom of Crowds [Anchor Books, ISBN 978-0-385-72170-7]

[3] Sweepstake Knowledge: the tyranny of orphaned deterministic numbers [www.k2vltd.com/article-8]

[4] Kahneman, D., 2011, Thinking, Fast and Slow [Farrar, Straus and Giroux, ISBN 978-0374275631]

*The terms realist, optimist and pessimist carry implications about attitude that are potentially misleading. They will be redefined in the future, when we understand what they really mean.