Knowledge by Association - reclaiming amorphous data

May 2017


Following decades of visiting and revisiting bread & butter basins as well as some frontier basins, the industry has accumulated vast amounts of data, both worked and re-worked. As a result, many of the larger organisations find themselves sitting on amorphous data mountains, unable to apply decisions to a complete dataset, which they own but have somehow lost connection with. And yet these same organisations have often cultivated strong data management communities who tirelessly manage their data to a level of detail that astounds those of us who are not data managers and usually makes us run for cover. We are characteristically very happy for others to do "data management" for us but the truth is that the key to success in data management can be summed up in one word: ownership. However elaborate the efforts of data managers to clean up our messes for us, as explorationists we need to play our part by taking ownership if data management is to work. The best data managers have sufficient geoscience background to recognise the intrinsic relatedness of data but they remain highly competent custodians of those data only. Enlightened management in some organisations confers the responsibility of ownership on explorationists (on behalf of the asset), as we morph the data through the business life-cycle from raw data to business decision. So even in organisations with strong data management frameworks, what is still going wrong?


The model illustrated above presents the data life-cycle (right), showing how data are transformed from their raw origins into derivations that have more value to the business, as the business adds intellectual property (IP) to those data. The same activities also add knowledge (left) as a consequence of the data transformation. K2V Limited believes that the activities themselves are to a large extent generic in the oil & gas sector, allowing us to place each data item into a discrete activity-related group. 211 individual data items commonly in use in the oil & gas sector have been mapped across those activities. Applying the knowledge & data life-cycle in this way not only shows how much value is added in the life-cycle (adding IP to the data with a clear VoI), it also allows us to recognise the "relatedness" of the data and provides a guide to how those data should be published, stored, archived or deleted. The model facilitates more structured discussions between the geoscience community and data managers, helping to identify uniqueness in data and to avoid wasting resources on repeated or redundant items. The model also implies that the activities themselves (Acquire, Process, Interpret, Integrate and Evaluate) provide stages in knowledge growth. Data and knowledge remain independent and should be managed differently but they are linked through activity, which can form the basis for metadata creation.

To illustrate the principle, let us take seismic data and chart a simplified journey from raw data source to completely dissociated decision making. Seismic data are acquired and delivered to the client as raw SEG-D data (which may include offsets, gathers etc.). SEG-D data have no intrinsic value to geoscientists (we can't visualise them in that state), so the first order derivation is to process the data (clean them up and apply an earth model) to generate usable SEG-Y data. Geoscientists then use visual imaging software and techniques to interpret the SEG-Y data to map horizons (second order derivation), blending data from other raw classes of data (e.g. well logs below) to calibrate the horizons or to make rock property models to explain amplitude responses (Quantitative Interpretation). Third order derivation involves a step change, comprehensively integrating data from multiple sources to generate grids, which arguably no longer need the original seismic data to sustain their own existence. K2V Ltd calls the second order to third order transformation step (i.e. from depending on association with the raw data to becoming independent from it) the "Data Unconformity". This data hiatus is an important concept with perceptual, data management and legal implications which will be expanded elsewhere. For this example, the grids only retain an association with their seismic origin through data management - grids can, in effect (like numbers), take on a life of their own if not properly managed. The fourth order derivation concludes the dissociation journey, integrating geo-technical with geo-commercial protocols to provide the nomograms of decision making, where the real value to the business lies. This simplified journey illustrates that data expand in complexity and volume as we add value, whilst becoming increasingly dissociated (losing association with their origins) and more difficult to manage from their points of origin.
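The seismic journey above can be sketched in a few lines of Python. This is only an illustration of the idea, not K2V's implementation: the class and function names (Order, DataItem, beyond_unconformity, trace_to_raw, is_orphaned) are hypothetical, and the pairing of each derivation order with one of the five activities is our reading of the example rather than a published mapping.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Order(IntEnum):
    """Derivation orders from the seismic example (activity pairing assumed)."""
    RAW = 0     # Acquire: e.g. SEG-D field data
    FIRST = 1   # Process: e.g. SEG-Y data
    SECOND = 2  # Interpret: e.g. mapped horizons
    THIRD = 3   # Integrate: e.g. grids
    FOURTH = 4  # Evaluate: e.g. decision nomograms


@dataclass
class DataItem:
    name: str
    order: Order
    # The link back to the source exists only if data management recorded it.
    parent: Optional["DataItem"] = None


def beyond_unconformity(item: DataItem) -> bool:
    """Third order and above no longer depend on the raw data to exist."""
    return item.order >= Order.THIRD


def trace_to_raw(item: DataItem) -> List[str]:
    """Walk the recorded lineage from an item back towards its origin."""
    chain = []
    current: Optional[DataItem] = item
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


def is_orphaned(item: DataItem) -> bool:
    """True when the recorded lineage stops before reaching raw data."""
    current = item
    while current.parent is not None:
        current = current.parent
    return current.order != Order.RAW


# Hypothetical items following the example in the text.
segd = DataItem("SEG-D field data", Order.RAW)
segy = DataItem("SEG-Y data", Order.FIRST, segd)
horizon = DataItem("mapped horizon", Order.SECOND, segy)
grid = DataItem("grid", Order.THIRD, horizon)
legacy_grid = DataItem("legacy grid", Order.THIRD)  # no lineage recorded

print(trace_to_raw(grid))
print(is_orphaned(legacy_grid))
```

The legacy grid is the amorphous case: it sits beyond the Data Unconformity and nothing in the record ties it back to a raw source.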

K2V Limited supplies tools for harvesting knowledge lying dormant in large organisations and for converting that collective knowledge to value in support of decision making for the oil & gas sector (K2V: Knowledge to Value). It is important to note here that K2V Limited is not offering any data management solutions; we confine our services to knowledge and perceptions. The life-cycle model provides an overlay to existing internally resourced or outsourced data management frameworks attached to your organisation. In the past, K2V Ltd has been explicit in separating data from knowledge (see here), claiming that they are polar opposites: data are unique and immutable, whereas knowledge is non-unique and fickle. It follows that data and knowledge have to be managed entirely differently, despite their sub-parallel growth in the life-cycle model. In my experience, the fastest way to empty a room filled with geoscientists is to mention the word "metadata". I jest in part but it does no harm to remind ourselves that data (without metadata) don't know that they exist or even that they have any importance to the business. Only knowledge holders can make this distinction. K2V Ltd believes that it is when the activity link between knowledge and data is broken that the data become amorphous. The lack of corporate memory orphans the data, challenging their validity or, worse, undermining entitlement.

So how can you mine an amorphous data mountain to weed out what has value and reject what doesn't without becoming completely entangled in the weeds? K2V Ltd has developed a very simple way to associate data with knowledge by crowd-sourcing: the newly created Data PinMap™ relates each of the 211 individual data items to the maturation life-cycle by mapping the data class (tied to the raw source) to data life-cycle maturity. The tool has two functions:

1)     crowd-source knowledge holders to recollect (from their knowledge) the data they used to conduct their own investigation of an area, which includes dates, data maturity and entitlement

2)     allow the business to trace lost data and to mine amorphous data mountains with an association to context

This is how it works: knowledge holders are invited to stick a pin anywhere on the map to register key data that they are aware of. It works in the same way as the Knowledge PinMap™ (described elsewhere), except that instead of "Function in Role", you register "Data Class" and instead of "Level of Knowledge", you indicate "Level of Maturity". The Data Class applies generalised raw data types used in the oil & gas sector, so that when the position in the data maturity life-cycle is known, a reduced list of discrete data items can be identified. In effect, therefore, the database draws on a taxonomy of data items generated through (and associated with) generic activities. By crowd-sourcing the data people have worked with, a framework emerges of the data building blocks that have shaped corporate perceptions in making decisions. Reconnecting 4th order derivations to their raw origins may become overly complex but, if successful, can provide an audit trail of the activities and processes, allowing the asset to challenge assumptions and identify gaps. The tool creates metadata, which then need to be picked up by data managers to complete the association between knowledge and data. The Data PinMap™ provides, for the first time, a tool for data managers to retro-fit metadata to their data mountain.
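As a rough illustration of the kind of record a pin might create, here is a minimal Python sketch. It is not the Data PinMap™ itself: the field names, the DataPin class, the candidate_items function and the four-entry taxonomy are all hypothetical, with the taxonomy entries borrowed from the seismic example rather than from the 211 mapped data items.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Illustrative taxonomy only: (data class, level of maturity) -> data items.
# The real taxonomy of 211 items is not reproduced here.
TAXONOMY: Dict[Tuple[str, int], List[str]] = {
    ("seismic", 0): ["SEG-D field data"],
    ("seismic", 1): ["SEG-Y data"],
    ("seismic", 2): ["mapped horizons"],
    ("seismic", 3): ["grids"],
}


@dataclass
class DataPin:
    """One pin placed on the map by a knowledge holder (assumed fields)."""
    latitude: float
    longitude: float
    data_class: str                    # generalised raw data type
    maturity: int                      # position in the data life-cycle
    contributor: str
    year: Optional[int] = None         # when the data were used, if recalled
    entitlement: Optional[str] = None  # what the contributor recalls, if anything


def candidate_items(pin: DataPin) -> List[str]:
    """Reduce the taxonomy to the items matching the pin's class and maturity."""
    return TAXONOMY.get((pin.data_class, pin.maturity), [])


# A hypothetical pin: someone recalls working with interpreted seismic.
pin = DataPin(latitude=57.5, longitude=1.8, data_class="seismic",
              maturity=2, contributor="explorationist A", year=2009)

print(candidate_items(pin))
```

Each pin is, in effect, a small piece of metadata: it tells the data manager what to look for, roughly where and from when, and leaves the matching against the corporate data store to them.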

K2V believes that data are meaningless without knowledge in the same way that knowledge has no meaning unless supported by data. Linking knowledge and data through generic activities in the maturation life-cycle provides, for the first time, the means to re-associate amorphous data with their context by crowd-sourcing (retro-fitting) metadata. This is not a data management solution because the metadata still have to be mapped on to existing corporate data management frameworks. What the tool does do is provide the key to unlocking data mountains. If you would like to find out more, please contact us or forward this article to someone who might be interested.


Our thanks go to Carminatti and Pessoa for their paper at last month’s AAPG conference entitled “Digging Old Data to Drill New Wells”, which inspired us to frame a structured discussion around root causes and potential solutions.


Posted 11th May, 2017 as a Discussion:

A number of you have asked me to expand on “data entitlement”. Data entitlements take us straight back to corporate ownership: even if we acquired and paid for proprietary data, to what extent do we actually own them?

Associating data with knowledge has a hidden potential benefit for future Production Sharing Contracts (PSC) and entitlements. Knowing how much IP is added to data to create derived versions of those data is of interest to lawyers because, for the first time, it makes a clear distinction between data entitlement and value added to those data. The data unconformity marks the point where data become dissociated from the original data entitlement; beyond it, they are arguably owned 100% by a PSC contractor. Raw data are clearly subject to entitlement and are either owned or under licence from Governments or vendors. First and second order derivations are much less clear and have in the past formed a grey area for lawyers. The life-cycle model can give lawyers the basis for a definition of ownership, so that PSCs can be absolutely clear about which data belong to Governments (or vendors), and can provide the basis for a discussion around the level of contractor IP also claimed by industry. Going back to the seismic example, for instance, SEG-D processing (first order data) is specific to those data and does represent significant IP but nonetheless, many Governments claim ownership of processed data and most vendors provide a version of processed data as licensed standard. The grey area in some countries comes with second order data. If the industry at large were to agree a global standard of ownership based on the model expressed here, future PSC negotiations could be much clearer about who has what entitlement during the data maturation cycle.

In societies where justice is blind, we as individuals are innocent until proven guilty. Not so with data, which are always guilty until proven innocent. Data entitlement has to be proven, otherwise it doesn’t exist. That puts the burden of proof on the data owner to assert rights over data, and yet entitlement is often what we know least about.