What is Machine Learning?

If you’re new to Machine Learning, let’s start at the top. The whole field of artificial intelligence is broken up into two categories – Strong AI and Narrow AI.

Strong AI aims to build a machine that looks and behaves like a person. Narrow AI, which includes neural networks, attempts to duplicate the brain’s neurological processes, which have been refined over millions of years of biological development.

Machine Learning is a subset of Narrow AI that does pattern classification. It’s an engine – an algorithm that learns without explicit programming. It learns from the data. What does that mean? Given one set of data, it’s going to come up with an answer. But given a different set of data, it will come up with something different.

A Self-Organizing Map is a type of neural network that adjusts to training data. However, it makes no assumptions about the characteristics of the data. So, if you look at the whole field of artificial intelligence and then at machine learning as a subset of that, there are two parts: supervised neural networks and unsupervised neural networks. Unsupervised is where you feed it the data and say “you go figure it out.” In supervised neural networks, you give it both the data and the right answer. Examples of supervised approaches include convolutional neural networks and deep learning algorithms. A convolutional network is a classic example of a supervised neural network, where for every data sample the answer is known.

Here’s a classical example of a supervised neural network: Your uncle just passed away and left you the canning operation in Cordova, Alaska. You go there and observe the employees taking the fish off the conveyor and manually sorting them by type – buckets for eels and buckets for flounder and so forth. Can you use AI (machine learning) to do something more efficient? Perhaps have those employees do something more productive? Absolutely! As the eels come along, you weigh them, you take a picture of them, you see what the scales are like, the general texture, and you get some idea of their general shape. That’s three properties already. You continue running eels through and maybe get up to four or five properties, including measurements and so on. The neural network is then trained on eels. Then you do the same thing with all the flounder. There are going to be variations, of course, but in attribute space – the space defined by those four or five properties measured for each one – the eels and the flounder are going to wind up in different clusters. And that’s how we tell the difference between eels and flounder. Everything else that you can’t classify very well, you don’t know. All of that goes into the algorithm. That’s the difference between supervised neural networks and unsupervised neural networks.
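As a rough sketch of that supervised setup, the example below trains a classifier on a handful of hypothetical fish measurements (weight, scale texture, body-shape ratio), where every training sample already carries its label. The attribute values, the feature names, and the choice of a nearest-neighbor classifier are illustrative assumptions, not part of the original story.

```python
# Supervised learning sketch for the fish-sorting example: every training
# sample comes with both the measurements and the right answer (its label).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical measurements: [weight_kg, scale_texture_score, body_shape_ratio]
X_train = np.array([
    [0.8, 0.20, 9.5],   # eel
    [0.9, 0.30, 10.1],  # eel
    [1.1, 0.25, 8.8],   # eel
    [1.4, 0.70, 2.1],   # flounder
    [1.6, 0.80, 1.9],   # flounder
    [1.3, 0.75, 2.3],   # flounder
])
y_train = ["eel", "eel", "eel", "flounder", "flounder", "flounder"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)                 # learn from labeled examples

new_fish = np.array([[1.0, 0.28, 9.0]])   # an unlabeled fish off the conveyor
print(clf.predict(new_fish))              # -> ['eel']
```

In attribute space the eels and the flounder fall into separate clusters, so a new sample is labeled by the cluster its measurements land nearest to; an unsupervised method would have to discover those clusters without ever being told “eel” or “flounder.”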

At Geophysical Insights, we believe we should be able to query our seismic data for information with learning machines just as effortlessly and with as much reliability as we query the web for the nearest gas station.

Advancing Seismic Research with Modular Frameworks

By: Felix Balderas, Geophysical Insights
Published with permission: Oilfield Technology
September 2011

In many disciplines, a greenfield project is one that lacks any constraints imposed by prior work. The analogy is to construction on ‘greenfield’ land, where there is no need to remodel or demolish an existing structure. However, pure greenfield projects are rare in today’s interconnected world. More often, one must interface with existing environments to squeeze more value from existing data assets, add components to a process, manage new data, and so on. Adding new technologies to legacy platforms can lead to a patchwork of increasingly brittle interfaces and a burgeoning suite of features that may not be needed by all users. Today’s challenge is to define the correct ‘endpoints’, which can join producer and consumer components in a configurable environment.

This article highlights a strategy used to develop new seismic interpretation technology and the extensible platform that will host the application. The platform, code-named Paradise, includes an industry-standard database, scientific visualization, and reporting tools on a service-based architecture. It is the result of extensive research, technology evaluations, and development.

Geophysical Insights develops and applies advanced analytic technology for seismic interpretation. This new technology, based on unsupervised neural networks (UNN), offers dramatic improvements in transforming and analyzing very large data sets. Over the past several years, seismic data volumes have multiplied in terms of geographic area covered, depth of interest, and the number of attributes. Often, a prospect is evaluated with a primary 3D survey along with five to 25 attributes serving general and unique purposes. Self-organizing maps (SOM) are a type of UNN applied to interpret seismic reflection data. The SOM, as shown in Figure 1, is a powerful cluster analysis and pattern recognition method developed by Professor Teuvo Kohonen of Finland.

Figure 1. SOM analysis for seismic interpretation

UNN technology is unique in that it can be used to identify seismic anomalies through the use of multiple seismic attributes. Supervised neural networks operate on data that has been classified, so the answer is known at specific locations, providing reference points for calibration purposes. With seismic data, a portion of the seismic survey at each logged well is known. UNNs, however, do not require the answer to be ‘known’ in advance and are therefore unbiased. Through the identification of these anomalies, the presence of hydrocarbons may be revealed. This new disruptive technology has the potential to lower the risk and time associated with finding hydrocarbons and to increase the accuracy of reserve estimates.
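To make the ‘no known answer’ idea concrete, the sketch below groups unlabeled multi-attribute samples into classes without any well control. K-means is used here purely as a simple stand-in for the SOM-based method described in this article, and the attribute matrix is random placeholder data rather than a real survey.

```python
# Unsupervised sketch: cluster multi-attribute seismic samples with no labels.
# K-means stands in for the SOM approach; the data below is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical attribute matrix: one row per seismic sample, one column per
# attribute (e.g., amplitude, frequency, coherence, ...).
n_samples, n_attributes = 10_000, 5
attributes = rng.normal(size=(n_samples, n_attributes))

# No answers are supplied; the algorithm organizes the samples on its own.
model = KMeans(n_clusters=8, n_init=10, random_state=0)
classes = model.fit_predict(attributes)

# Sparsely populated classes are candidates for anomalies worth a closer look.
print("samples per class:", np.bincount(classes))
```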

In its efforts to build a new application for the neural network analysis of seismic data, Geophysical Insights struggled to find a suitable platform that met its goals of modularity, adaptability, price, and performance. The company therefore decided to build the new components separately and then, through loosely coupled interfaces, add back the legacy components as services. While the company was building new technologies to dramatically change the seismic interpretation workflow, an opportunity arose for a new approach: advancing automation, data management, interpretation, and collaboration through a modular scientific research platform with an accessible programming interface.

Seismic interpretation workflow with Paradise

Figure 2. A software framework for hosting oil and gas software applications

With this new technology concept underway, an infrastructure was needed to support the core technology and to make that infrastructure available to others. Any vendor deploying its own core technology would potentially need databases, schemas, data integration tools, data loaders, visualization tools, licensing, installers, hard copy, and much more. While not everyone needs all of the lower-level components, most vendors find that there is more work to be done on the supporting infrastructure than on the core technology itself. Without a platform, each vendor would have to undergo the long process of gathering requirements, developing, testing, and evaluating numerous frameworks, all for something that is not its core product. This is not only a major distraction from developing the core technology, but also an expensive undertaking that many are not ready to make, and in some cases perhaps a deal breaker for the project.

The company decided to move forward, developing a platform for itself that would also be useful to others. The basic concepts behind the chosen architecture are depicted in Figure 2. The goal was to build an affordable yet powerful platform that could be used by small and large organizations alike for building and testing new software technologies, shortening the time between the design and deployment of new components. Developing the platform separately from the core component made it possible to overlap the development activities for the two. This minimized the impact that changes in the platform had on the science component and vice versa, thus reducing delivery time.

Similar platforms already existed, but their price put them out of reach for many smaller vendors and potential end users. Any vendor wanting to promote a simple tool integrated on a pricey platform would find a limited audience of those who could afford the overall platform, and end users would probably pay for extra but perhaps unused features. One of the company’s goals was a modular, affordable overall platform: a vendor of a new component can choose to license portions of the Geophysical Insights platform as needed.

A good software design practice is to include end users early in the process, making them part of the team. One thing they made clear was their sensitivity to price, particularly maintenance costs.

New generations of geoscientists are more accustomed to working with social collaboration and mobility tools. No longer can a scientist bury himself behind a pile of literature in a dark office to formulate a solution. With the changing demographics of geoscientists entering the workforce and declining research funds, the lag time between drawing a solution on the whiteboard and visualizing it remotely across many workstations must be reduced. These are some of the challenges this platform tackles.

Design and architecture are all about trade-offs. One of the earliest decision points was the fundamental question of whether to go with open source or proprietary technology. This decision had to be made at various levels of the architecture, starting with the operating system, i.e., Linux versus Microsoft Windows, or both. Arguments abound regarding the pros and cons of open source technologies, including security, licensing, accountability, etc. In the end, although Linux was felt to dominate server applications, the majority of potential users would be running some version of the Windows OS. This one early decision shaped much of the future direction, such as programming languages and development tools.

Evaluations were conducted at various architecture levels, taking time to try out the tools. The company designed data models and evaluated databases. For programming languages and development platforms, C++ and C# with the Visual Studio IDE and Java with the Eclipse IDE were evaluated for mixed-language interoperability, reliability, and security. Java with Eclipse did not meet all of the goals; better mixed-language programming support was found between managed C#/.NET code and unmanaged Fortran for some scenarios, while other scenarios required multiple simultaneous processes.

At the GUI level, the company looked at Qt, WinForms, and WPF. WPF was chosen because it allows richer UI customization, including the integration of third-party GUI controls, which were also evaluated. Licensing tools, visualization tools, and installers were examined as well. (All of this is a bit too much to discuss here in detail, but Geophysical Insights advocates taking the time to evaluate the suitability of the technology to the application domain.)

The company also considered standards at different levels of the architecture. There is usually some tension between standards and innovation, so caution was needed about where to standardize. One component that appeared to be a good candidate for standardization was the data model. Data assets such as seismic and well data were among the data that needed to be handled, but the information architecture also required new business objects that were not common in the industry, for example, the analytical data resulting from the neural network processes. A data model was required that was simultaneously standard yet customizable. It also needed the potential to be used as a master data store.

The Professional Petroleum Data Model (PPDM) is a great, fully documented, and supported master data store that shares many common constructs with several other proprietary data stores and has a growing list of companies using it. PPDM provides a platform- and vendor-independent, flexible place to put all E&P data. The company actively participates in the PPDM community, helping to define best practices for the existing tables while proposing changes to the model.

Research, including attendance and participation at industry conferences and discussions with people tackling data management issues, made it clear that the amount of data, data types and storage requirements are growing exponentially. The ‘high-water’ marks for all metrics are moving targets. It will be a continuing challenge to architect for the big data used by the oil and gas business. ‘Big data’ refers to datasets that are so large they become awkward to work with using typical data management and analysis techniques. Today’s projects may include working with petabytes of data. Anyone building a boutique solution today will have to be prepared for rising high-water marks, and if they depend on a platform, they should expect the platform to be scalable for big data and extensible for new data types.

Neural networks in general, when properly applied, are adept at handling big data issues through multidimensional analysis and parallelization. They also provide new analytical views of the data, while automated processing eliminates human-induced bias, enabling the scientist to work at a higher level. Using these techniques, the scientist can arrive at an objective decision in a fraction of the time. In the face of a data deluge and a predicted shortage of highly skilled professionals, automated tools can help people meet the increasing productivity demands placed on them today.

The usefulness of a platform depends heavily on the architecture. Geophysical Insights has witnessed how rigid architectures in other software projects can become brittle over time, causing severe delays for new enhancements or modifications. However, business cannot wait for delayed improvements. Rigid architectures limit growth to small incremental steps and stifle the deployment of innovations. Today’s technology change rates call for a stream of new solutions, with high-level workflows including the fusion of multi-dimensional information.

A well-designed architecture allows for interoperability with other software tools. It encompasses the exploration, capture, storage, search, integration, and sharing of data, along with the analytical tools to comprehend that data, combined with modern interfaces and visualization in a seamless environment.

Good guidelines for a robust architecture include Microsoft’s Oil and Gas Upstream IT Reference Architecture and IBM’s Smarter Petroleum Reference Architecture.

It was decided to implement the platform on Microsoft frameworks that support a service-oriented architecture. A framework is a body of existing source code and libraries that can be used to avoid writing an application from scratch. There are numerous framework and design pattern choices for the different levels of the architecture, too many to review here. Object-relational mapping (ORM) is a good bridge between the data model and the application logic, and the company also recommends N-tiered frameworks.
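The platform itself is built on Microsoft/.NET frameworks; the short sketch below uses Python with SQLAlchemy only to illustrate what an object-relational mapper buys in practice. The class and table names are hypothetical stand-ins, not the PPDM schema or the Paradise data model.

```python
# ORM sketch: application code works with plain objects while the mapper
# handles the SQL. Table and column names are illustrative only.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class SeismicSurvey(Base):
    __tablename__ = "seismic_survey"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    volumes = relationship("AttributeVolume", back_populates="survey")

class AttributeVolume(Base):
    __tablename__ = "attribute_volume"
    id = Column(Integer, primary_key=True)
    survey_id = Column(Integer, ForeignKey("seismic_survey.id"))
    attribute_name = Column(String, nullable=False)
    survey = relationship("SeismicSurvey", back_populates="volumes")

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    survey = SeismicSurvey(name="Example 3D Survey")
    survey.volumes.append(AttributeVolume(attribute_name="instantaneous frequency"))
    session.add(survey)
    session.commit()
```

The application logic above never touches SQL directly, which is what makes the underlying data store swappable beneath the application tier.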

Personnel who understand both the business domain and the technology must carry out the implementation – otherwise, plan to spend extra time discussing ontologies and taxonomies. They must also adhere to efficient source code development practices. The changing work environment will require tools and practices for dealing with virtual teams, virtual machines, and remote access. The company is using a test-driven development (TDD) approach. This approach increases a developer’s speed and accuracy. It keeps the requirements focused and in front of the developer, eliminating time spent on unnecessary features. It also enables parallel development of interdependent systems. In the long run, it pays dividends by reducing maintenance and decreasing risk. Using TDD, a developer can deliver high-quality code with confidence.
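As a small illustration of the test-driven style (not code from the platform), the test below is written first, against a hypothetical attribute-normalization helper; the implementation is then written only to make that test pass.

```python
# Test-first sketch: the test exists before the implementation it exercises.
# normalize_attribute is a hypothetical helper, named here for illustration.
import numpy as np

def test_normalize_attribute_scales_to_unit_range():
    raw = np.array([2.0, 4.0, 6.0])
    scaled = normalize_attribute(raw)
    assert scaled.min() == 0.0
    assert scaled.max() == 1.0

# Written after the test, with just enough code to satisfy it.
def normalize_attribute(values: np.ndarray) -> np.ndarray:
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)
```

Running the suite (for example with pytest) before and after each change keeps the requirement visible and catches regressions early.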

Ultimately, the science has to come down to business. A good licensing strategy is one that maximizes revenue and allows users to buy products à la carte rather than accept a one-size-fits-all approach. Some vendors attempt to be creative with bundles for different levels of upgrades, but a configurable platform allows maximum user choice among available, perhaps even competing, technologies. The market will favor vendors that innovate and manage data and licenses well.

Geophysical Insights’ neural network application presents an opportunity to examine seismic data in ways orthogonal to those of today’s legacy systems. The research platform enabled the company to offer this application as a configurable service. Making the right choices in information and application architectures and frameworks was the key to achieving the business objective of modular services. The company can now move forward with additional science modules and tangential neural network processes, servicing a rapidly changing landscape and licensed to fit specific needs.

References

1. Kohonen, T., Self-Organizing Maps, 3rd ed. (2001).

2. American Geosciences Institute, ‘Geoscience Currents’.

3. The Professional Petroleum Data Management Association (Accessed on 29 July 2011).

Unsupervised Neural Networks – Disruptive Technology for Seismic Interpretation

By: Tom Smith, Ph.D., Geophysical Insights
Published with permission: Oil & Gas Journal
October 2010

The energy industry is faced with an exploding growth of information from a variety of sources – seismic surveys, well logs, and field production. A step-change in technology is being developed that promises to help geoscientists find hydrocarbons more rapidly and with greater certainty by using this large volume of data more effectively. Further, without automated tools to make better use of the data being collected, the industry risks wasting this valuable resource. Supported by advanced software, a branch of neural networks is proving to be at least one practical solution for reducing the risk and time in finding oil and gas. Neural network technology is used today in financial services software, pattern recognition systems, and many other settings. The general class of problems addressed by neural network technology in business is varied and diverse. While there are several commercial tools in the upstream oil and gas industry based on “supervised” neural networks, this paper describes how unsupervised neural network technologies can be used with “unclassified” data, a much more difficult problem with higher-value results.

A supervised neural network operates on data that has been classified, i.e., the answer is known at specific locations, providing reference points for calibration purposes. In the case of seismic data, for instance, a portion of the seismic survey at each logged well is known: the well log provides the ground truth. Supervised neural networks link the seismic data at the well to the known results from the well. However, supervised neural networks have limited application because the earth is so heterogeneous that classification away from boreholes is difficult. In contrast, unsupervised neural networks do not require that the “answer” be known in advance and are therefore unbiased.

The other challenge in working with supervised neural networks is that the statistics grow more powerful as more wells provide more classified data. But that flies in the face of the more typical situation, where our most important decisions must be made when there are few or no wells.

As the number of wells increases, the value of a supervised neural network diminishes.  In contrast, unsupervised neural networks do not require drilled wells and can be run against seismic reflection data alone.

The balance of this paper describes how unsupervised neural network technology can be used to identify seismic anomalies through the use of multiple seismic attributes, and how these anomalies may reveal the presence of hydrocarbons, often when conventional methods fall short. The new technology may also find application in predicting lithologies and fluid properties, performing comparative analyses of wells, and selecting the best seismic attributes for interpretation. In seismic interpretation, unsupervised neural networks can be used to reveal subtle geologic features that may have been missed by conventional analytic methods. Throughout the balance of the paper, the term “neural network” refers only to the unsupervised form of the technology.

Case Study:  Auburn Energy

Four wells have been drilled since 2006 in the “study area” of northern Wharton County, Texas. Using a popular industry suite of seismic interpretation software, the company interpreted several locations to be lower-risk gas prospects. The first well drilled encountered a formation that flowed. A large quantity of gas was found; however, much of that gas was not economically recoverable. A second well was drilled two years later that found an economic gas reservoir, which has produced for over three years. Two subsequent wells did not find economic reserves. In all but one of these wells, the original seismic interpretation indicated the presence of gas reservoirs.

“From the neural network interpretation it was clear that the two dry holes were drilled in locations that were not in economic gas concentrations”, says Deborah Sacrey, Owner of Auburn Energy.  “Applying the Geophysical Insights neural network technology to some 13 attributes, we can now see that two of the four wells would not have been drilled, saving investors about $8MM.  I had included all of the AVO attribute analysis for the area of study available at the time.  The neural network attribute analysis went well beyond conventional analysis by assimilating many more attributes than conventional software tools.  We are now expanding the study area using the neural network technology to confirm additional exploratory prospects.”

As an indication of the effectiveness of the neural network technology, Figure 1 shows seismic data from the existing gas field referenced above in Wharton County. Figure 1 comprises seismic reflection data and fault interpretation produced with conventional, commercially available seismic interpretation software. Since two wells were dry holes, only two of the four wells drilled in the field are shown in Figure 1. The conventional analysis in Figure 1 indicates both reservoirs, one of which could not be produced because of very fine formation particles flowing along with the gas. The well on the left resulted in a gas ‘show’ only, while the well on the right has been producing for three years. In Figure 2, the original data is replaced with a neural network analysis based on a combination of 13 seismic attributes, revealing two seismic anomalies located near the two wells shown in Figure 1. Figures 1 and 2 thus compare conventional (“before”) and neural network (“after”) results on the field of study. Since the neural network analysis indicated only two anomalies, it is likely that only two of the four wells would have been drilled. Also of interest is the position of the two wells near the edge of the two seismic anomalies, suggesting a potential “near miss” in the location of the wells.

Applying Self Organizing Maps to Seismic Interpretation

 

Figure 3. Typical seismic reflection attributes

Nature is full of examples of how animals, following a few simple rules, organize themselves into assemblages such as moving flocks, schools, and herds. Moreover, they re-organize themselves after a disruption to their normal pattern of movement. Consider a flock of migratory geese and a school of fish. After taking flight, the flock of geese quickly organizes into the familiar ‘V’ flying pattern. A school of fish forms and moves about as protection against predators. In either case, the assemblage quickly disperses at the threat of a predator and quickly re-assembles once the threat has passed. The assemblage is robust, yet each individual in the group is behaving according to a few simple instructions, i.e., ‘if not the leader, follow the individual ahead and remain to the left or right’.

The neurons in a neural network are likewise presented with data and adapt to it following a set of simple rules. The neural network becomes, in essence, a “learning machine” in which the network adapts to the characteristics of the data, resulting in what are called Self-Organizing Maps (SOMs). The input data is unclassified and the learning process is unattended. The SOM is a powerful cluster analysis and pattern recognition method developed by Professor Teuvo Kohonen of Finland during the 1970s and ’80s. In the case study shown above, we present results based on a SOM applied to a 3D seismic survey with a large number of seismic attributes. These results constitute an ongoing portion of our research in this area. Neural networks offer an automated process to assist seismic interpretation, for instance, accelerating prospect evaluation by…

  • Enabling the rapid comparison of large sets of seismic attributes,
  • Identifying combinations of attributes that reveal seismic anomalies, and
  • Distilling the interpretation process to identify hydrocarbons with greater speed and certainty.

Consider for a moment the quantity of data available from a single seismic survey and how neural networks may be applied to reveal insights in the seismic reflection data. The main task facing a geoscientist is to identify and ascribe geologic meaning to observable patterns in the data. The most obvious patterns are found in seismic reflections, but in recent years the industry has been using more subtle patterns and relating them to such features as porosity, lithology, and fluid content, as well as underground structure. The isolation of such patterns and their use as possible identifiers of subsurface characteristics constitutes attribute analysis, which is a standard tool in the geoscientist’s toolkit. Over the past several years, seismic data volumes have multiplied many times in terms of geographic area covered, depth of interest, and the number of attributes. Often, a prospect is evaluated with a primary 3D survey along with 5 to 25 attributes serving general and unique purposes. A group of just five typical seismic reflection attributes is shown in Figure 3.

For illustration purposes, Figure 4 (left) depicts three attributes from a single 3D survey. The three points near the center highlight one data sample for three associated attributes, aligned as parallel rectangular blocks. When the three attributes are converted into the SOM attribute-space perspective shown in Figure 4 (right), each data sample is plotted in attribute space along three attribute axes, and samples with similar characteristics form a natural cluster. The natural clusters constitute regions of higher information density and may indicate seismic events or anomalies in the data. Two additional natural clusters are illustrated in Figure 4 as well. Initially, neurons are placed randomly by the algorithm in attribute space. In the “learning” stage, neurons are attracted to the data samples in the clusters through a recursive process. Ultimately, after neuron movement has finished, the neurons reveal subtle combinations of attributes that may highlight the presence and type of hydrocarbons. While the details of the algorithm are available in the technical literature, suffice it to say that Figures 1 and 2 compare a conventional seismic data display, offering limited resolution, with a neural network classification of the same data. The neural network depiction dramatically increases the resolution of, and insight into, the data.
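The toy sketch below mirrors that description: neurons start at random positions in attribute space, and at each step the best-matching neuron and its grid neighbors are pulled toward a randomly drawn sample. The grid size, learning rate, and decay schedule are arbitrary assumptions for illustration, not the parameters of any commercial SOM.

```python
# Minimal self-organizing map sketch over synthetic three-attribute data.
import numpy as np

rng = np.random.default_rng(42)

n_samples, n_attributes = 5_000, 3          # e.g., three seismic attributes
data = rng.normal(size=(n_samples, n_attributes))

map_rows, map_cols = 8, 8                   # an 8 x 8 grid of neurons
neurons = rng.normal(size=(map_rows * map_cols, n_attributes))
grid = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)], float)

n_iterations = 10_000
for t in range(n_iterations):
    frac = t / n_iterations
    lr = 0.5 * (1.0 - frac)                           # learning rate decays
    radius = max(1.0, (map_rows / 2) * (1.0 - frac))  # neighborhood shrinks

    sample = data[rng.integers(n_samples)]
    # Best-matching unit: the neuron closest to this sample in attribute space.
    bmu = int(np.argmin(np.linalg.norm(neurons - sample, axis=1)))

    # Pull the BMU and its grid neighbors toward the sample.
    grid_dist = np.linalg.norm(grid - grid[bmu], axis=1)
    influence = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
    neurons += lr * influence[:, None] * (sample - neurons)

# After training, each sample is classified by the index of its winning neuron.
classes = np.array([np.argmin(np.linalg.norm(neurons - s, axis=1)) for s in data])
```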

Other Applications of Neural Networks

 

Figure 4. Three seismic attributes sorted with neural networks

Large volumes of seismic data are typically good candidates for using neural networks to identify anomalies in the data.  Beyond the most immediate opportunity of using neural networks to aid seismic interpretation, other valid applications of the technology include…

  • Identifying errors and gaps in data for quality assurance
  • Analyzing seismic attributes with well log data for better predictions away from the wells
  • Integrating seismic data in reservoir characterization and simulation
  • Incorporating micro-seismic events with other seismic data for better fracture prediction

Fortunately, large sets of data can be evaluated by a neural network rapidly, typically in a matter of minutes to a few hours, making their iterative use quite practical.  They can also be programmed to run unattended and report by exception when anomalies are encountered.

Getting Started with Neural Network Technology

Continuing with the example of seismic interpretation, the following basic steps are recommended when planning a neural network application. Since neural networks are a highly specialized technology, a thorough understanding of the methodology and of the appropriate choice of parameters for neural network classification is strongly encouraged. Four general tasks outline the key steps in conducting a neural network analysis:

  1. Perform an assessment that reveals the right choice of seismic attributes
  2. Conduct an appropriate interpretation of attributes for the geologic trends of interest
  3. Select the well information, where available, for calibration purposes to bring ground truth to the seismic response
  4. Generate new attribute volumes – the neural network classification and a classification reliability

One of the keys to a successful project is selecting the best choice of seismic attributes, revealed by a thorough assessment of the data.  This step will require a deep knowledge of geophysics, of course, and is optimally conducted by domain experts.  As the neural network operates on the data, visual output from various attributes will require an interpretation of the attributes for the geologic trends of interest.  Where available, well information is then used for calibration purposes to bring the all-important ground truth to the seismic response.  The complete analysis will result in two new attribute volumes – a neural network classification and a classification reliability, which identifies uncertainty in the classification.
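As a minimal sketch of those two output volumes: the classification below is simply each sample’s winning neuron, while the reliability shown is a distance-based proxy of our own choosing, since the article does not specify how the commercial classification reliability is computed.

```python
# Sketch of classification and reliability volumes from a trained SOM.
import numpy as np

def classify_with_reliability(samples: np.ndarray, neurons: np.ndarray):
    """samples: (n, n_attributes); neurons: (k, n_attributes) from a trained SOM."""
    # Distance from every sample to every neuron.
    dists = np.linalg.norm(samples[:, None, :] - neurons[None, :, :], axis=2)
    classification = dists.argmin(axis=1)   # winning neuron per sample
    winning_dist = dists.min(axis=1)
    # Map distance to a 0..1 score: close fits score near 1, poor fits near 0.
    # This proxy is an assumption made for illustration only.
    reliability = 1.0 / (1.0 + winning_dist)
    return classification, reliability

# Example with random stand-in data (13 attributes, as in the case study above).
rng = np.random.default_rng(1)
samples = rng.normal(size=(1_000, 13))
neurons = rng.normal(size=(64, 13))
classes, reliability = classify_with_reliability(samples, neurons)
```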

Conclusion

A major change is needed to take full advantage of the explosion of data in the oilfield. Neural network technology enables greater insight into all types of data but has its greatest value when applied to seismic interpretation. Neural networks are proving their value by reducing the time and cost of the interpretation process while increasing the dependability of the results. The technology can also be used to correlate well information with well log data and to enhance the quality of reservoir simulation. Neural networks promise to be a disruptive technology that will accelerate and improve the industry’s use of data from the field.

About Geophysical Insights

Serving exploration and production companies, Geophysical Insights provides specialized consulting and training in the methodology of neural networks and the appropriate choice of parameters for neural network classifications.  The company’s current work is in applying neural network technology to real problems and innovative applications.  Services are delivered through client collaboration and training.  An objective of each client engagement is to enable the client to obtain a practical understanding and use of methodologies and tools that can transform the interpretation process.

 

Dr. Thomas Smith

THOMAS A. SMITH received BS and MS degrees in Geology from Iowa State University. In 1971, he joined Chevron Geophysical as a processing geophysicist. In 1980 he left to pursue doctoral studies in Geophysics at the University of Houston. Dr. Smith founded Seismic Micro-Technology in 1984 and there led the development of the KINGDOM software suite for seismic interpretation. In 2007, he sold the majority position in the company but retained a position on the Board of Directors. SMT is in the process of being acquired by IHS; on completion, the SMT Board will be dissolved. In 2008, he founded Geophysical Insights, where he and several other geophysicists are developing advanced technologies for fundamental geophysical problems.

The SEG awarded Tom the SEG Enterprise Award in 2000, and in 2010 the GSH awarded him its Honorary Membership Award. Iowa State University awarded him its Distinguished Alumnus Lecturer Award in 1996 and its Citation of Merit for National and International Recognition in 2002. Seismic Micro-Technology received a GSH Corporate Star Award in 2005. Dr. Smith has been a member of the SEG since 1967 and is also a member of the HGS, EAGE, SIPES, AAPG, GSH, Sigma Xi, SSA, and AGU.