As part of our quarterly series on machine learning, we were delighted to have had Dr. Tao Zhao present applications of Convolutional Neural Networks (CNN) in a worldwide webinar on 20 March 2019 that was attended by participants on every continent. Dr. Zhao highlighted applications in seismic facies classification, fault detection, and extracting large scale channels using CNN technology. If you missed the webinar, no problem! A video of the webinar can be streamed via the video player below. Please provide your name and business email address so that we may invite you to future webinars and other events. The abstract for Dr. Zhao’s talk follows:
We welcome your comments and questions and look forward to discussions on this timely topic.
Abstract: Leveraging Deep Learning in Extracting Features of Interest from Seismic Data
Mapping and extracting features of interest is one of the most important objectives in seismic data interpretation. Due to the complexity of seismic data, geologic features that interpreters identify using visualization techniques are often challenging to extract. With the rapid development of GPU computing power and the success achieved in computer vision, deep learning techniques, represented by convolutional neural networks (CNN), have begun to attract seismic interpreters in various applications. The main advantages of CNN over other supervised machine learning methods are its spatial awareness and automatic attribute extraction. The high flexibility of CNN architecture enables researchers to design different CNN models to identify different features of interest. In this webinar, using several seismic surveys acquired from different regions, I will discuss three CNN applications in seismic interpretation: seismic facies classification, fault detection, and channel extraction. Seismic facies classification aims at classifying seismic data into several user-defined, distinct facies of interest. Conventional machine learning methods often produce a highly fragmented facies classification result, which requires a considerable amount of post-editing before it can be used as geobodies. In the first application, I will demonstrate that a properly built CNN model can generate seismic facies with higher purity and continuity. In the second application, I deploy a CNN model built for fault detection which, compared with traditional seismic attributes, provides smooth fault images and is robust to noise. The third application demonstrates the effectiveness of extracting large scale channels using CNN. These examples demonstrate that CNN models are capable of capturing the complex reflection patterns in seismic data and providing clean images of geologic features of interest, while also carrying a low computational cost.
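The spatial awareness mentioned in the abstract comes from convolution: each output value is computed from a small neighborhood of samples rather than from one sample in isolation. The sketch below is not Dr. Zhao's model. It applies a single hand-picked 3x3 kernel (a CNN would learn many such kernels from labeled data) to a hypothetical synthetic section, and every name in it is illustrative.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D array (no padding) and sum the products.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical synthetic section (rows = time samples, columns = traces):
# a flat reflector at row 4 that steps down to row 6 at trace 5, like a fault.
section = np.zeros((10, 10))
section[4, :5] = 1.0
section[6, 5:] = 1.0

# Hand-picked kernel that responds to trace-to-trace (lateral) change.
lateral_change = np.array([[-1.0, 0.0, 1.0],
                           [-1.0, 0.0, 1.0],
                           [-1.0, 0.0, 1.0]])

response = np.abs(conv2d_valid(section, lateral_change))
# Output columns where the kernel saw any lateral change.
active_columns = np.flatnonzero(response.max(axis=0) > 0)
```

Only the output columns whose 3-trace window straddles the offset respond; the continuous parts of the reflector produce no response.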
Geophysical Insights – University Challenge Topics
Call for Abstracts
The following “Challenge Topics” are offered to universities who are part of the Paradise University Program. Those universities are encouraged to consider pursuing one or more of the topics below in their research work with Paradise® and related interpretation technologies. Students interested in researching and publishing on one or more of these topics are welcome to submit an abstract to Geophysical Insights, including an explanation of their interest in the topic. The management of Geophysical Insights will select the best abstract per Challenge Topic and provide a grant of $1,000 to each student upon the completion of the research work. Student(s) who undertake the research may count on additional forms of support from Geophysical Insights, including:
• Potential job interview after graduation
• Special recognition at the Geophysical Insights booth at a future SEG
• Occasional collaboration via web meeting, email, or phone with a senior geoscientist
• Inclusion in invitations to webinars hosted by Geophysical Insights on geoscience topics
Challenge Research Topics
Develop a geophysical basis for the identification of thin beds below classic seismic tuning
The research on this topic will investigate applications of new levels of seismic resolution afforded by multi-attribute Self-Organizing Maps (SOM), the unsupervised machine learning process in the Paradise software. The mathematical basis of detecting events below classical seismic tuning through simultaneous multi-attribute analysis – using machine learning – has been reported by Smith (2017) in an abstract submitted to SEG 2018. (The abstract has since been placed online as a white paper resource.) Examples of thin-bed resolution have been documented in a Frio onshore Texas reservoir, and in the Texas Eagle Ford Shale by Roden et al. (2017). The researcher is therefore challenged to develop a better understanding of the physical basis for the resolution of events below seismic tuning vs. results from wavelet-based methods. Additional empirical results on the detection of thin beds are also welcome. This approach has wide potential for both exploration and development in the interpretation of facies and stratigraphy, and it can affect reserve/resource calculations. For unconventional plays, thin bed delineation will have a significant influence on directional drilling programs.
Determine the effectiveness of ‘machine learning’ determined geobodies in estimating reserves/resources and reservoir properties
The Paradise software has the capability of isolating and quantifying geobodies that result from a SOM machine learning process. Initial studies conducted with the technology suggest that the estimated reservoir volume is approximately what is being realized through the life of the field. This Challenge is to apply the geobody tool in Paradise along with other reservoir modeling techniques and field data to determine the effectiveness of geobodies in estimating reserves. If this proves to be correct, the estimating of reserves from geobodies could be done early in the lifecycle of the field, saving engineering time while reducing risk.
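As a starting point for this Challenge, the simplest volumetric estimate from a classification volume is a voxel count multiplied by the voxel size. The sketch below is an illustration only, not the geobody tool in Paradise: the function name, the bin dimensions, and the neuron numbers are hypothetical, and it counts every voxel of the chosen neurons without checking that they are connected. Turning gross rock volume into reserves would still need net-to-gross, porosity, saturation, and recovery factors from field data.

```python
import numpy as np

def geobody_rock_volume(classification, neuron_ids, bin_x, bin_y, sample_thickness):
    """Voxel count and gross rock volume for the chosen neuron classes.

    Units follow the inputs (for example metres in, cubic metres out).
    """
    mask = np.isin(classification, neuron_ids)
    voxel_volume = bin_x * bin_y * sample_thickness
    n_voxels = int(mask.sum())
    return n_voxels, float(n_voxels * voxel_volume)

# Hypothetical 4 x 4 x 3 classification volume (inline, crossline, sample).
classification = np.ones((4, 4, 3), dtype=int)
classification[1:3, 1:3, 1] = 7      # four voxels won by neuron 7
classification[0, 0, 0] = 12         # one voxel won by neuron 12

n_voxels, volume = geobody_rock_volume(
    classification, neuron_ids=[7, 12],
    bin_x=25.0, bin_y=25.0, sample_thickness=5.0)
```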
Corroborate SOM classification results to well logs or lithofacies
A challenge to cluster-based classification techniques is corroborating well log curves to lithofacies. Up to this point, such corroboration has been an iterative process of running different neural configurations and visually comparing each classification result to “ground truth”. Some geoscientists (results yet to be published) have used bivariate statistical analysis from petrophysical well logs in combination with the SOM classification results to develop a representation of the static reservoir properties, including reservoir distribution and storage capacity. The challenge is to develop a methodology incorporating SOM seismic results with lithofacies determination from well logs.
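One simple way to begin such a methodology is a cross-tabulation of the winning neuron at each well sample against the lithofacies logged at the same depth. The sketch below is a minimal stand-in for the bivariate analysis mentioned above, assuming the SOM classification has already been extracted along the well path and depth-matched to the log; the function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def neuron_lithofacies_table(neuron_at_well, facies_at_well):
    """Count how often each neuron coincides with each logged lithofacies."""
    table = defaultdict(Counter)
    for neuron, facies in zip(neuron_at_well, facies_at_well):
        table[neuron][facies] += 1
    return table

def dominant_facies(table):
    """For each neuron, report its most frequent facies and that facies' share."""
    summary = {}
    for neuron, counts in table.items():
        facies, hits = counts.most_common(1)[0]
        summary[neuron] = (facies, hits / sum(counts.values()))
    return summary

# Hypothetical depth-matched samples along one well path.
neurons = [3, 3, 3, 8, 8, 8, 8, 3]
facies = ["sand", "sand", "shale", "shale", "shale", "shale", "sand", "sand"]
summary = dominant_facies(neuron_lithofacies_table(neurons, facies))
```

A neuron whose samples fall mostly in one facies is a candidate proxy for that facies away from the well; a neuron split evenly between facies is not.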
Explore the significance of SOM low-probability anomalies (DHIs, anomalous features, etc.)
In addition to a standard classification volume resulting from a SOM analysis, Paradise also produces a “Probability” volume that is composed of a probability value at each voxel for a given neural class (neuron). This technique is a gauge of the consistency of a feature to the surrounding region. Direct Hydrocarbon Indicators (DHIs) tend to be identified in the Paradise software as “low probability” or “anomalous” events because their properties are often inconsistent with the region. These SOM low probability features have been documented by Roden et al. (2015) and Roden and Chen (2017). However, the Probability volume changes with the size of the region analyzed, and with respect to DHIs and anomalous features. This Challenge will determine the effectiveness of using the probability measure from a SOM result as a valid gauge of DHIs and set out the relationships among the optimum neural configuration, the size of the region, and extent of the DHIs.
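The text above does not say how Paradise computes its Probability volume, so the following is only an assumed, rank-based stand-in for the general idea: score each sample by how close it sits to its winning neuron relative to the other samples of the same class, and flag the lowest scores as anomalous. The function name, the data, and the cutoff are hypothetical.

```python
import numpy as np

def closeness_score(samples, neuron_weights, winners):
    """Rank-based score in (0, 1]: low values mark samples far from their neuron.

    For each sample, the score is the fraction of samples in the same
    neuron class that lie at least as far from the neuron as it does.
    """
    dist = np.linalg.norm(samples - neuron_weights[winners], axis=1)
    score = np.empty(len(samples))
    for k in np.unique(winners):
        idx = np.flatnonzero(winners == k)
        d = dist[idx]
        # For each sample i, count the class members j with d[j] >= d[i].
        score[idx] = (d[None, :] >= d[:, None]).sum(axis=1) / len(idx)
    return score

# Hypothetical two-attribute example with a single neuron at the origin.
neuron_weights = np.array([[0.0, 0.0]])
samples = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.0], [3.0, 4.0]])
winners = np.zeros(len(samples), dtype=int)

score = closeness_score(samples, neuron_weights, winners)
anomalous = score <= 0.25   # hypothetical cutoff
```

Because the score is relative to the population of samples analyzed, it shifts when the region changes, which is the dependence on region size that this Challenge asks researchers to quantify.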
Map detailed facies distribution from SOM results
SOM results have proven to provide detailed information in the delineation and distribution of facies in essentially any geologic setting (Roden et al., 2015; Roden and Santogrossi, 2017; Santogrossi, 2017). Due to the high-resolution output of appropriate SOM analysis, individual facies units can often be defined in much more detail than conventional interpretation approaches. Research topics should be related to determining facies distribution in different geological environments utilizing the SOM process, available well log curves, and regional knowledge of stratigraphy.
For more information on Paradise or the University Challenge Program, please contact:
Let’s talk for a minute about the concepts of Big Data.
Remember a few years ago, if you wanted to survive in the oil and gas business, saving the whales was all the rage? We searched for some way to incorporate protecting the whales into our exploration geophysics, and that would affect operations. Well, we have another big thing today – Big Data. We’re always looking for ways to tie in what we’re doing to Big Data. The bosses up at board level – they’re all talking about Big Data. What is it?
Big Data is access to large volumes of disparate kinds of oil and gas data, which we then feed to machine learning algorithms to discover unknown relationships, the ones we’ve never spotted before. A key to that definition is “disparate kinds”. So, if you say “I’m doing big data with my seismic data” – that’s not really an appropriate choice of terms. If you say “I’m going to throw in all my seismic data, along with associated wells, and my production data” – NOW you are starting to talk about real Big Data operations.
A couple more key terms to keep in mind:
Data Mining is evaluating Big Data with deep learning.
And finally, the Internet of Things (IoT).
This may actually have a bigger impact on our industry than even machine learning. The IoT refers to all the pieces of equipment and hardware in our lives being hooked up to the internet. The IoT is walking up to your web-enabled refrigerator, which recognizes your face and what you add to and remove from its contents. In our business, we’re looking at the GPS of the boat, the geophones – everything is a web-aware device that can both send and receive. In fact, when the geophones get planted, their GPS is still communicating. We know when they are in the ground, and when they get pulled up, thrown in the back of a truck, and driven somewhere.
With the trifecta of Big Data, IoT, and Data Mining, we are approaching a new age in the oil and gas industry, one in which we can know and understand things in ways we never have before.
At Geophysical Insights, we believe you should be able to query your seismic data with learning machines just as effortlessly and with as much reliability as you query the web for the nearest gas station.
Today’s seismic interpreters must deal with enormous amounts of information, or ‘Big Data’, including seismic gathers, regional 3D surveys with numerous processing versions, large populations of wells and associated data, and dozens if not hundreds of seismic attributes that routinely produce terabytes of data. Machine learning has evolved to handle Big Data. This incorporates the use of computer algorithms that iteratively learn from the data and independently adapt to produce reliable, repeatable results. Multi-attribute analyses employing principal component analysis (PCA) and self-organizing maps are components of a machine-learning interpretation workflow (Figure 1) that involves the selection of appropriate seismic attributes and the application of these attributes in an unsupervised neural network analysis, also known as a self-organizing map, or SOM. This identifies the natural clustering and patterns in the data and has been beneficial in defining stratigraphy, seismic facies, DHI features, sweet spots for shale plays, and thin beds, to name just a few successes. Employing these approaches and visualizing SOM results utilizing 2D color maps reveal geologic features not previously identified or easily interpreted from conventional seismic data.
Steps 1 and 2: Defining Geologic Problems and Multiple Attributes
Seismic attributes are any measurable property of seismic data and are produced to help enhance or quantify features of interpretation interest. There are hundreds of types of seismic attributes and interpreters routinely wrestle with evaluating these volumes efficiently and strive to understand how they relate to each other.
The first step in a multi-attribute machine-learning interpretation workflow is the identification of the problem to resolve by the geoscientist. This is important because depending on the interpretation objective (facies, stratigraphy, bed thickness, DHIs, etc.), the appropriate set of attributes must be chosen. If it is unclear which attributes to select, a principal component analysis (PCA) may be beneficial. This is a linear mathematical technique to reduce a large set of variables (seismic attributes) to a smaller set that still contains most of the variation of independent information in the larger dataset. In other words, PCA helps determine the most meaningful seismic attributes.
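The reduction described above can be sketched with an eigen-decomposition of the covariance of standardized attributes. This is a generic PCA written with NumPy under assumed inputs, not the implementation in any particular interpretation package; the synthetic data and the function name are hypothetical.

```python
import numpy as np

def pca(attributes):
    """PCA of an (n_samples, n_attributes) array.

    Returns eigenvalues (descending), matching eigenvectors as columns, and
    the fraction of total variance carried by each principal component.
    """
    # Standardize so attributes with large numeric ranges do not dominate.
    z = (attributes - attributes.mean(axis=0)) / attributes.std(axis=0)
    # Covariance of standardized data is proportional to the correlation matrix.
    cov = np.cov(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    return eigenvalues, eigenvectors, eigenvalues / eigenvalues.sum()

# Hypothetical data: attributes 0 and 1 are nearly redundant, 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
attributes = np.column_stack([
    base,
    2.0 * base + 0.05 * rng.normal(size=500),
    rng.normal(size=500),
])
eigenvalues, eigenvectors, explained = pca(attributes)
```

With two nearly redundant attributes out of three, the first two principal components carry almost all of the variance, which is the sense in which a smaller set "still contains most of the variation".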
Figure 1: Multi-attribute machine learning interpretation workflow with principal component analysis (PCA) and self-organizing maps (SOM).
Figure 2 is a PCA analysis from Paradise® software by Geophysical Insights, where 12 instantaneous attributes were input over a window encompassing a reservoir of interest. The following figures also include images of results from Paradise. Each bar in Figure 2a denotes the highest eigenvalue on the inlines in this survey. An eigenvalue is a value showing how much variance there is in its associated eigenvector and an eigenvector is a direction showing a principal spread of attribute variance in the data. The PCA results from the selected red bar in Figure 2a are denoted in Figures 2b and 2c. Figure 2b shows the principal components from the selected inline over the zone of interest with the highest eigenvalue (first principal component) indicating the seismic attributes contributing to this largest variation in the data. The percentage contribution of each attribute to the first principal component is designated. In this case the top four seismic attributes represent over 94% of the variance of all the attributes employed. These four attributes are good candidates to be employed in a SOM analysis. Figure 2c displays the percentage contribution of the attributes for the second principal component. The top three attributes contribute over 68% to the second principal component. PCA is a measure of the variance of the data, but it is up to the interpreter to determine and evaluate how the results and associated contributing attributes relate to the geology and the problem to be resolved.
Figure 2: Principal Component Analysis (PCA) results from 12 seismic attributes: (a) bar chart with each bar denoting the highest eigenvalue for its associated inline over the displayed portion of the seismic 3D volume. The red bar designates the inline with the results shown in 2b and c; (b) first principal component designated in orange and associated seismic attribute contribution to the right; and (c) second principal component in orange with the seismic contributions to the right. The highest contributing attributes for each principal component are possible candidates for a SOM analysis, depending on the interpretation goal.
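The percentage contributions shown in Figures 2b and 2c can be approximated from an eigenvector's loadings. The article does not state the formula used, so the sketch below assumes a common convention in which an attribute's contribution is its squared loading as a share of the total. The attribute names and loading values are hypothetical and are not the ones in Figure 2.

```python
import numpy as np

def attribute_contributions(eigenvector, names):
    """Percent contribution of each attribute to one principal component.

    Assumes contribution is the squared loading, normalized so the
    percentages sum to 100.
    """
    loadings = np.asarray(eigenvector, dtype=float)
    percent = 100.0 * loadings**2 / np.sum(loadings**2)
    order = np.argsort(percent)[::-1]
    return [(names[i], float(percent[i])) for i in order]

def top_attributes(ranked, target_percent):
    """Smallest leading set of attributes whose contributions reach the target."""
    chosen, total = [], 0.0
    for name, pct in ranked:
        chosen.append(name)
        total += pct
        if total >= target_percent:
            break
    return chosen, total

# Hypothetical loadings of four attributes on the first principal component.
names = ["envelope", "instantaneous frequency", "sweetness", "relative impedance"]
first_pc = [0.8, 0.1, 0.5, -0.3]
ranked = attribute_contributions(first_pc, names)
chosen, total = top_attributes(ranked, target_percent=90.0)
```

The attributes returned in `chosen` would be the candidates for a SOM run, subject to the interpreter's judgment of whether they relate to the geologic problem.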
Steps 3 and 4: SOM Analysis and Interpretation
The next step in the multi-attribute interpretation process requires pattern recognition and classification of the often subtle information embedded in the seismic attributes. Taking advantage of today’s computing technology, visualization techniques, and understanding of appropriate parameters, self-organizing maps, developed by Teuvo Kohonen in 1982, efficiently distill multiple seismic attributes into classification and probability volumes. SOM is a powerful non-linear cluster analysis and pattern recognition approach that helps interpreters identify patterns in their data, some of which can relate to desired geologic characteristics. The tremendous number of samples from numerous seismic attributes exhibits significant organizational structure. SOM analysis identifies this structure in the form of natural attribute clusters. These clusters reveal significant information about the classification structure of natural groups that is difficult to view any other way.
Figure 3 describes the SOM process used to identify geologic features in a multi-attribute machine-learning methodology. In this case, 10 attributes were selected to run in a SOM analysis over a specific 3D survey, which means that 10 volumes of different attributes are input into the process. All the values from every sample from the survey are input into attribute space where the values are normalized or standardized to the same scale. The interpreter selects the number of patterns or clusters to be delineated. In the example in Figure 3, 64 patterns are to be determined and are designated by 64 neurons. After the SOM analysis, the results are nonlinearly mapped to a 2D color map which shows 64 neurons.
Figure 3: How SOM works (10 seismic attributes)
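The process in Figure 3 can be sketched in a few dozen lines: standardize the attributes, train an 8x8 grid of 64 neurons, then assign every sample to its winning neuron. This is a textbook Kohonen SOM written with NumPy, not the Paradise implementation. The learning-rate and neighborhood schedules, the epoch count, and the random input data are all assumptions chosen to keep the example short.

```python
import numpy as np

def standardize(samples):
    """Scale every attribute to zero mean and unit variance."""
    return (samples - samples.mean(axis=0)) / samples.std(axis=0)

def train_som(samples, rows=8, cols=8, epochs=5, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rows x cols SOM; returns weights of shape (rows*cols, n_attributes)."""
    rng = np.random.default_rng(seed)
    n_neurons = rows * cols
    # Start the neurons on randomly chosen samples.
    weights = samples[rng.choice(len(samples), n_neurons, replace=True)].copy()
    # Position of each neuron on the 2D map, used for neighborhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(samples)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays toward 0
            sigma = sigma0 * (1.0 - frac) + 0.5     # neighborhood radius shrinks
            x = samples[i]
            winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
            map_dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            influence = np.exp(-map_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its map neighbors toward the sample.
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights

def classify(samples, weights):
    """Assign each sample to its nearest neuron (the winning neuron)."""
    d2 = ((samples[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical input: 2,000 samples of 10 attributes on different scales.
rng = np.random.default_rng(1)
raw = rng.normal(size=(2000, 10)) * np.arange(1, 11)
samples = standardize(raw)
weights = train_som(samples)
classes = classify(samples, weights)    # one neuron index (0..63) per sample
```

Reshaping `classes` back to the survey geometry would give the classification volume, and the neuron index maps to a cell of the 2D color map.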
At this point, the interpreter evaluates which neurons and associated patterns in 3D space define features of interest. Figure 4 displays the SOM results, where four neurons have highlighted not only a channel system but details within that channel. The next step is to refine the interpretation and perhaps use different combinations of attributes and/or use different neuron counts. For example, better defining the details of the channel system in Figure 4 may require increasing the neuron count to 100 or more. The scale of the geologic feature of interest is related to the number of neurons employed; low neuron counts will reveal larger scale features, whereas a high neuron count defines much more detail.
Figure 4: SOM analysis interpretation of channel feature with 2D color map
Figure 5 shows the SOM classification from an offshore Class 3 AVO setting where direct hydrocarbon indicators (DHIs) should be prevalent. The four attributes listed for this SOM run were selected from the second principal component in a PCA analysis. This SOM analysis clearly identified flat spots associated with a gas/oil and an oil/water contact. Figure 5 displays a line through the middle of a field where the SOM classification identified these contacts, which were verified by well control. The upper profile indicates that 25 neurons were employed to identify 25 patterns in the data. The lower profile indicates that only two neurons are identifying the patterns associated with the hydrocarbon contacts (flat spots). These hydrocarbon contacts were difficult to interpret with conventional amplitude data.
Figure 5: SOM results defining hydrocarbon contacts on a seismic line through a field. Attributes chosen for the identification of flat spots were 1. instantaneous frequency; 2. thin bed indicator; 3. acceleration of phase; 4. dominant frequency
The profile in Figure 6 displays a SOM classification where the colors represent individual neurons with a wiggle-trace variable area overlay of the conventional amplitude data. This play relates to a series of thin strandline sand deposits. These sands are located in a very weak trough on the conventional amplitude data and essentially have no amplitude expression. The SOM classification employed seven seismic attributes which were determined from the PCA analysis. A 10x10 matrix of neurons (100 neurons) was employed for this SOM classification. The downdip well produced gas from a 6’ thick sand that confirmed the anomaly associated with a dark brown neuron from the SOM analysis. The inset for this sand indicates that the SOM analysis has identified this thin sand down to a single sample, which is 1 ms (5’) for this data. The updip well on the profile in Figure 6 shows a thin oil sand (~6’ thick) that is associated with a lighter brown neuron with another possible strandline sand slightly downdip. This SOM classification defines very thin beds and employs several instantaneous seismic attributes that are measuring energy in time and space outside the realm of conventional amplitude data.
Figure 6: SOM results showing thin beds in a strandline setting
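The 1 ms (5’) figure quoted above can be put in context with a short calculation. Assuming the 1 ms is two-way travel time, 5 ft per sample implies an interval velocity, and that velocity gives the classical tuning thickness (one quarter of the dominant wavelength) for a given dominant frequency. The article does not state the dominant frequency, so the 25 Hz used here is hypothetical.

```python
def interval_velocity(thickness_ft, two_way_time_s):
    """Velocity implied by a bed thickness and its two-way travel time."""
    return 2.0 * thickness_ft / two_way_time_s

def tuning_thickness(velocity_ft_s, dominant_frequency_hz):
    """Classical tuning thickness: one quarter of the dominant wavelength."""
    return velocity_ft_s / (4.0 * dominant_frequency_hz)

# 5 ft per 1 ms sample, taken from the text; two-way time is an assumption.
velocity = interval_velocity(thickness_ft=5.0, two_way_time_s=0.001)

# 25 Hz is a hypothetical dominant frequency, not a value from the article.
tuning = tuning_thickness(velocity, dominant_frequency_hz=25.0)
```

Under these assumptions the implied velocity is about 10,000 ft/s and the tuning thickness about 100 ft, so a 6’ sand would sit far below classical tuning.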
The implementation of a multi-attribute machine-learning analysis is not restricted to any geologic environment or setting. SOM classifications have been employed successfully both onshore and offshore, in hard rocks and soft rocks, in shales, sands, and carbonates, and as demonstrated above, for DHIs and thin beds. The major limitations are the seismic attributes selected and their inherent data quality. SOM is a non-linear classifier and takes advantage of finely sampled data and is not burdened by typical amplitude resolution limitations. This machine learning seismic interpretation approach has been very successful in distilling numerous attributes to identify geologic objectives and has provided the interpreter with a methodology to deal with Big Data.
ROCKY RODEN owns his own consulting company, Rocky Ridge Resources Inc., and works with several oil companies on technical and prospect evaluation issues. He also is a principal in the Rose and Associates DHI Risk Analysis Consortium and was Chief Consulting Geophysicist with Seismic Micro-technology. He is a proven oil finder (36 years in the industry) with extensive knowledge of modern geoscience technical approaches (past Chairman – The Leading Edge Editorial Board). As Chief Geophysicist and Director of Applied Technology for Repsol-YPF, his role comprised advising corporate officers, geoscientists, and managers on interpretation, strategy and technical analysis for exploration and development in offices in the U.S., Argentina, Spain, Egypt, Bolivia, Ecuador, Peru, Brazil, Venezuela, Malaysia, and Indonesia. He has been involved in the technical and economic evaluation of Gulf of Mexico lease sales, farmouts worldwide, and bid rounds in South America, Europe, and the Far East. Previous work experience includes exploration and development at Maxus Energy, Pogo Producing, Decca Survey, and Texaco. He holds a BS in Oceanographic Technology-Geology from Lamar University and a MS in Geological and Geophysical Oceanography from Texas A&M University. Rocky is a member of SEG, AAPG, HGS, GSH, EAGE, and SIPES.
DEBORAH SACREY is a geologist/geophysicist with 39 years of oil and gas exploration experience in the Texas and Louisiana Gulf Coast, and Mid-Continent areas. For the past three years, she has been part of a Geophysical Insights team working to bring the power of multiattribute neural analysis of seismic data to the geoscience public. Sacrey received a degree in geology from the University of Oklahoma in 1976, and immediately started working for Gulf Oil. She started her own company, Auburn Energy, in 1990, and built her first geophysical workstation using Kingdom software in 1995. She specializes in 2-D and 3-D interpretation for clients in the United States and internationally. Sacrey is a DPA certified petroleum geologist and DPA certified petroleum geophysicist.
Permanent sensors both on land and on the seafloor are collecting a new stream of seismic data that can be used for repeated active seismic, microseismic analysis, and continuous passive monitoring. Distributed acoustic sensors (DAS) record continuous seismic data very cheaply, taking another quantum step in the amount of data coming from the reservoir during exploration, development and production.
These are just two examples of how dramatically the volume of technical data is rising, says Biondo Biondi, professor of geophysics at Stanford University. “The big change taking place is in the breadth of data we can get with different kinds of sensors,” he states. “Beyond seismic, there are streams of data from sensors measuring temperature, pressure, flow, and other physical information. This is putting a strain on computational capability, but it does open the possibility of a lot of integration of geophysical and other data.”
Data sources are evolving rapidly, becoming less expensive and providing denser data. “One Stanford student is experimenting with rotational sensors that record six or seven components,” says Biondi. “Others are working with both active and passive DAS data.”
When and how to process those data are also subjects of study. “A simple DAS fiber creates terabytes of data every day,” he explains. “It is unlikely that all of the data can move in bulk across the network. Instead, some amount of real-time processing will be needed near the source.”
In addition, while DAS arrays offer a low-cost way to collect dense acoustic data passively or actively, data quality is lower than from conventional geophones, Biondi says. “The challenge in this case is to get high-quality insight from low-quality data.”
While cloud computing certainly is proving useful in meeting some industry needs, Biondi says it may be more appropriate to keep the data “closer to the ground” because of its volume and proprietary nature. “Fog computing is the term for this mixed model,” he relates.
Data collected from DAS acquisition may be preprocessed local to the acquisition center, for example, then sent to the cloud for analysis, and then into the hands of the interpreter, he speculates. “The more channels and better data collected, the more accurate the wave field capture,” he comments. “This will speed the transition from seismic processing to waveform imaging. The interpreter and processor can interact in a feedback loop. Many types of geological and geophysical information could be part of the fog computing process.”
Another ongoing trend in geophysics is a push to place more emphasis on reservoir-centered geology, according to Biondi. “The goal is tighter integration of reservoir properties, geomechanics, seismic, petrophysics, etc. One student constrained anisotropic parameter estimation using petrophysical data and well logs and models. Some students are constraining attenuation and connecting seismic with geomechanics, including reservoir compaction and overburden stretching. Others are working with reservoir engineers to model fluid flows that include geomechanical effects,” he notes.
“As we move toward waveform inversion, we no longer are dealing with ‘magic’ processing parameters, but with more description of the geology,” says Biondi. “That allows us to bring quantitative information into seismic imaging.”
That includes unconventional plays, where Biondi says integrated reservoir analysis soon could be performed in real time to guide well planning, drilling, completion and fracturing design decisions.
An important step in data analysis is merging statistical data analytics with physics-based analysis. “Traditional seismic imaging is based on the physics of waveform propagation, fluid flow modeling is based on physics, and geomechanical analysis is based on mechanical modeling,” Biondi remarks. “By adding details about the physics and geology, we can point researchers in the direction of physical phenomena or geological settings where a different understanding of the geology and physics is needed.”
Integrating Data And Processes
The industry is finding tremendous value in integrating data and multidisciplinary processes, says Kamal Al-Yahya, senior vice president at CGG GeoSoftware. Traditional tools for reservoir characterization and petrophysical analysis were essentially siloed by data type and discipline. Geophysicists worked with seismic data, geologists worked with petrophysical data, and drilling and reservoir departments worked with engineering data.
“The associated applications for each domain can be best in class, but workflows still can suffer from addressing only part of the data spectrum and serving only a segment of the different disciplines involved,” he observes. “Industry professionals would like to work together more to improve efficiency and build on one another’s ideas. That requires integration.”
Integration at the workflow level lets users access several applications in interpretation and design workflows without having to move data, he explains, referencing the example of a smart phone where contact data are used by many applications from a single source. “Users in various disciplines can begin to collaborate. Normally they have different perspectives,” Al-Yahya says. “Everybody can be looking at the same data, but users in each discipline will see them differently based on their areas of expertise.”
While upstream software applications tend to be highly scientific and complex, Al-Yahya says new computing technologies are making applications easier to use. “A complex application does not have to have a complex interface,” he holds. “Simpler interfaces support collaboration between geographically dispersed experts and across disciplines.”
Automation is an important step toward reducing interface complexities. Al-Yahya points out that processing algorithms at the front end of seismic analysis have automated the removal of survey footprints and the tracking of geologic features. “Artifacts introduced by sources and receivers during acquisition are automatically removed, substantially relieving the burden on interpreters who used to spend hours meticulously correcting the data,” he notes. “Geologic features are identified automatically, allowing interpreters to navigate through dips, staying on a specific feature even through complex geology.”
These and other automated capabilities save time and help interpreters avoid mental fatigue. “If you spend all your time picking features, there is no time or energy left for analysis,” Al-Yahya observes.
In geostatistical applications, generating and evaluating multiple realizations used to be a processing bottleneck. But processing time has been shortened dramatically by harnessing multiple central processing units, and ranking tools help interpreters sift through hundreds of plausible realizations looking for the most probable, Al-Yahya continues.
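The ranking step can be illustrated with a toy example. The article does not describe how the ranking tools score realizations, so this sketch assumes each realization has been reduced to one summary number (here a hypothetical volume) and uses the realization nearest the median as a stand-in for "most probable".

```python
import statistics

def rank_realizations(volumes):
    """Sort realization indices from smallest to largest summary value."""
    return sorted(range(len(volumes)), key=lambda i: volumes[i])

def closest_to_median(volumes):
    """Index of the realization whose summary value is nearest the median."""
    median = statistics.median(volumes)
    return min(range(len(volumes)), key=lambda i: abs(volumes[i] - median))

# Hypothetical summary volumes from five realizations.
volumes = [12.0, 30.5, 18.2, 25.1, 9.7]
order = rank_realizations(volumes)      # low case first, high case last
central = closest_to_median(volumes)
```

The interpreter would then inspect the low, central, and high cases and adjust the automated ranking as needed.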
“Interpreters focus their energies on adding insight to the process and make adjustments to the initial automated ranking. In this way, technology and interpreter skills are both optimized, leading to improved reservoir characterization,” he concludes.
Software As A Service
Lower computing infrastructure costs enable operators to measure well performance and manage facilities more efficiently, says Oscar Teoh, vice president of operations at iStore. At the same time, easy-to-adopt-and-use devices have become ubiquitous, and users are accustomed to accessing applications using them. This combination of new measures and new access technologies has led to the desire for software as a service (SaaS) applications, he adds. SaaS apps are available over the Internet and simplify the process of distributing access to data.
This new generation of applications fosters collaboration, putting people literally on the same page for tasks ranging from monitoring well performance to forecasting and economics. “Every aspect of operations can be improved with greater access by people in the field and head office,” Teoh says. “Another side of this is the crew change we have been going through,” he adds. “We need to build a wider network of collaboration to keep the expertise available.”
One of the key concepts of SaaS is that it brings the work to people, not the people to work. “When you have this efficiency, the return on investment is high because you do not need a full-time expert. Instead, you have people that you can federate as needed,” explains Teoh.
SaaS also enables users to choose the tool appropriate to them. “Tablets, desktops and collaborative spaces are simply tools that can be used for the right occasion,” he says. “What used to be available on specialized systems is now available on common devices such as smart phones. For example, 3-D images that used to cost millions and require immersive visualization rooms now are available easily through the Internet on affordable platforms that enable users to easily interact with and manipulate subsurface views, such as producing formations and wellbore locations.”
Using software as a service applications, even complex 3-D images are available through the Internet on affordable platforms that let users easily interact with and manipulate subsurface views, such as producing formations and wellbore locations. Shown here is a Web-based 3-D visualization of multilateral wellbores on a seismic horizon structure map.
The best collaborative tools foster and support two-way interaction, where users can touch and move, poke and point, and change data, says Teoh. Optimization in the application enables this interactivity by smartly caching data on the device and selectively transmitting data. Individual workspaces allow users to create and share their own views and edits without affecting the master version.
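As a rough illustration of individual workspaces, the sketch below layers a user's edits over a shared master view so that the master is never modified. It is a minimal sketch, not iStore's implementation; the keys, values and the `open_workspace` helper are hypothetical.

```python
from collections import ChainMap

# Hypothetical master view shared by all users.
master = {"well_A/color": "blue", "horizon_1/visible": True}


def open_workspace(master_view):
    """Return a personal view: writes land in a private layer, reads fall through to master."""
    return ChainMap({}, master_view)


workspace = open_workspace(master)
workspace["well_A/color"] = "red"   # edit is stored only in the private layer

print(workspace["well_A/color"])    # the user's own value
print(master["well_A/color"])       # the master value is unchanged
print(workspace.maps[0])            # only this small delta would need to be transmitted
```

Because the private layer holds only what the user changed, it also hints at how an application could transmit selectively: the delta is sent, not the whole view.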
Standardization and data governance are the underpinnings of effective collaboration. Enforcing rules of ownership and validating data sources are essential to ensuring that the right information is accessed by the right users. Data management is a journey, not a destination, says Teoh. SaaS applications harness the power of the Internet using Web and data services to connect distinct and different databases collected for specific purposes.
“Using Web technology, supervisory control and data acquisition data, production data, regulatory reporting data and other data sources can be brought together in a collaborative space for strategic and tactical decision making,” Teoh remarks. “SaaS applications tend to focus on the essentials, avoiding feature overload, providing a more efficient and reliable solution.”
There are two critical factors for efficient HPC seismic processing, according to Charles Sicking, Global Geophysical’s vice president of research and development. The first is turnaround time. In a business where time is literally money, he says operators place a premium on the speed as well as the accuracy of processed results. And that leads to the second factor: quality.
“Quality increases dramatically when clients participate earlier and more often in the processing,” says Sicking. “With faster turnaround times, it becomes reasonable to increase the number of quality reviews. Quality goes sky high when clients get to look at the data in different ways and do more tests over the course of a project.”
Massive parallelization has significantly improved both of these factors, according to Sicking. Parallelization enables simultaneous multinode computations and data access to make processes extremely efficient and save weeks in turnaround time. He says that highly parallelized disk systems enable two kinds of parallelism schemes for seismic processing.
The simplest is coarse-grained parallelization, whereby each CPU on each node runs the same software application against different parts of the data. In this method, there is no intercommunication between the CPUs, and they do not share memory or compute power. A dataset split across 1,000 CPUs can be processed 1,000 times faster, calculates Sicking.
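A minimal sketch of this first scheme, using local processes as stand-ins for CPUs: the data are split into chunks, and every worker runs the same job on its own chunk without talking to the others. The trace data and the `process_chunk` job (a simple energy sum) are hypothetical placeholders, not Global Geophysical's code. The 1,000-fold figure is the ideal case, in which the split and the final merge cost nothing.

```python
from concurrent.futures import ProcessPoolExecutor


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(traces):
    """Hypothetical per-chunk job: total energy (sum of squared samples)."""
    return sum(sample * sample for trace in traces for sample in trace)


def run_coarse_grained(traces, n_workers):
    """Run the same job on each chunk in a separate process, then merge the results."""
    chunks = split_evenly(traces, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    # Synthetic stand-in for seismic traces: 1,000 traces of four samples each.
    traces = [[float(i % 7), 1.0, -2.0, 0.5] for i in range(1000)]
    print(run_coarse_grained(traces, n_workers=4))
```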
The second kind is fine-grained parallelism, in which one application runs on a node with multiple CPUs. The application processes one piece of the data using all the CPUs on one node simultaneously. This capability is used extensively for computationally intensive processes such as reverse-time migration, he notes.
Both kinds of parallelization can be combined by putting a coarse-grained wrapper around a fine-grained application, Sicking says. Then, for example, a seismic volume containing 50,000 shots can run on 100 nodes with each node processing 500 shots in parallel.
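The arithmetic behind that example can be sketched as a two-level schedule. The outer function assigns a range of shots to each node, and the inner function divides the work for one shot among the CPUs of a node. The function names, the 16 CPUs per node and the 1,000 work items per shot are assumptions for illustration only.

```python
def assign_shots(n_shots, n_nodes):
    """Outer wrapper: return one (first_shot, end_shot_exclusive) range per node."""
    base, extra = divmod(n_shots, n_nodes)
    ranges, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def split_across_cpus(n_items, n_cpus):
    """Inner step: how many work items of a single shot each CPU on the node handles."""
    base, extra = divmod(n_items, n_cpus)
    return [base + (1 if cpu < extra else 0) for cpu in range(n_cpus)]


shot_ranges = assign_shots(50_000, 100)
print(len(shot_ranges), shot_ranges[0], shot_ranges[-1])

# Hypothetical node with 16 CPUs working together on one shot of 1,000 work items.
print(split_across_cpus(1_000, 16))
```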
Highly parallelized disk systems are key to effective parallelization, according to Sicking. Disk storage systems have inherent physical limitations on the speed of data access. “To bypass this limitation, highly parallelized disk systems have many blades with trays holding disks,” he explains. “Each blade has a computer, and all blades communicate and interface with the dataset, which is distributed across hundreds of hard drives. Requests for data are executed in a way that increases disk input/output up to 1,000 times compared with the serial access on single hard drives.”
Data access is fast enough that even datasets with many terabytes can be accessed efficiently, he notes. “When we changed the parallelization of our ambient seismic processing algorithm, the run time went from 2,100 down to 40 equivalent node days on the first large dataset,” Sicking reports. “That huge improvement dramatically shortened turnaround time.”
As another example, Global Geophysical’s seismic imaging application for horizontal transverse isotropy scanning requires very large compute resources, says Sicking. “Our system application uses parallelization to break the computation into small pieces, allowing hundreds of segments to run in parallel. Using this method, many parallel jobs can run simultaneously on hundreds of nodes, allowing for the timely delivery of advanced processing products such as inversion ready gathers,” Sicking says.
The third form of parallelization is to have the entire dataset loaded into memory on many nodes and use all of the CPUs of all nodes to process that dataset. “This method is very useful for transposing multidimensional datasets to change the framework of the data structure. To run effectively, the entire dataset must be accessible simultaneously,” says Sicking. “In a parallelized system, the algorithm shuffles the data until they are completely transposed in memory, and then outputs to the disk system with the new data structure,” he concludes.
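The shuffle Sicking describes can be pictured with a small in-memory simulation. Each simulated node starts with a block of rows; after an all-to-all exchange, each node holds a block of rows of the transposed dataset. This is a simplified sketch that assumes both dimensions divide evenly by the node count, and the function names are hypothetical.

```python
def distribute_rows(matrix, n_nodes):
    """Give each simulated node a contiguous block of rows."""
    rows_per_node = len(matrix) // n_nodes
    return [matrix[i * rows_per_node:(i + 1) * rows_per_node] for i in range(n_nodes)]


def shuffle_transpose(node_blocks):
    """All-to-all exchange: each destination node gathers its columns from every source node."""
    n_nodes = len(node_blocks)
    n_cols = len(node_blocks[0][0])
    cols_per_node = n_cols // n_nodes
    result = []
    for dest in range(n_nodes):
        col_range = range(dest * cols_per_node, (dest + 1) * cols_per_node)
        # Each new row is one original column, collected across all source nodes in order.
        new_rows = [[row[c] for block in node_blocks for row in block] for c in col_range]
        result.append(new_rows)
    return result


# Synthetic 4 x 6 dataset spread over two simulated nodes.
data = [[r * 10 + c for c in range(6)] for r in range(4)]
blocks = distribute_rows(data, 2)
transposed_blocks = shuffle_transpose(blocks)
for node, block in enumerate(transposed_blocks):
    print(node, block)
```

The exchange needs every column from every node, which is why the whole dataset has to be accessible at once.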
Big Data Analytics
“The oil and gas industry is working hard to catch up to the advances in information technology,” says Scott Oelfke, product manager at LMKR, who notes that big data analytics already are being used successfully in the financial, manufacturing and retail sectors. One area where Oelfke says he sees some early experimentation with big data technology is in production optimization in unconventional reservoirs.
“With tools such as the open-source Hadoop and SAP’s in-memory HANA platform, the technology exists to leverage big data analytics. If upstream operators can figure out the right questions to ask and what datasets to use, they can get more value from their geological and geophysical data.”
Another area where Oelfke says he sees advancement is managing large data volumes on corporate networks. That is where advanced seismic attribute tools come in, generating high-quality attributes out of huge 3-D volumes, says Oelfke.
“In the past, this process was very time consuming. Today, attributes can be generated using the graphics processing unit and previewed in real time to let interpreters key in on exactly the attribute of interest. The volume can be generated immediately,” he elaborates. “Instead of taking two or three days to generate 12-15 volumes for review, only one volume is created and the process completes in an hour or sooner.”
The processing power in this scenario comes from gaming technology. High-end visualization is cheaper than ever, commoditized by the gaming industry. “Thanks to the power of the GPU, processing and visualizing complex subsurface geology is very fast,” Oelfke states.
To illustrate the sheer volume of data that interpreters must contend with, consider the typical number of wells in a project. “Twenty years ago, 500 wells in a project was a lot of wells, but 500,000 wells are not uncommon today,” says Oelfke. “The scale of these plays is creating huge volumes of data.”
Geosteering is another area benefiting from emerging Web technologies such as HTML5 (the fifth revision of the hypertext markup language standard) and the open-source Angular Web application framework, Oelfke points out. “Moving geosteering to the Web lets operators steer wells anywhere, anytime, 24 hours a day, seven days a week,” he says. “A Web-based tool gives geoscientists the flexibility to get their work done in the office, at home or on the road. It gives these folks their lives back.”
Internet Of Things
Various technologies are converging in ways that result in massive quantities of data being generated in most industries today, but the oil and gas industry has a unique challenge with the types of data being collected as well as the quantity of data, says Felix Balderas, director of technology and product development at Geophysical Insights.
“We need to have the tools to analyze multivariate data because traditional tools were not designed for what is happening with data today,” he remarks. “From upstream to downstream, we are seeing an increased use of data-generating sensors and other devices.”
These devices often are equipped with flash drives, which make them more rugged and give them more storage capacity and faster acquisition and transmission rates. They also are interconnected, Balderas points out.
“This and other increased capacities have produced larger data volumes than we have seen in the past,” he says, adding that the emerging Internet of Things (IoT) opens the possibility for tracking data from all aspects of an operation in real time. “This could provide valuable insights, if the proper tools are available to exploit this information.”
In the seismic acquisition sector, massive volumes of data are generated to create datasets with sizes in terms of terabytes and petabytes, Balderas notes. “These must be analyzed by interpreters, but many of the tools interpreters use were developed when a dataset measured in gigabytes was considered big,” he says. “Fortunately, desktop workstations are keeping pace with performance requirements in most cases, but the challenge continues of how to extract knowledge in a manner that is efficient and effective, given the quantity of data now available.”
Geophysical Insights’ Paradise multiattribute analysis software uses learning machine technology to extract more information from seismic data than is possible using traditional interpretation tools because it learns the data at full seismic resolution.
Among the potential solutions are analytical and statistical techniques that cross-correlate apparently disparate data types to find previously unseen relationships that can help optimize dataset selections, such as seismic attributes, and find patterns that reduce the time to identify strategically important geological areas of interest.
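One simple way to picture cross-correlation as an aid to attribute selection is to compute the Pearson correlation between every pair of attributes and flag pairs that carry nearly the same information. The sketch below is a generic illustration, not the method used by Geophysical Insights; the attribute names, the sample values and the 0.95 threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(attributes, threshold=0.95):
    """List attribute pairs whose absolute correlation meets the threshold."""
    names = sorted(attributes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(attributes[first], attributes[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Synthetic attribute samples taken at the same five locations.
attributes = {
    "envelope": [1.0, 2.0, 3.0, 4.0, 5.0],
    "energy": [2.0, 4.0, 6.0, 8.0, 10.0],
    "frequency": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(redundant_pairs(attributes))
```

In this toy case, "energy" is a scaled copy of "envelope", so only one of the two would need to be kept in an attribute set.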
“Traditionally, interpreters looked for geological patterns as much visually as numerically, manually picking points to identify geological features. This was a slow and error-prone technique that introduced human bias. The solutions we are developing are based on learning machine (LM) technology,” Balderas says. “Paradise®, the multiattribute analysis software that applies LM technology, extracts more information from seismic data than is possible using traditional interpretation tools because it learns the data at full seismic resolution. And, unlike human interpreters, Paradise is not limited to viewing only two or three attributes at a time.”
What makes LM algorithms different from imperative programming algorithms is that LM can learn from the data, rather than following a set of predefined instructions. Driverless cars, for example, must be able to recognize any stoplight encountered on a route. “There is no way to describe, using instructions, every possible intersection and stoplight configuration,” Balderas explains. “Sooner or later, the car will encounter a stoplight it has not seen before. With LM algorithms, the car will recognize a pattern and adjust what it knows about stoplights for future reference.”
A similar process of pattern recognition and machine learning techniques can shorten the time for extracting knowledge from geophysical data, he contends. “Applied to a volume of geophysical data, the algorithm looks for patterns that reveal geological features, which is essentially what interpreters do,” notes Balderas.
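To make the idea of learning patterns from the data concrete, the sketch below runs plain k-means clustering on a handful of two-attribute samples. No rule tells the algorithm where the groups are; it finds them from the values alone. K-means is used here only as a familiar stand-in and is not presented as the algorithm inside Paradise; the sample values are synthetic.

```python
def kmeans(points, k, n_iter=20):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Synthetic samples: (attribute 1, attribute 2) at six locations.
samples = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.15), (0.85, 0.85)]
labels, centroids = kmeans(samples, k=2)
print(labels)
print(centroids)
```

The resulting labels play the role of "candidate features": groupings offered for an interpreter to review, not final answers.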
He adds that the speed of pattern recognition is crucial to generating value. “Learning machines can quickly locate faults, horizons and other geological features for the interpreter to review,” Balderas states. “There is no technological substitute for an experienced interpreter, but this ‘candidate feature’ finding approach helps the interpreter focus his work on areas with the greatest potential.”