NEW e-Course by Dr. Tom Smith: Machine Learning Essentials for Seismic Interpretation

Machine learning is foundational to the digital transformation of the oil & gas industry and will have a dramatic impact on the exploration and production of hydrocarbons.  Dr. Tom Smith, the founder and CEO of Geophysical Insights, conducts a comprehensive survey of machine learning technology and its applications in this 24-part series.  The course will benefit geoscientists, engineers, and data analysts at all experience levels, from data analysts who want to better understand applications of machine learning to geoscience, to senior geophysicists with deep experience in the field.

Aspects of supervised learning, unsupervised learning, classification, and reclassification are introduced to illustrate how they work on seismic data.  Machine learning is presented not as an end-all-be-all, but as a new set of tools that enables interpretation of seismic data at a new, higher level of abstraction, one that promises to reduce risk and identify features that might otherwise be missed.

The following major topics are covered:

  • Operation  – supervised and unsupervised learning; buzzwords; examples
  • Foundation  – seismic processing for ML; attribute selection list objectives; principal component analysis
  • Practice  – geobodies; below-tuning; fluid contacts; making predictions
  • Prediction – the best well; the best seismic processing; over-fitting; cross-validation; who makes the best predictions?

This course can be taken for certification, or for informational purposes only (without certification). 

Enroll today for this valuable e-course from Geophysical Insights!

Video: Leveraging Deep Learning in Extracting Features of Interest from Seismic Data

Mapping and extracting features of interest is one of the most important objectives in seismic data interpretation. Due to the complexity of seismic data, geologic features that interpreters identify on seismic data using visualization techniques are often challenging to extract. With the rapid development of GPU computing power and the success obtained in computer vision, deep learning techniques, represented by convolutional neural networks (CNNs), have begun to attract seismic interpreters for a variety of applications. The main advantages of CNNs over other supervised machine learning methods are their spatial awareness and automatic attribute extraction. The high flexibility of the CNN architecture enables researchers to design different CNN models to identify different features of interest. In this webinar, using several seismic surveys acquired from different regions, I will discuss three CNN applications in seismic interpretation: seismic facies classification, fault detection, and channel extraction. Seismic facies classification aims at classifying seismic data into several user-defined, distinct facies of interest. Conventional machine learning methods often produce a highly fragmented facies classification result, which requires a considerable amount of post-editing before it can be used as geobodies. In the first application, I will demonstrate that a properly built CNN model can generate seismic facies with higher purity and continuity. In the second application, I deploy a CNN model built for fault detection which, compared with traditional seismic attributes, provides smoother fault images and more robust noise suppression. The third application demonstrates the effectiveness of extracting large-scale channels using a CNN. These examples demonstrate that CNN models are capable of capturing the complex reflection patterns in seismic data and providing clean images of geologic features of interest, while also carrying a low computational cost.

To view this webinar in Chinese, please click here.

Transcript of the Webinar

Hal Green: Good morning and Buenos Dias to our friends in Latin America that are joining us. This webinar is serving folks in the North America and Latin America regions, and there may be others worldwide who are joining as well. This is Hal Green and I manage marketing at Geophysical Insights. We are delighted to welcome you to our continuing webinar series that highlights applications of machine learning to interpretation.

We also welcome Dr. Tao Zhao, our featured speaker, who will present on leveraging deep learning in extracting features of interest from seismic data. Dr. Zhao and I are in the offices of Geophysical Insights in Houston, Texas, along with Laura Cuttill, who is helping at the controls.

Now, just a few comments about our featured speaker today. Dr. Tao Zhao joined Geophysical Insights in 2017 as a research geophysicist, where he develops and applies shallow and deep learning techniques on seismic and well log data and advances multi-attribute seismic interpretation workflows. He received a Bachelor of Science in Exploration Geophysics from the China University of Petroleum in 2011, a Master of Science in Geophysics from the University of Tulsa in 2013, and a Ph.D. in Geophysics from the University of Oklahoma in 2017.  During his Ph.D. work at the University of Oklahoma, Dr. Zhao worked in the Attribute-Assisted Seismic Processing and Interpretation (AASPI) Consortium, developing pattern recognition and seismic attribute algorithms. He continues to participate actively in the AASPI Consortium, now as a representative of Geophysical Insights, and we’re delighted to have Tao join us today. At this point, we’re going to turn the presentation over to Tao and get started.

Tao Zhao: Hello to everyone, and thank you Hal for the introduction. As you can see by the title, today we will be talking about how we can use deep learning in the application of seismic data interpretation. We’re focusing on extracting different features of interest from seismic data. Here is a quick outline of today’s presentation. First, I will pose a question to you about how we see a feature on the seismic data versus how a machine or computer sees a feature on the seismic data to link to the reason why we want to use deep learning. Then there will be a very interesting story, or argument, behind the shallow learning versus the deep learning method. Today we’re only focusing on deep learning, so shallow learning is another main topic that people work on to be applied to seismic interpretation. Then there are three main applications I will talk about today. The first one is seismic facies classification, the second is fault detection, and the last one is channel extraction. Finally, there will be some conclusions to go with this presentation.

The first thing I want to talk about is actually a question. What does a human interpreter see versus what does a computer see on a seismic image. Here we have an example from offshore New Zealand Taranaki Basin. On the left, you have a vertical seismic line of seismic amplitude and on the right you have a time slice for coherence attribute, just to show you how complex the geology is in this region. As human interpreters, we have no problem to see features of different scales from this seismic image. For example, we have some geophysical phenomenon here. Those are multiples and those are features that we see from seismic data that may not relate to a specific geological feature. But here we have something that has a specific geologic meaning.

Here we have some stacked channels, and here we have a volcano, here we have tilted fault blocks, and here we have very well defined continuous layered sediments. So, as human beings, we have no problem to identify those features because we have very good cameras and very good processors. The cameras are, of course, our eyes which help us to identify those features in a very local scale or a tiny scale, as well as to capture the features in a very large scale, such as the whole package of the stacked channel. On the other hand we have a good processor, which is our brain. Our brain can understand that and can put the information together from both the local patterns and the very large scale patterns. But for a computer, there’s a problem because, by default, the computer only sees pixels from this image which means the computer only understands the intensity at each pixel.

For the computer to understand this image, we typically need to provide a suite of attributes for the computer to work on. For example, in the stacked channel we have several attributes that can quantify the stacked channel facies. But those attributes are not perfectly aligning in the same region. In this particular region, we have reflectors that are converging into each other so an attribute that quantifies the convergence of the reflectors may best quantify this kind of feature in this local scale. Here we have discontinuity within these reflectors, so some coherence type of attribute may best quantify this feature. Here we have a continuous layer gently dipping, so maybe it’s just as simple as the dip of the reflectors quantifying this local scale feature. Finally, we have some very weak amplitude here, so we can just use the amplitude to quantify here.

So, in short, each of the attributes may quantify a particular region within the big geologic facies, but not everywhere. Then the problem we have is “how can we quantify the complex patterns in this relatively big window as one uniform facies?” The end goal is to have something like this. We want to color-code the same seismic facies into one uniform color, represented by one uniform value. For example, here we have all those facies color-coded, and all those transparent regions mean we don’t have a facies assigned. Or, actually, we do have a facies assigned, and we can call it facies zero, representing everything else. Although this result is from an interpreter’s manual picks, I will show you in the first application that after we train a deep learning model, a computer can do a pretty good job of mimicking those picks on other seismic slices.

Before I go into the actual applications, let me share with you a very interesting story. Here’s a bet between shallow learning and deep learning. The bet happened on March 14, 1995, almost 24 years ago. The bet was between two very famous figures. On the left is Vladimir Vapnik, who is the inventor of support vector machines. On the right is Larry Jackel, his boss at Bell Labs at the time. So one day Larry Jackel bet that after a few years people would have a good understanding of big neural networks, or deep neural networks, and would start using them with great success. As a counter-bet, Vapnik thought that even after 10 years, people would still be having trouble with those big neural networks and might not use them anymore. They would turn to kernel methods such as support vector machines, which are shallow learning methods. They bet a very fancy dinner.

After 5 years, it turned out that both of them were wrong. By the year 2000, people were still having trouble using big neural networks, so Jackel lost. However, people were not dumping neural networks at all. They were still using them, so Vapnik lost as well. In fact, about 10 years after that, in the early 2010s, people started to use very large, very deep neural networks such as convolutional neural networks. So as a result both of them lost the bet and they had to split the dinner, and guess who had a free dinner? It’s Yann LeCun, who happened to be the witness to their bet. As we know, Yann LeCun is one of the founders, or inventors, of the convolutional neural network that we use today.

So then there’s a question about what is shallow learning and what is deep learning. Well, different people may have different answers, but for me, the main distinction between shallow learning and deep learning is whether an algorithm learns from features provided by the user or learns the features by itself. If we’re using a shallow learning method such as a typical neural network, a multi-layer perceptron, the first step in applying the neural network to seismic data is to extract seismic attributes and let the neural network classify on those attributes. That means the algorithm needs to learn from the features, and here features means seismic attributes. On the other hand, if we’re using a deep learning method to classify seismic data, typically we will provide the raw input, which is the seismic amplitude. Some people may even use pre-stack seismic amplitude; here we’re just using post-stack. It’s still relatively raw data compared to the seismic attributes that we calculate from the seismic amplitude.
So during the training process, a deep learning method will automatically derive a great number of what I will call attributes from your input data and find the best ones to represent the data so that your target classes can be well separated. For example, here we have two seismic facies, one is a stacked channel and the other is a tilted fault block, and if we want to separate those two facies using a shallow learning method, this is what we’re going to do. We have to choose a bunch of seismic attributes that best distinguish those two facies. Maybe we can use discontinuity or dip magnitude or amplitude variance or even reflector convergence. But the problem is, even if we use all those attributes, we probably won’t have a perfect separation between those two facies because the patterns are so complex. In every region they have a different response to a particular attribute. But don’t get me wrong, I’m not saying that those attributes are useless, or that we don’t need to use AASPI attributes at all. Seismic attributes are very useful for quantifying local properties, and they’re very useful for visualization purposes, because once we have the seismic attributes it’s very easy for us as human beings to identify the features. It just becomes difficult for a computer to use that information and separate those very complex patterns from the others.

So what are we going to do? How do you quantitatively describe the difference between those two very complex facies? Well the answer is ‘let the machine tell us.’ If we are using a deep learning method, and here for example I’m showing a very simple deep learning model and people call it an encoder-decoder model, then we just feed in this seismic amplitude data and after training, we will have a classified seismic facies just like I showed at the very beginning that color-code different facies to a single color and according to what training data we provide. For example, here it classifies this stacked channel into one uniform color or a single value and here the tilted fault blocks in another value. So deep learning automatically learns the most suitable attributes to use although those attributes most likely won’t make any sense for human beings if we just look at those attributes but the computer with the algorithm can figure out the difference or use those attribute to separate your target facies.

Let’s look at the first application. The first application is seismic facies classification on the data set I used in the introduction. This, being a testing data set, is relatively small, about one gigabyte in size. I did the study almost a year ago, and at that time it took me about 90 minutes to run the training on a not-that-powerful single GPU. Right now, with growing computing power and with better scripting and better software libraries, we can do it much faster. There are 31 lines manually annotated from the seismic volume, and those are used for training and validation. For example, we can uniformly annotate some of the lines in this volume, and these are the annotated lines. We can train the model using, say, 29 lines and test the result on two lines that are not used in training. We can also do a cross-validation, which means the first time we choose these 29 lines and test on the other two lines, and the next time we train on a different set of training data, a different 29 lines, and test on the remaining two lines. After several rounds, if we have a relatively stable result, then we know that our model parameters are pretty good to use and the result is pretty reliable.
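The rotation of training and testing lines described here can be sketched in a few lines of Python. This is only an illustration, not the workflow used in the talk: the function names, the random choice of the two held-out lines, the number of rounds, and the stability tolerance are all assumptions made for the example.

```python
import random


def rotating_holdout_splits(n_lines=31, n_test=2, n_rounds=5, seed=0):
    """Yield (train, test) lists of annotated-line indices, one pair per round.

    Hypothetical helper: the two held-out lines are drawn at random each
    round, which is an assumption; the talk does not say how they were chosen.
    """
    rng = random.Random(seed)
    lines = list(range(n_lines))
    for _ in range(n_rounds):
        test = sorted(rng.sample(lines, n_test))
        train = [i for i in lines if i not in test]
        yield train, test


def is_stable(scores, tolerance=0.05):
    """Return True if the per-round scores stay within `tolerance` of each other.

    The tolerance value is an arbitrary placeholder, not a recommended setting.
    """
    return max(scores) - min(scores) <= tolerance


for train, test in rotating_holdout_splits():
    # A real workflow would train on `train` and score on `test` here.
    print(len(train), "training lines; held-out lines:", test)
```

Each round trains on 29 lines and scores on the 2 held-out lines; collecting one score per round and passing the list to `is_stable` is one simple way to express the "relatively stable result" check.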

So here is the result on the training data, after we run the training. This is a line that used in training and this is the manually interpreted result. As you can see, there are several seismic facies, and whether it’s geologically meaningful or it represents a particular geophysical phenomenon those facies are being picked out by hand and after we run the training, we first of all want to test the neural network on the training data to see if the network is converged well and behaves well on the training data. This is the result on the same training line, as you can see that if I flip back and forth, back and forth, as you can see those two images match very, very good, which is good but it’s not that interesting because this line is used in training and what we really want to see is how the neural network performs on a testing line that is not used for training. So this is a line that is not used in training and this is, again, this is a hand-picked result. We consider this a ground truth and this is the predicted result on the same line. So this gives you an idea of how this network performs on data that it hasn’t seen before. To measure the performance, or measure the quality of the prediction we have several options in terms of the performance metric. The most commonly used one may be the sample accuracy, which means how many samples are correctly predicted and here’s it’s 93% of samples are correctly predicted but in this case, this metric is okay to use because we don’t have a huge imbalance among all those classes. In some cases, particularly if we’re looking for a feature that only consists of a fraction of samples in your data set, such as if you want to pick out only faults, then this metric is very misleading. Let’s say if you only have 0.1% of your data as faults in your data set, even if you don’t predict anything for the fault and get all the remaining – and predict every sample as non-fault, you still have a 99.9% correctness or accuracy. 
But, in fact, you have nothing predicted for the fault. So a more robust metric to use is the intersection over union, which is defined like this: for a particular class, I’d say for the stacked channel complex, we take the ground truth of the stacked channel complex which is found outlined by this region and the intersection between the ground truth and this predicted result, then divided by the overall extension of your ground truth and predicted result. So basically it calculated how much overlap you have between your prediction and your ground truth. And this is only for one class. If you have more than one class – in this example we have 9 different facies – we will average over this measurement over all your classes, so that it essentially takes out the imbalanced data problem because each class, no matter how many samples there are in the class, they contribute equally in your final metric.
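The difference between the two metrics can be made concrete with a small sketch. It reproduces the fault scenario above (0.1% of samples are faults and the model predicts non-fault everywhere) on flat lists of class labels. The function names and the toy data are assumptions made for illustration; they are not taken from the software used in the talk.

```python
def sample_accuracy(truth, pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(truth, pred) if t == p)
    return correct / len(truth)


def mean_iou(truth, pred):
    """Intersection over union per class, averaged with equal weight per class."""
    classes = sorted(set(truth) | set(pred))
    ious = []
    for c in classes:
        # Intersection: samples labeled c in both truth and prediction.
        inter = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        # Union: samples labeled c in either truth or prediction.
        union = sum(1 for t, p in zip(truth, pred) if t == c or p == c)
        ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy data: 1 fault sample (class 1) among 1,000, and a model that
# predicts non-fault (class 0) for every sample.
truth = [1] + [0] * 999
pred = [0] * 1000

acc = sample_accuracy(truth, pred)   # 0.999
miou = mean_iou(truth, pred)         # (0.999 + 0.0) / 2 = 0.4995
print(acc, miou)
```

Sample accuracy stays at 0.999 even though no fault was found, while the mean IoU drops to about 0.5 because the fault class contributes an IoU of zero with the same weight as the non-fault class.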

For example, if you have one class with only 10 samples and another class with 1,000 samples, each of those will contribute 0.5 to your final measurement. So if, for the 10-sample class, you only have 2 samples correctly predicted, then you have an IoU measure of 0.2 for that particular class, and for the 1,000-sample class, if you predict all of the samples correctly, you have an IoU measure of 1 for that class. When you average them, you only have 0.6. But if you are using sample accuracy, you will have an accuracy close to 1, which is not a good estimate of the real performance of your model.

So this is the first testing line, and here’s another testing line. Again, this is the ground truth, picked manually by an interpreter. This is the predicted result. As you can see, those two images again match pretty well, and the main thing to notice is that although the boundaries aren’t matching perfectly, we have a very clean body within each of the facies, which makes subsequent steps such as generating geobodies much, much easier. Also, in this particular case, the predicted result follows the reflectors pretty well. So I think we’re happy with this result, and we can visualize it in 3D.

Here we have an inline and a crossline of seismic amplitude, and we can overlay our prediction on those two lines. We actually have a volumetric prediction everywhere; this just shows two lines. This display is very useful for interpreters because it highlights all the regions that the interpreter may be interested in. Although those regions are not 100% accurate, it’s very easy to find your features of interest with this color-coded map, instead of just scanning through line by line without any highlights. And again, we can visualize those features in 3D as some sort of geobody, and as you can see, here we have a very nice, well-defined gas chimney and all the other facies as well. And you can crop it to whatever display you prefer.

So the second application I want to discuss today is fault detection. Here is a data set we want to test our fault detection. Again this data set is from offshore New Zealand, but from a different basin. So here we’re in Great South Basin. Typical workflow with fault detection will start with some sort of edge detection attribute, for example coherence. So here we have a coherence co-blended with this seismic amplitude, and as you can see that the coherence does a pretty good job to highlighting those faults, but there’s some problem with this coherence attributes because the coherence being an edge-detection algorithm, it detects all the discontinuities in this data set. For example here we have a bunch of very high coherence anomalies but, in this particular example, it’s very low in value coherence anomalies and those are not faults. So people call those things syneresis, which means they are cracks formed in the shaly formation when the formation lost water. Another problem, I will say, for this image is if we take a very close look at this coherence image we see some stair-step artifacts which means the coherence anomaly along the fault surface is not smooth. Instead it’s highly segmented. It’s related to some algorithm limitations in this kind of algorithm because it uses a vertical window. And so to get a better fault detection result or to get a better initial attribute that we can run our fault surface generation algorithms on, we need some studies using a neural network. So this is the result from convolutional neural network fault detection. As you can see here have very, very smooth fault and with almost no noise at all from other types of discontinuities. Let me flip back to this coherence and let’s go to our CNN. So you can see that we have a very well defined fault with almost no noise at all. So this is a very promising result. So how do we get this result? You may ask this question. 
There are several ways that we can do fault detection using CNN, and some of those are very easy to implement, and some of those may require some well-designed algorithm.

So let’s start with something basic. The most basic, or most naive way, to implement this CNN based fault detection is to do something like that. We define this problem as a classification problem and we pick some training data to represent faults and another class of training data to represent non-faults. So here all the green lines are the training data, training samples picked to represent faults and the red triangles are what we picked to represent non-faults. I picked things on 5 lines. Here is the coherence image just to show you how the faults look like on the seismic data. Being a naive implementation, the algorithm we use is something like that. So for every sample, we extract a small 3D patch around the sample and we classify the small 3D patch to be whether it’s a fault or non-fault and assign this value to the center point. In this way, we can after we train this model, we can classify all the samples in the seismic volume one-by-one by using a sliding window of the size of this 3D patch.
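The sliding-window scheme just described can be sketched as follows. This is a minimal pure-Python illustration on a nested-list volume, not the implementation from the talk: `toy_classifier` is a hypothetical stand-in for the trained CNN, the patch size is arbitrary, and leaving edge samples labeled as non-fault is an assumption made to keep the example short.

```python
def extract_patch(volume, center, half):
    """Return the cubic patch of side 2*half + 1 centered on `center`."""
    ci, cj, ck = center
    return [[[volume[i][j][k]
              for k in range(ck - half, ck + half + 1)]
             for j in range(cj - half, cj + half + 1)]
            for i in range(ci - half, ci + half + 1)]


def classify_volume(volume, classify_patch, half=1):
    """Slide a 3D window over the volume and label each center sample.

    Samples closer than `half` to an edge keep label 0 (non-fault);
    that edge handling is an assumption of this sketch.
    """
    ni, nj, nk = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * nk for _ in range(nj)] for _ in range(ni)]
    for i in range(half, ni - half):
        for j in range(half, nj - half):
            for k in range(half, nk - half):
                patch = extract_patch(volume, (i, j, k), half)
                # The patch decides the class of its center sample only.
                labels[i][j][k] = classify_patch(patch)
    return labels


def toy_classifier(patch):
    """Hypothetical stand-in for a trained CNN: 1 (fault) if the patch
    contains a large amplitude jump, else 0 (non-fault)."""
    values = [v for plane in patch for row in plane for v in row]
    return 1 if max(values) - min(values) > 0.5 else 0


# Toy volume with an amplitude step between crossline index 2 and 3.
volume = [[[0.0 if j < 3 else 1.0 for _ in range(5)]
           for j in range(5)]
          for _ in range(5)]
labels = classify_volume(volume, toy_classifier, half=1)
```

Because every sample needs its own patch and its own forward pass, the cost grows with the number of samples in the volume, which is one reason this approach is described as naive.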

This is the result of the naive implementation. I will say this result is kind of ugly because the faults are relatively thick and not that continuous, and we have a lot of noise here as well. At that time we were thinking of cleaning it up using some kind of image processing technique. So we took this result and ran it through an image processing workflow that I call regularization, which is nothing but smoothing along the fault and sharpening across the fault. After this regularization, we have a result like this to compare against our raw CNN output. As you can see, we have much sharper fault images, and it cleaned up the noise quite well.

At this time you may ask who actually does the heavy lifting? Is that the CNN or is it actually this image processing regularization? To answer this question, we brought in our coherence image and went through exactly the same regularization or image processing workflow and this is what we got. So to compare this one with the one we got from using the CNN fault detection as our raw output, it’s pretty clear that once we use the CNN as our initial fault detection attribute we get rid of those type of noise. Moreover we have more continuous faults as well, compared to using this coherence.

To do a further comparison, we also did a fault detection using a swarm intelligence workflow from a 3rd party vendor that I cannot tell the name. So this is the result from a swarm intelligence fault detection and as you can see it brings out most of the faults pretty well but the problem is it may be too sensitive to the discontinuities and you have those responses almost everywhere in your data set. And you have – maybe there – those things are actually acquisition footprints or maybe some sort of noise. So if we use this example, sorry, this result to do a fault surface generation, you may have a bunch of fault surfaces that are not real. Then we zoom into this box region and on the left you have CNN based fault detection and on the right you have coherence-base fault detection and it’s pretty clear that we don’t have that kind of noise on here and also the kind of faults are very, very continuous and clean. Again, here is swarm intelligence and we have a bunch of noise here which may not be the real fault. And then we can view it in a vertical slice. Here is a coherence based and here is a CNN based. So coherence based and CNN based. And it’s pretty clear that the coherence based result, even though we went through this regularization step the fault surfaces are not that continuous and we have a bunch of other types of discontinuity response as well. But for CNN, it got rid of most of those and the faults are very continuous and sharp. But then you may identify a problem, so this result is not as good as the one I showed at the very beginning. So what’s the problem? There’s lots of faults that are missing, but in general it’s just not that good. As I said, this is a very naive implementation. At the time that we developed this, we thought maybe we can use a similar approach as we did with the seismic facies classification and use that type of CNN network and this is what we got. 
This result is the same as the one I showed at the very beginning, the only difference is here I used this image processing regularization to make those faults a little thinner. Again, we can use different types of neural networks, and this is just one possible way to do it. This is similar to the one we can use for seismic facies classification. We take a whole seismic line, if your data is relatively small, or we can take a large 2D patch of data, say, maybe 200 samples by 200 samples in size, and fit it into the network and we get a classification at every sample in this patch simultaneously. Once we move to this kind of algorithm, we will have a much better defined fault image. We can then make the faults even thinner by using some morphological thinning, which is just skeletonization, to make everything one sample thick on a bigger line. To look from a time slice, this is the naive implementation, which I called a 3D patch-based classification and this is the segmentation. This segmentation network only runs on 2D, which means it takes 2D image for training, and then you can run your prediction on both inline and crossline and sum it together. So it’s pseudo-3D, but it’s actually 2D. Even if it’s in 2D, it still gives you a better result compared to the 3D patch-based method. As you can see, it has much more faults and more continuous faults.
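The pseudo-3D step, running a 2D network along inlines and crosslines and summing the two predictions, can be sketched like this. It is an illustration under stated assumptions: the volume is a nested list indexed `[inline][crossline][time]`, and `toy_section_model` is a hypothetical stand-in for the trained 2D segmentation network that simply measures the amplitude jump between neighboring traces.

```python
def predict_pseudo_3d(volume, predict_section):
    """Sum 2D predictions made on every inline and every crossline section."""
    ni, nj, nk = len(volume), len(volume[0]), len(volume[0][0])
    total = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]

    # Pass 1: each inline section is a (crossline x time) image.
    for i in range(ni):
        pred = predict_section(volume[i])
        for j in range(nj):
            for k in range(nk):
                total[i][j][k] += pred[j][k]

    # Pass 2: each crossline section is an (inline x time) image.
    for j in range(nj):
        section = [volume[i][j] for i in range(ni)]
        pred = predict_section(section)
        for i in range(ni):
            for k in range(nk):
                total[i][j][k] += pred[i][k]

    return total


def toy_section_model(section):
    """Hypothetical stand-in for a trained 2D network: the absolute
    amplitude difference between each trace and the previous one."""
    n_traces, n_times = len(section), len(section[0])
    out = [[0.0] * n_times for _ in range(n_traces)]
    for a in range(1, n_traces):
        for t in range(n_times):
            out[a][t] = abs(section[a][t] - section[a - 1][t])
    return out


# Toy volume with one amplitude step along each horizontal direction.
volume = [[[(1.0 if i >= 2 else 0.0) + (1.0 if j >= 2 else 0.0)
            for _ in range(3)]
           for j in range(4)]
          for i in range(4)]
summed = predict_pseudo_3d(volume, toy_section_model)
```

A discontinuity that is visible in only one direction is picked up by one of the two passes, and a sample where both passes respond receives a higher summed value.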

After that, we thought this result is okay, but we’re only using 2D information. We train it on 2D lines. So can we use 3D information and train a 3D CNN? Well, the answer is yes. This is the result we got from a true 3D neural network on the same data set. To compare this result with the line-based CNN, I will say that it gives about the same faults. In general the faults are cleaner and more continuous, but the biggest advantage of this 3D CNN fault detection network is that to train it I don’t need to use any prior knowledge of the particular data set I want to predict on. This network is trained on data that does not come from this data set. In other words, this network is very general, so we can apply this trained neural network to many seismic data sets as long as the data quality is relatively similar. I mean, it allows a certain variation in the data, but if you are, let’s say, training on marine data and predicting on land data, it’s probably not going to work. We still have some limitation on data quality, but in general it works very, very well. And because the training time is taken away from the users, when the users use this type of technique to predict faults, it becomes very, very fast. For example, this particular data set is maybe one gigabyte, maybe 900 megabytes or so, and it only takes about a minute or even less to run on a single GPU.

Another thing that we tried is to run fault detection using input data other than seismic amplitude. In this particular example, I tried to run fault detection using a self-organizing map (SOM) classification result. The reason behind it is that some attributes, for example instantaneous phase or cosine of instantaneous phase, are very, very good at making your reflectors very continuous. They are so continuous that the discontinuities between your continuous reflectors are highlighted. If we use those types of attributes and run a SOM and get a classification result like this, then we have pretty clear, well-defined faults as well. I trained my neural network on this image and hoped to see that it could pick the faults as well as on seismic amplitude data, or maybe even better. This is the result we get from training a fault-detection neural network on the SOM classification result. As you can see, it picks out the faults really, really well. We may have a few missing faults in here, but in general, it gives you a very good result on fault detection, even though we’re not using the seismic amplitude data.

At the very beginning, I was expecting that maybe this result wouldn’t be as good as using the seismic amplitude, because we’re limiting the information that we provide to the neural network, but in fact it did a pretty good job. Maybe the reason behind it is that we carefully chose attributes that are very good at highlighting, well, not actually the faults, but the continuous reflectors, so that the faults stand out on those attributes.

Again, we can show this same result on the seismic amplitude. This is what the faults look like on the seismic amplitude. Let me flip back to the SOM and to the seismic amplitude. As you can see, those faults are very well defined.

Okay, the last application is channel extraction. This channel extraction is actually similar to what we did for the seismic facies, but in this case we’re interested in a particular feature. So this data set is, again, from the Taranaki Basin offshore New Zealand, but it’s a different survey compared to the very first one that I showed. Here in the survey we have a lot of channels. Most of the channels in this part are relatively small scale, and here we have a very big channel in the shallower part. In this particular example I’m interested in extracting this big channel from this data set. As you know, extracting those smaller-scale channels is actually easier than extracting the big ones, because for smaller channels you can use the coherence response from the channel flanks or the channel edges, or maybe the curvature response from the bottom of the channel. Those responses more or less overlap with where the channel is, so you can just extract the channel body using those attributes. But the problem with the big channel is that those attributes are only sensitive to the edges of the channel or to the bottom of the channel, so they are only sensitive to part of the channel but not the whole channel. That makes extracting the whole channel somewhat challenging. In this example, I manually highlighted this channel on several lines of the data, and then after training the network I extracted the channel over the whole volume, and this is what I got. This channel matches the boundary on the seismic pretty well. While we may have some disagreement in here, in general, in terms of getting a quick interpretation of this channel, I think the neural network does a pretty good job. We can, of course, see it in 3D, and here in the lower right corner we have the channel displayed at each time level, growing from the bottom to the top. On every time slice the channel matches the boundary on the seismic data pretty well.

Then I looked into another channel in the deeper part of the same survey, so here we’re at about 2 seconds; the previous one was at about 0.7, I think. Here we have a more sinuous channel in the deeper part of the survey, and again I picked some of the appearances of the channel. After training the network, we were able to match the channel boundary pretty well using this CNN classification. And of course we again have some noise that leaked outside of the channel, but I’m not too worried about that because this is the raw output from the neural network without any post-editing. So again, we can visualize it in 3D: the channel develops from this side, and once we go up, we can see this sinuous channel start to show up on this side, and it matches the boundary pretty well. And of course, again we have some noise, and those things can be cleaned up by some post-editing techniques.

So conclusions. I think after I showed the 3 applications it’s safe to say that deep-learning methods represented by convolutional neural networks are powerful in qualifying complex seismic reflection patterns into uniform facies, whether we’re interested in multiple facies at the same time or we’re interested in a particular feature of interest such as faults or channels. And we demonstrated the application for the 3 problems with clear success, and finally I think that with great flexibility in model architecture with different types of CNNs and with all your clever genius researchers we can develop something particular for a particular problem so that we believe that CNNs are promising in other interpretation tasks as well. I would like to thank Geophysical Insights for permission to show this work, and also want to thank New Zealand Petroleum and Minerals for providing those beautiful data sets used in the study to the general public.

Tao Zhao

Research Geophysicist | Geophysical Insights

TAO ZHAO joined Geophysical Insights in 2017. As a Research Geophysicist, Dr. Zhao develops and applies shallow and deep machine learning techniques on seismic and well log data, and advances multiattribute seismic interpretation workflows. He received a B.S. in Exploration Geophysics from the China University of Petroleum in 2011, an M.S. in Geophysics from the University of Tulsa in 2013, and a Ph.D. in geophysics from the University of Oklahoma in 2017. During his Ph.D. work at the University of Oklahoma, Dr. Zhao was an active member of the Attribute-Assisted Seismic Processing and Interpretation (AASPI) Consortium developing pattern recognition and seismic attribute algorithms.

Seismic Facies Classification Using Deep Convolutional Neural Networks

By Tao Zhao
Published with permission: SEG International Exposition and 88th Annual Meeting
October 2018


Convolutional neural networks (CNNs) are a type of supervised learning technique that can be directly applied to amplitude data for seismic data classification. The high flexibility in CNN architecture enables researchers to design different models for specific problems. In this study, I introduce an encoder-decoder CNN model for seismic facies classification, which classifies all samples in a seismic line simultaneously and provides superior seismic facies quality compared to the traditional patch-based CNN methods. I compare the encoder-decoder model with a traditional patch-based model to assess the usability of both CNN architectures.


With the rapid development in GPU computing and the success obtained in the computer vision domain, deep learning techniques, represented by convolutional neural networks (CNNs), have started to entice seismic interpreters in the application of supervised seismic facies classification. A comprehensive review of deep learning techniques is provided in LeCun et al. (2015). Although still in its infancy, CNN-based seismic classification has been successfully applied to both prestack (Araya-Polo et al., 2017) and poststack (Waldeland and Solberg, 2017; Huang et al., 2017; Lewis and Vigh, 2017) data for fault and salt interpretation, identifying different wave characteristics (Serfaty et al., 2017), as well as estimating velocity models (Araya-Polo et al., 2018).

The main advantages of CNN over other supervised classification methods are its spatial awareness and automatic feature extraction. For image classification problems, rather than using the intensity values at each pixel individually, CNN analyzes the patterns among pixels in an image, and automatically generates features (in seismic data, attributes) suitable for classification. Because seismic data are 3D tomographic images, we would expect CNN to be naturally adaptable to seismic data classification. However, there are some distinct characteristics in seismic classification that make it more challenging than other image classification problems. Firstly, classical image classification aims at distinguishing different images, while seismic classification aims at distinguishing different geological objects within the same image. Therefore, from an image processing point of view, seismic classification is in fact a segmentation problem (partitioning an image into blocky pixel shapes with a coarser set of colors) rather than a classification problem. Secondly, training data for seismic classification are much sparser compared to classical image classification problems, for which massive data are publicly available. Thirdly, in seismic data, all features are represented by different patterns of reflectors, and the boundaries between different features are rarely explicitly defined. In contrast, features in an image from computer artwork or photography are usually well-defined. Finally, because of the uncertainty in seismic data, and the nature of manual interpretation, the training data in seismic classification are always contaminated by noise.

To address the first challenge, to date, most, if not all, published studies on CNN-based seismic facies classification perform classification on small patches of data to infer the class label of the seismic sample at the patch center. In this fashion, seismic facies classification is done by traversing through patches centered at every sample in a seismic volume. An alternative approach, although less discussed, is to use CNN models designed for image segmentation tasks (Long et al., 2015; Badrinarayanan et al., 2017; Chen et al., 2018) to obtain sample-level labels in a 2D profile (e.g. an inline) simultaneously, and then traverse through all 2D profiles in a volume.
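The patch-based traversal described above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the function name `extract_patches`, the 2D (rather than 3D) patches, the reflect padding at the edges, and the random stand-in for a seismic line are all hypothetical choices, not details taken from the study.

```python
import numpy as np

def extract_patches(line, half):
    """Return one (2*half+1, 2*half+1) patch centered at every sample of a 2D line."""
    # Assumption: reflect-pad the line so that edge samples also get a full patch.
    padded = np.pad(line, half, mode="reflect")
    size = 2 * half + 1
    n_t, n_x = line.shape
    patches = np.empty((n_t, n_x, size, size), dtype=line.dtype)
    for t in range(n_t):
        for x in range(n_x):
            patches[t, x] = padded[t:t + size, x:x + size]
    return patches

rng = np.random.default_rng(0)
line = rng.standard_normal((20, 30))      # hypothetical line (time samples x traces)
patches = extract_patches(line, half=3)   # one 7x7 patch per sample
print(patches.shape)                      # (20, 30, 7, 7)
```

A patch-based classifier would then be run once per patch, whereas a segmentation-style model would take `line` itself as a single input instance.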

In this study, I use an encoder-decoder CNN model as an implementation of the aforementioned second approach. I apply both the encoder-decoder model and patch-based model to seismic facies classification using data from the North Sea, with the objective of demonstrating the strengths and weaknesses of the two CNN models. I conclude that the encoder-decoder model provides much better classification quality, whereas the patch-based model is more flexible on training data, possibly making it easier to use in production.

The Two Convolutional Neural Network (CNN) Models

Patch-based model

A basic patch-based model consists of several convolutional layers, pooling (downsampling) layers, and fully-connected layers. For an input image (for seismic data, amplitudes in a small 3D window), a CNN model first automatically extracts several high-level abstractions of the image (similar to seismic attributes) using the convolutional and pooling layers, then classifies the extracted attributes using the fully-connected layers, which are similar to traditional multilayer perceptron networks. The output from the network is a single value representing the facies label of the seismic sample at the center of the input patch. An example of patch-based model architecture is provided in Figure 1a. In this example, the network is employed to classify salt versus non-salt from seismic amplitude in the SEAM synthetic data (Fehler and Larner, 2008). One input instance is a small patch of data bounded by the red box, and the corresponding output is a class label for this whole patch, which is then assigned to the sample at the patch center. The sample marked as the red dot is classified as non-salt.

Figure 1. Sketches for CNN architecture of a) 2D patch-based model and b) encoder-decoder model. In the 2D patch-based model, each input data instance is a small 2D patch of seismic amplitude centered at the sample to be classified. The corresponding output is then a class label for the whole 2D patch (in this case, non-salt), which is usually assigned to the sample at the center. In the encoder-decoder model, each input data instance is a whole inline (or crossline/time slice) of seismic amplitude. The corresponding output is a whole line of class labels, so that each sample is assigned a label (in this case, some samples are salt and others are non-salt). Different types of layers are denoted in different colors, with layer types marked at their first appearance in the network. The size of the cuboids approximately represents the output size of each layer.

Encoder-decoder model

Encoder-decoder is a popular network structure for tackling image segmentation tasks. Encoder-decoder models share a similar idea, which is first extracting high-level abstractions of input images using convolutional layers, then recovering sample-level class labels by “deconvolution” operations. Chen et al. (2018) introduce a current state-of-the-art encoder-decoder model while concisely reviewing some popular predecessors. An example of encoder-decoder model architecture is provided in Figure 1b. Similar to the patch-based example, this encoder-decoder network is employed to classify salt versus non-salt from seismic amplitude in the SEAM synthetic data. Unlike the patch-based network, in the encoder-decoder network, one input instance is a whole line of seismic amplitude, and the corresponding output is a whole line of class labels, which has the same dimension as the input data. In this case, all samples in the middle of the line are classified as salt (marked in red), and other samples are classified as non-salt (marked in white), with minimum error.
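The shape bookkeeping behind "the output has the same dimension as the input" can be sketched without any learned weights. The snippet below is not the network used in the study: it replaces the convolutional encoder with plain 2x2 max pooling and the “deconvolution” decoder with nearest-neighbor upsampling, and the line size (64 x 96) is a hypothetical choice, purely to show how a whole line is reduced and then restored to its original size.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling (encoder-style downsampling); assumes even dimensions."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbor upsampling, a crude stand-in for a 'deconvolution' layer."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

line = np.arange(64 * 96, dtype=float).reshape(64, 96)  # hypothetical whole line
encoded = max_pool2(max_pool2(line))      # coarse abstraction of the line
decoded = upsample2(upsample2(encoded))   # back to one value per input sample
print(line.shape, encoded.shape, decoded.shape)  # (64, 96) (16, 24) (64, 96)
```

In a real encoder-decoder model each stage would also contain trainable convolutions, and the final layer would produce one class score per facies at every sample.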

Application of the Two CNN Models

For demonstration purposes, I use the F3 seismic survey acquired in the North Sea, offshore Netherlands, which is freely accessible by the geoscience research community. In this study, I am interested in automatically extracting seismic facies that have specific seismic amplitude patterns. To remove the potential disagreement on the geological meaning of the facies to extract, I name the facies purely based on their reflection characteristics. Table 1 provides a list of extracted facies. There are eight seismic facies with distinct amplitude patterns; another facies (“everything else”) is used for samples not belonging to the eight target facies.

Table 1. List of extracted facies.

Facies number | Facies name
1 | Varies amplitude steeply dipping
2 | Random
3 | Low coherence
4 | Low amplitude deformed
5 | Low amplitude dipping
6 | High amplitude deformed
7 | Moderate amplitude continuous
8 | Chaotic
0 | Everything else

To generate training data for the seismic facies listed above, different picking scenarios are employed to compensate for the different input data format required in the two CNN models (small 3D patches versus whole 2D lines). For the patch-based model, 3D patches of seismic amplitude data are extracted around seed points within some user-defined polygons. There are approximately 400,000 3D patches of size 65×65×65 generated for the patch-based model, which is a reasonable amount for seismic data of this size. Figure 2a shows an example line on which seed point locations are defined in the co-rendered polygons.

The encoder-decoder model requires much more effort for generating labeled data. I manually interpret the target facies on 40 inlines across the seismic survey and use these for building the network. Although the total number of seismic samples in 40 lines is enormous, the encoder-decoder model only considers them as 40 input instances, which in fact is a very small training set for a CNN. Figure 2b shows an interpreted line which is used in training the network.

In both tests, I randomly use 90% of the generated training data to train the network and use the remaining 10% for testing. On an Nvidia Quadro M5000 GPU with 8GB memory, the patch-based model takes about 30 minutes to converge, whereas the encoder-decoder model needs about 500 minutes. Besides the faster training, the patch-based model also has a higher test accuracy at almost 100% (99.9988%, to be exact) versus 94.1% from the encoder-decoder model. However, this accuracy measurement is sometimes a bit misleading. For a patch-based model, when picking the training and testing data, interpreters usually pick the most representative samples of each facies for which they have the most confidence, resulting in high quality training (and testing) data that are less noisy, and most of the ambiguous samples which are challenging for the classifier are excluded from testing. In contrast, to use an encoder-decoder model, interpreters have to interpret all the target facies in a training line. For example, if the target is faults, one needs to pick all faults in a training line, otherwise unlabeled faults will be considered as “non-fault” and confuse the classifier. Therefore, interpreters have to make some not-so-confident interpretations when generating training and testing data. Figures 2c and 2d show seismic facies predicted from the two CNN models on the same line shown in Figures 2a and 2b. We observe better defined facies from the encoder-decoder model compared to the patch-based model.
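The random 90%/10% split and the test-accuracy measure mentioned above could be set up along the following lines. This is a minimal sketch, not the study's code: `split_indices` and `accuracy` are hypothetical helper names, and the seed and the toy labels are arbitrary.

```python
import numpy as np

def split_indices(n_items, train_fraction=0.9, seed=0):
    """Randomly split item indices into a training part and a testing part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(round(train_fraction * n_items))
    return order[:n_train], order[n_train:]

def accuracy(predicted, labeled):
    """Fraction of test samples whose predicted facies matches the label."""
    return float(np.mean(predicted == labeled))

# 40 interpreted lines (encoder-decoder case): 36 for training, 4 for testing.
train_idx, test_idx = split_indices(40)
print(len(train_idx), len(test_idx))  # 36 4

# Toy check of the accuracy measure on four hypothetical facies labels.
print(accuracy(np.array([1, 2, 3, 4]), np.array([1, 2, 0, 4])))  # 0.75
```

Because the test items are drawn from the same picked data as the training items, the measured accuracy inherits any bias in how those data were picked, which is the caveat raised in the paragraph above.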

Figure 3 shows prediction results from the two networks on a line away from the training lines, and Figure 4 shows prediction results from the two networks on a crossline. Similar to the prediction results on the training line, compared to the patch-based model, the encoder-decoder model provides facies as cleaner geobodies that require much less post-editing for regional stratigraphic classification (Figure 5). This can be attributed to the encoder-decoder model being able to capture the large-scale spatial arrangement of facies, whereas the patch-based model only senses patterns in small 3D windows. To form such windows, the patch-based model also needs to pad or simply skip samples close to the edge of a 3D seismic volume. Moreover, although the training is much faster in a patch-based model, the prediction stage is very computationally intensive, because it processes N×N×N times as much data as the original seismic volume (N is the patch size along each dimension). In this study, the patch-based method takes about 400 seconds to predict a line, compared to less than 1 second required by the encoder-decoder model.
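The N×N×N factor is easy to put in numbers for the 65×65×65 patches used here. The short calculation below only restates figures given in the text; the variable names are illustrative.

```python
patch_size = 65                       # N, patch size along each dimension
values_per_sample = patch_size ** 3   # amplitudes read to classify one sample
print(values_per_sample)              # 274625

# Reported prediction times per line: about 400 s (patch-based) versus
# less than 1 s (encoder-decoder), so the ratio is at least about 400.
min_speed_ratio = 400 / 1
print(min_speed_ratio)                # 400.0
```

In other words, every output sample of the patch-based model requires reading roughly a quarter of a million input amplitudes, which is why its prediction stage is so much slower despite the faster training.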


In this study, I compared two types of CNN models in the application of seismic facies classification. The more commonly used patch-based model requires much less effort in generating labeled data, but the classification result is suboptimal compared to the encoder-decoder model, and the prediction stage can be very time consuming. The encoder-decoder model generates superior classification results at near real-time speed, at the expense of more tedious labeled data picking and longer training time.


The author thanks Geophysical Insights for the permission to publish this work, dGB Earth Sciences for providing the F3 North Sea seismic data to the public, and ConocoPhillips for sharing the MalenoV project for public use, which was referenced when generating the training data. The CNN models discussed in this study are implemented in TensorFlow, an open source library from Google.

Figure 2. Example of seismic amplitude co-rendered with training data picked on inline 340 used for a) patch-based model and b) encoder-decoder model. The prediction result from c) patch-based model, and d) from the encoder-decoder model. Target facies are colored in colder to warmer colors in the order shown in Table 1. Compare Facies 5, 6 and 8.

Figure 3. Prediction results from the two networks on a line away from the training lines. a) Predicted facies from the patch-based model. b) Predicted facies from the encoder-decoder based model. Target facies are colored in colder to warmer colors in the order shown in Table 1. The yellow dotted line marks the location of the crossline shown in Figure 4. Compare Facies 1, 5 and 8.

Figure 4. Prediction results from the two networks on a crossline. a) Predicted facies from the patch-based model. b) Predicted facies from the encoder-decoder model. Target facies are colored in colder to warmer colors in the order shown in Table 1. The yellow dotted lines mark the location of the inlines shown in Figures 2 and 3. Compare Facies 5 and 8.

Figure 5. Volumetric display of the predicted facies from the encoder-decoder model. The facies volume is visually cropped for display purpose. An inline and a crossline of seismic amplitude co-rendered with predicted facies are also displayed to show a broader distribution of the facies. Target facies are colored in colder to warmer colors in the order shown in Table 1.


Araya-Polo, M., T. Dahlke, C. Frogner, C. Zhang, T. Poggio, and D. Hohl, 2017, Automated fault detection without seismic processing: The Leading Edge, 36, 208–214.

Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke, 2018, Deep-learning tomography: The Leading Edge, 37, 58–66.

Badrinarayanan, V., A. Kendall, and R. Cipolla, 2017, SegNet: A deep convolutional encoder-decoder architecture for image segmentation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 2481–2495.

Chen, L. C., G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, 2018, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs: IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834–848.

Chen, L. C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, 2018, Encoder-decoder with atrous separable convolution for semantic image segmentation: arXiv preprint, arXiv:1802.02611v2.

Fehler, M., and K. Larner, 2008, SEG advanced modeling (SEAM): Phase I first year update: The Leading Edge, 27, 1006–1007.

Huang, L., X. Dong, and T. E. Clee, 2017, A scalable deep learning platform for identifying geologic features from seismic attributes: The Leading Edge, 36, 249–256.

LeCun, Y., Y. Bengio, and G. Hinton, 2015, Deep learning: Nature, 521, 436–444.

Lewis, W., and D. Vigh, 2017, Deep learning prior models from seismic images for full-waveform inversion: 87th Annual International Meeting, SEG, Expanded Abstracts, 1512–1517.

Long, J., E. Shelhamer, and T. Darrell, 2015, Fully convolutional networks for semantic segmentation: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440.

Serfaty, Y., L. Itan, D. Chase, and Z. Koren, 2017, Wavefield separation via principle component analysis and deep learning in the local angle domain: 87th Annual International Meeting, SEG, Expanded Abstracts, 991–995.

Waldeland, A. U., and A. H. S. S. Solberg, 2017, Salt classification using deep learning: 79th Annual International Conference and Exhibition, EAGE, Extended Abstracts, Tu-B4-12.

Geobody Interpretation Through Multi-Attribute Surveys, Natural Clusters and Machine Learning

By Thomas A. Smith 
June 2017


Multi-attribute seismic samples (even as entire attribute surveys), Principal Component Analysis (PCA), attribute selection lists, and natural clusters in attribute space are candidate inputs to machine learning engines that can operate on these data to train neural network topologies and generate autopicked geobodies. This paper sets out a unified mathematical framework for the process from seismic samples to geobodies.  SOM is discussed in the context of inversion as a dimensionality-reducing classifier to deliver a winning neuron set.  PCA is a means to more clearly illuminate features of a particular class of geologic geobodies.  These principles are demonstrated with geobody autopicking below conventional thin bed resolution on a standard wedge model.


Seismic attributes are now an integral component of nearly every 3D seismic interpretation.  Early development in seismic attributes is traced to Taner and Sheriff (1977).  Attributes have a variety of purposes for both general exploration and reservoir characterization, as laid out clearly by Chopra and Marfurt (2007).  Taner (2003) summarizes attribute mathematics with a discussion of usage.

Self-Organizing Maps (SOM) are a type of unsupervised neural network that self-trains in the sense that it obtains information directly from the data.  The SOM neural network is completely self-taught, in contrast to the perceptron and its various cousins, which undergo supervised training.  The winning neuron set that results from training then classifies the training samples to test itself by finding the nearest neuron to each training sample (the winning neuron).  In addition, other data may be classified as well.  First introduced by Kohonen (1984), then advanced and expanded by its success in a number of areas (Kohonen, 2001; Laaksonen, 2011), SOM has become a part of several established neural network textbooks, namely Haykin (2009) and Duda, Hart and Stork (2001).  Although the style of SOM discussed here has been used commercially for several years, only recently have results on conventional DHI plays been published (Roden, Smith and Sacrey, 2015).
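The classification step, finding the nearest (winning) neuron for each sample, can be sketched in a few lines of NumPy. The function name `classify` and the toy neuron and sample values are hypothetical; Euclidean distance is used because the article itself works with Euclidean distances in attribute space.

```python
import numpy as np

def classify(samples, neurons):
    """Return, for each multi-attribute sample, the index of its nearest neuron."""
    # Squared Euclidean distance from every sample to every neuron.
    d2 = ((samples[:, None, :] - neurons[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

neurons = np.array([[0.0, 0.0], [10.0, 10.0]])           # a tiny winning neuron set
samples = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 2.0]])  # three two-attribute samples
print(classify(samples, neurons))  # [0 1 0]
```

The same function applies unchanged to data that were not part of training, which is how other data may be classified once the neurons are fixed.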

Three Spaces

The concept of framing seismic attributes as multi-attribute seismic samples for SOM training and classification was presented by Taner, Treitel, and Smith (2009) in an SEG Workshop.  In that presentation, survey data and their computed attributes reside in survey space.  The neural network resides in neuron topology space.  These two meet in attribute space where neurons hunt for natural clusters and learn their characteristics.

Results were shown for 3D surveys over the venerable Stratton Field and a Gulf of Mexico salt dome.  The Stratton Field SOM results clearly demonstrated that there are continuous geobody events in the weak reflectivity zone between C38 and F11 events, some of which are well below seismic tuning thickness, that could be tied to conventional reflections and which correlated with wireline logs at the wells.  Studies of SOM machine learning of seismic models were presented by Smith and Taner (2010).  They showed how winning neurons distribute themselves in attribute space in proportion to the density of multi-attribute samples.  Finally, interpretation of SOM salt dome results found a low probability zone where multi-attribute samples of poor fit correlated with an apparent salt seal and DHI down-dip conformance (Smith and Treitel, 2010).

Survey Space to Attribute Space:

Ordinary seismic samples of amplitude traces in a 3D survey may be described as an ordered set with three subscripts, $c$, $d$, and $e$ (sample index, trace index, and line index).  A multi-attribute survey is a “Super 3D Survey” constructed by combining a number of attribute surveys with the amplitude survey.  This adds another dimension to the set and another subscript for the attribute, so the new set of samples including the additional attributes has four subscripts.  These data may be thought of as separate surveys or, equivalently, separate samples within one survey.  Within a single survey, each sample is a multi-attribute vector.  This reduces the subscript count by one, so the set of multi-attribute vectors again carries the three subscripts $c$, $d$, $e$.

Next, a two-way mapping function may be defined that references the location of any sample in the 3D survey by either a single index $i$ or the triplet of indices $(c, d, e)$.  Now the three survey coordinates may be gathered into a single index, so the multi-attribute vector samples are also an unordered set in attribute space with subscript $i$.  The index map is a way to find a sample in attribute space from survey space and vice versa.
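One way to picture the two-way index map is with NumPy's flat-index helpers. This is a sketch under assumptions: the survey dimensions are made up, the array layout (sample, trace, line, attribute) and row-major ordering are illustrative choices, and the article does not prescribe any particular implementation.

```python
import numpy as np

# Hypothetical dimensions: samples per trace, traces per line, lines, attributes.
n_c, n_d, n_e, n_f = 4, 3, 2, 5
rng = np.random.default_rng(0)
survey = rng.standard_normal((n_c, n_d, n_e, n_f))  # "Super 3D Survey"

# Attribute space: one row per multi-attribute vector, addressed by a single index i.
vectors = survey.reshape(-1, n_f)

# Two-way map between the survey triplet (c, d, e) and the single index i.
i = int(np.ravel_multi_index((2, 1, 1), (n_c, n_d, n_e)))
c, d, e = (int(v) for v in np.unravel_index(i, (n_c, n_d, n_e)))
print(i, (c, d, e))  # 15 (2, 1, 1)
```

With this layout, `vectors[i]` and `survey[c, d, e]` are the same multi-attribute sample, so a result computed in attribute space can always be posted back to its survey location.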

Multi-attribute sample and set in attribute space: 

A multi-attribute seismic sample is a column vector in an ordered set of three subscripts $c$, $d$, $e$ representing sample index, trace index, and line index. Survey bins refer to indices $d$ and $e$.  These samples may also be organized into an unordered set with subscript $i$.  They are members of an $F$-dimensional real space, where $F$ is the number of attributes.  The attribute data are normalized, so in fact multi-attribute samples reside in scaled attribute space.
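The article does not say which normalization is applied. As one common possibility, and purely as an assumption, each attribute could be scaled to zero mean and unit standard deviation so that no single attribute dominates Euclidean distances; `scale_attributes` and the toy attribute ranges below are hypothetical.

```python
import numpy as np

def scale_attributes(samples):
    """Scale each attribute (column) to zero mean and unit standard deviation."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    std[std == 0] = 1.0          # leave constant attributes unchanged in scale
    return (samples - mean) / std

rng = np.random.default_rng(0)
# Two attributes with very different numerical ranges (hypothetical values).
raw = np.column_stack([rng.normal(0.0, 5000.0, 1000), rng.normal(3.0, 0.1, 1000)])
scaled = scale_attributes(raw)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

Other scalings (for example, to a fixed range) would serve the same purpose of placing all attributes on a comparable footing.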

Natural clusters in attribute space: 

Just as there are reflecting horizons in survey space, there must be clusters of coherent energy in attribute space.  Random samples, which carry no information, are uniformly distributed in attribute space just as in survey space.  The set of natural clusters $\{C_m\}$, $m \in [1, M]$, in attribute space is unordered and contains $M$ members.  Here, the brackets $[1, M]$ indicate an index range.  The natural clusters may reside anywhere in attribute space, but attribute space is filled with multi-attribute samples, only some of which belong to meaningful natural clusters.  Natural clusters may be big or small, tightly packed or diffuse.  The rest of the samples are scattered throughout $F$-space.  Natural clusters are discovered in attribute space with learning machines imbued with simple training rules and aided by properties of their neural networks.

A single natural cluster: 

A natural cluster $C_m$ has $N(m)$ elements in it.  Each element is taken from the pool of the set of all multi-attribute samples.  Every natural cluster may have a different number of multi-attribute samples associated with it, so for any natural cluster $C_m$ the count is written $N(m)$.  Every natural cluster has its own unique properties described by the subset of samples that are associated with it.  Some sample subsets associated with a winning neuron are small (“not so popular”) and some subsets are large (“very popular”).  The distribution of Euclidean distances may be tight (“packed”) or loose (“diffuse”).

Geobody sample and geobody set in survey space: 

For this presentation, a geobody $G_b$ is defined as a contiguous region in survey space composed of elements which are identified by members $g$.  The members of a geobody are an ordered set which registers with the coordinates of members of the multi-attribute seismic survey.

A geobody member is just an identification number (id), an integer.  Although the 3D seismic survey is a fully populated “brick” of samples, the geobody members register at certain contiguous locations, but not all of them.  The geobody $G_b$ is an amorphous, but contiguous, “blob” within the “brick” of the 3D survey.  The coordinates of the geobody blob in the earth are given by the survey indices of its members.  By this, all the multi-attribute samples in the geobody may be found, given the id and three survey coordinates of a seed point.
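Growing a contiguous “blob” from a seed point can be sketched as a flood fill over a classified volume. The sketch rests on assumptions that the article does not state: membership is decided by equality of a class id (for example, a winning neuron number), neighbors are the six face-adjacent samples, and `grow_geobody` is a hypothetical name.

```python
from collections import deque
import numpy as np

def grow_geobody(classes, seed):
    """Return a boolean mask of samples connected to `seed` that share its class id."""
    target = classes[seed]
    body = np.zeros(classes.shape, dtype=bool)
    body[seed] = True
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        c, d, e = queue.popleft()
        for dc, dd, de in steps:
            n = (c + dc, d + dd, e + de)
            inside = all(0 <= n[k] < classes.shape[k] for k in range(3))
            if inside and not body[n] and classes[n] == target:
                body[n] = True
                queue.append(n)
    return body

classes = np.zeros((3, 3, 3), dtype=int)
classes[0, 0, 0:2] = 5      # two touching samples with class id 5
classes[2, 2, 2] = 5        # same id, but not connected to the seed
mask = grow_geobody(classes, seed=(0, 0, 0))
print(int(mask.sum()), bool(mask[2, 2, 2]))  # 2 False
```

The isolated sample with the same class id is left out, which reflects the requirement that a geobody be contiguous in survey space.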

A single geobody in survey space

Each geobody $G_b$ is a set of $N$ geobody members with the same id.  That is, there are $N$ members in $G_b$, so $N$ depends on the geobody and is written $N(b)$.  The geobody members for this geobody are taken from the pool of all geobody samples.  Some geobodies are small and others large.  Some are tabular, some lenticular, some channels, faults, columns, etc.  So how are geobodies and natural clusters related?

A geobody is not a natural cluster

$$\{G_b\},\ b \in [1, B] \;\neq\; \{C_m\},\ m \in [1, M]$$

This expression is short but sweet.  It says a lot.  On the left is the set of all $B$ geobodies.  On the right is the set of $M$ natural clusters.  The expression says that these two sets aren’t the same.  On the left, the geobody members are id numbers.  These are in survey space.  On the right are the natural clusters.  These are in attribute space.  What this means is that geobodies are not directly revealed by natural clusters.  So, what is missing?

Interpretation is conducted in survey space.  Machine learning is conducted in attribute space.  Someone has to pick the list of attributes.  The attributes must be tailored to the geological question at hand.  And a good geological question is always the best starting point for any interpretation.

A natural cluster is an imaged geobody

$$C_m = \{\, I(G_i),\ N \,\}$$

Here, a natural cluster $C_m$ is defined as an unorganized set of two kinds of objects: a function $I$ of a set of geobodies $G_i$ and random noise $N$.  The number of geobodies is $I$ and unspecified.  The function $I(\cdot)$ is an illumination function which places the geobodies in attribute space.  The illumination function is defined by the choice of attributes.  This is the attribute selection list.  The number of geobodies in a natural cluster $C_m$ is zero or more, $0 < i < I$.  The geobodies are distributed throughout the 3D survey.

The natural cluster concentrates geobodies of similar illumination properties.  If there are no geobodies or there is no illumination with a particular attribute selection list, the natural cluster contains no imaged geobodies, so the set is only noise.  The attribute selection list is a critically important part of multi-attribute seismic interpretation.  The wrong attribute list may not illuminate any geobodies at all.

Geobody inversion from a math perspective

Multi-attribute seismic interpretation proceeds from the preceding equation in three parts.  First, as part of an inversion process, a natural cluster is statistically estimated by a machine learning classifier such as SOM with a neural network topology.  See Chopra, Castagna and Portniaguine (2006) for a contrasting inversion methodology.  Secondly, SOM employs a simple training rule that a neuron nearest a selected training sample is declared the winner and the winning neuron advances toward the sample a small amount.  Neurons are trained by attraction to samples.  One complete pass through the training samples is called an epoch.  Other machine learning algorithms have other training rules to adapt to data.  Finally, SOM has a dimensionality reducing feature because information contained in natural clusters is transferred (imperfectly) to the winning neuron set in the finalized neural network topology through cooperative learning.  Neurons in the winning neuron’s topological neighborhood move along with the winning neuron in attribute space.  SOM training is also dynamic in that the size of the neighborhood decreases with each training time step, so that eventually the neighborhood shrinks to the point where all subsequent training steps are competitive.
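The training rule described above (the winner advances toward the sample, neighbors move along with it, and the neighborhood shrinks until training is purely competitive) can be sketched in NumPy. Everything specific here is an assumption made for illustration: the 4x4 topology, the Gaussian neighborhood, the linear decay schedules applied per epoch rather than per time step, and the two synthetic clusters.

```python
import numpy as np

def train_som(samples, grid=(4, 4), epochs=20, lr=0.5, seed=0):
    """Train a small SOM; returns neuron positions in attribute space."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Neuron coordinates in neuron topology space.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Start each neuron at a randomly chosen sample in attribute space.
    neurons = samples[rng.choice(len(samples), rows * cols, replace=True)].astype(float)
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):                  # one epoch = one pass through samples
        frac = epoch / epochs
        radius = radius0 * (1.0 - frac)          # neighborhood shrinks over time
        rate = lr * (1.0 - frac)                 # step size also decays
        for s in samples[rng.permutation(len(samples))]:
            winner = np.argmin(((neurons - s) ** 2).sum(axis=1))
            grid_d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            if radius > 0.5:                     # cooperative: neighbors move too
                influence = np.exp(-grid_d2 / (2.0 * radius ** 2))
            else:                                # competitive: only the winner moves
                influence = (grid_d2 == 0).astype(float)
            neurons += rate * influence[:, None] * (s - neurons)
    return neurons

rng = np.random.default_rng(1)
samples = np.vstack([rng.normal(0.0, 0.1, (100, 2)),    # natural cluster near (0, 0)
                     rng.normal(10.0, 0.1, (100, 2))])   # natural cluster near (10, 10)
neurons = train_som(samples)
nearest = np.sqrt(((samples[:, None, :] - neurons[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
print(neurons.shape, float(nearest.mean()))
```

The mean distance from each sample to its winning neuron gives a rough indication of how well the neurons have settled onto the clusters.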

Because the natural cluster recovered by the classifier is a statistical estimate, let it be called the statistical estimate of the “signal” part of C_m.  The true geobody is independent of an illumination function.  The dimensionality reduction associated with multi-attribute interpretation has the purpose of geobody recognition through identification, dimensionality reduction and classification.  In fact, the chain of steps contains a mapping and un-mapping process with no guarantee that the geobody will be recovered.

However, the image function may be inappropriate to illuminate the geobody in F-space because of a poor choice of attributes.  So at best, the geobodies are illuminated by an imperfect set of attributes and detected by a classifier that is primitive.  The results often must be combined, edited and packaged into useful, interpreted geobody units, ready to be incorporated into an evolving geomodel on which the interpretation will rest.

Attribute Space Illumination

One fundamental aspect of machine learning is dimensionality reduction from attribute space because its dimensions are usually beyond our grasp.  The approach taken here is from the perspective of manifolds which are defined as spaces with the property of “mapability” where Euclidean coordinates may be safely employed within any local neighborhood (Haykin, 2009, p.437-442).

The manifold assumption is important because SOM learning is routinely conducted on multi-attribute samples in attribute space using Euclidean distances to move neurons during training.  One of the first concerns of dimensionality reduction is the potential to lose details in natural clusters.  In practice, it has been found that halving the original amplitude sample interval is advantageous, but further reduction of the sample interval has not proven to be beneficial.  Infilling a natural cluster in this way allows neurons during competitive training to adapt to subtle details that might be missed in the original data.

Curse of Dimensionality

The Curse of Dimensionality (Haykin, 2009) is, in fact, many curses.  One problem is that the number of samples needed to sample a space uniformly increases dramatically with increasing dimensionality.  This has implications when gathering training samples for a neural network.  For example, cutting a unit length bar (1-D) with a sample interval of 0.01 results in 100 samples.  Dividing a unit length hypercube in 10-D with the same sample interval results in 10^20 samples (100^10 = 10^(2 x 10)).  If the nature of attribute space requires uniform sampling across a broad numerical range, then a large number of attributes may not be practical.  However, uniform sampling is not an issue here because the objective is to locate and detail features of natural clusters.
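The arithmetic in the bar and hypercube example can be checked with a few lines. The function name `uniform_sample_count` is hypothetical and used only for this illustration.

```python
def uniform_sample_count(n_dims, sample_interval=0.01):
    """Samples needed to cover a unit hypercube uniformly in n_dims dimensions."""
    per_axis = round(1.0 / sample_interval)  # 100 samples along each unit axis
    return per_axis ** n_dims                # the count grows exponentially


for n in (1, 2, 3, 10):
    print(n, uniform_sample_count(n))
```

Each added attribute multiplies the count by 100 at this sample interval, which is why uniform coverage of attribute space quickly becomes impractical.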

Also, not all attributes are important.  In the hunt for natural clusters, PCA (Haykin, 2009) is often a valuable tool to assess the relative merits of each attribute in a SOM attribute selection list.  Depending on geologic objectives, several dominant attributes may be picked from the first, second or even third principal eigenvectors, or all attributes may be picked from one principal eigenvector.
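One way to picture the use of PCA for attribute selection is the sketch below, which ranks attributes by the size of their loadings on the leading eigenvectors. It assumes NumPy is available and that every attribute has nonzero variance; the function name `rank_attributes` and the standardization step are illustrative choices, not a description of any particular software.

```python
import numpy as np


def rank_attributes(samples, names, n_components=2):
    """Rank attributes by absolute loading on each leading principal eigenvector."""
    X = np.asarray(samples, dtype=float)
    # Standardize so that no attribute dominates only because of its units.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    ranking = []
    for j in range(n_components):
        loadings = np.abs(eigvecs[:, j])
        ranked = [names[i] for i in np.argsort(loadings)[::-1]]
        ranking.append((float(explained[j]), ranked))
    return ranking
```

Each entry of the result pairs the fraction of variance explained by an eigenvector with the attribute names ordered from most to least dominant, which an interpreter could use as a starting point for an attribute selection list.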

Geobody inversion from an interpretation perspective

Multi-attribute seismic interpretation is finding geobodies in survey space aided by machine learning tools that hunt for natural clusters in attribute space.  The interpreter’s critical role in this process is the following:

  • Choose questions that carry exploration toward meaningful conclusions.
  • Be creative with seismic attributes so as to effectively address illumination of geologic geobodies.
  • Pick attribute selection lists with the assistance of PCA.
  • Review the results of machine learning which may identify interesting geobodies in natural clusters autopicked by SOM.
  • Look through the noise to edit and build geobodies with a workbench of visualization displays and a variety of statistical decision-making tools.
  • Construct geomodels by combining autopicked geobodies which in turn allow predictions on where to make better drilling decisions.

The Geomodel

After classification, picking geobodies from their winning neurons starts by filling an empty geomodel.  Natural clusters are consolidators of geobodies with common properties in attribute space so M < B.  In fact, it is often found that M << B.  That is, geobodies “stack” in attribute space.  Seismic data is noisy.  Natural clusters are consequently statistical.  Not every sample g classified by a winning neuron is important, although SOM classifies every sample.  Samples that are a poor fit are probably noise.  Construction of a sensible geomodel depends on answering well-thought-out geological questions, phrased through the choice of appropriate attribute selection lists.
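The idea that every sample is classified but poorly fitting samples are probably noise can be sketched as follows. The distance threshold `max_distance` and the function name `classify_with_fit` are assumptions for illustration; the source does not specify how goodness of fit is measured.

```python
import math


def classify_with_fit(neurons, samples, max_distance):
    """Classify each sample and flag whether it fits its winning neuron well.

    neurons: list of weight vectors in attribute space.
    Returns a list of (winner_index, distance, is_good_fit) tuples.
    """
    results = []
    for x in samples:
        # Every sample gets a winning neuron, whether or not it fits well.
        winner = min(range(len(neurons)),
                     key=lambda k: math.dist(neurons[k], x))
        distance = math.dist(neurons[winner], x)
        # Assumption: a sample far from its winner is treated as likely noise.
        results.append((winner, distance, distance <= max_distance))
    return results
```

Samples flagged as a poor fit could then be excluded or down-weighted when geobodies are edited and assembled into the geomodel.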

Working below classic seismic tuning thickness

Classical seismic tuning thickness is λ/4.  Combining the vertical-incidence layer thickness with λ=V/f leads to a critical layer thickness.  Resolution below classical seismic tuning thickness has been demonstrated with multi-attribute seismic samples and a machine learning classifier operating on those samples in scaled attribute space (Roden et al., 2015).  High-quality natural clusters in attribute space imply tight, dense balls (low entropy, high density).  SOM training and classification of a classical wedge model at three noise levels are shown in Figures 1 and 2, which show tracking well below tuning thickness.
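As a worked illustration of classical tuning thickness, the sketch below evaluates λ/4 with λ=V/f. The conversion to two-way traveltime assumes the standard vertical-incidence relation, thickness = V × (two-way time) / 2, which is an assumption here because the source's own expressions are not available in this text. The function names and the example velocity and frequency are hypothetical.

```python
def tuning_thickness(velocity_m_s, frequency_hz):
    """Classical tuning thickness, lambda/4, in meters."""
    wavelength = velocity_m_s / frequency_hz  # lambda = V / f
    return wavelength / 4.0


def tuning_two_way_time(frequency_hz):
    """Two-way traveltime through a layer of tuning thickness, in seconds.

    Assumes thickness = V * t / 2, so t = 2 * (V / (4 f)) / V = 1 / (2 f).
    """
    return 1.0 / (2.0 * frequency_hz)


# Example values chosen only for illustration: V = 3000 m/s, f = 30 Hz.
print(tuning_thickness(3000.0, 30.0))  # thickness in meters
print(tuning_two_way_time(30.0))       # two-way time in seconds
```

Under that assumption the two-way time at tuning depends only on frequency, being half of the dominant period.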

  • Seismic Processing: Processing the survey at a fine sample interval is preferred over resampling the final survey to a fine sample interval.  The highest S/N ratio is always preferred.
  • Preprocessing: A fine sample interval of the base survey is preferred because it raises the density of natural clusters.  Resample first and then compute attributes; do not compute attributes and then resample, because some attributes are not continuous functions.  Derive all attributes from a single base survey in order to avoid misties.
  • Attribute Selection List: Prefer attributes that address the specific properties of an intended geologic geobody.  Working below tuning, prefer instantaneous attributes over attributes requiring spatial sampling.

Thin bed results on 3D surveys in the Eagle Ford Shale facies of South Texas and in the Alibel horizon of the Middle Frio Group, onshore Texas, were corroborated with extensive well control to verify consistent results for more accurate mapping of facies below tuning without the usual frequency assumptions (Roden, Smith, Santogrossi and Sacrey, personal communication, 2017).


Conclusion

There is a firm mathematical basis for a unified treatment of multi-attribute seismic samples, natural clusters, geobodies and machine learning classifiers such as SOM.  Interpretation of multi-attribute seismic data is showing great promise, having demonstrated resolution well below conventional seismic thin bed resolution due to high-quality natural clusters in attribute space which have been detected by a robust classifier such as SOM.


Acknowledgments

I am thankful to have worked with two great geoscientists, Tury Taner and Sven Treitel, during the genesis of these ideas.  I am also grateful to work with an inspired and inspiring team of coworkers who are equally committed to excellence.  In particular, Rocky Roden and Deborah Sacrey are longstanding associates with a shared curiosity to understand things and colleagues of a hunter’s spirit.

Figure 1: Wedge models for three noise levels trained and classified by SOM with attribute list of amplitude and Hilbert transform (not shown) on 8 x 8 hexagonal neuron topology. Upper displays are amplitude. Middle displays are SOM classifications with a smooth color map. Lower displays are SOM classifications with a random color map. The rightmost vertical column is an enlargement of wedge model tips at highest noise level.  Multi-attribute classification samples are clearly tracking well below tuning thickness which is left of the center in the right column displays.

Figure 2: Attribute space for three wedge models with horizontal axis of amplitude and vertical axis of Hilbert transform. Upper displays are multi-attribute samples before SOM training and lower displays after training and samples classified by winning neurons in lower left with smooth color map.  Upper right is an enlargement of tip of third noise level wedge model from Figure 1 where below-tuning bed thickness is right of the thick vertical black line.


References

Chopra, S., J. Castagna and O. Portniaguine, 2006, Thin-bed reflectivity inversion, Extended abstracts, SEG Annual Meeting, New Orleans.

Chopra, S. and K.J. Marfurt, 2007, Seismic attributes for prospect identification and reservoir characterization, Geophysical Developments No. 11, SEG.

Duda, R.O., P.E. Hart and D.G. Stork, 2001, Pattern Classification, 2nd ed.: Wiley.

Haykin, S., 2009, Neural networks and learning machines, 3rd ed.: Pearson.

Kohonen, T., 1984, Self-organization and associative memory, pp 125-245. Springer-Verlag. Berlin.

Kohonen, T., 2001, Self-organizing maps: Third extended edition, Springer Series in Information Sciences.

Laaksonen, J. and T. Honkela, 2011, Advances in self-organizing maps, 8th International Workshop, WSOM 2011 Espoo, Finland, Springer.

Ma, Y. and Y. Fu, 2012, Manifold Learning Theory and Applications, CRC Press, Boca Raton.

Roden, R., T. Smith and D. Sacrey, 2015, Geologic pattern recognition from seismic attributes, principal component analysis and self-organizing maps, Interpretation, SEG, November 2015, SAE59-83.

Smith, T., and M.T. Taner, 2010, Natural clusters in multi-attribute seismics found with self-organizing maps: Source and signal processing section paper 5: Presented at Robinson-Treitel Spring Symposium by GSH/SEG, Extended Abstracts.

Smith, T. and S. Treitel, 2010, Self-organizing artificial neural nets for automatic anomaly identification, Expanded abstracts, SEG Annual Convention, Denver.

Taner, M.T., 2003, Attributes revisited,, accessed 22 March 2017.

Taner, M.T., and R.E. Sheriff, 1977, Application of amplitude, frequency, and other attributes, to stratigraphic and hydrocarbon determination, in C.E. Payton, ed., Applications to hydrocarbon exploration: AAPG Memoir 26, 301–327.

Taner, M.T., S. Treitel, and T. Smith, 2009, Self-organizing maps of multi-attribute 3D seismic reflection surveys, Presented at the 79th International SEG Convention, SEG 2009 Workshop on “What’s New in Seismic Interpretation,” Paper no. 6.

THOMAS A. SMITH is president and chief executive officer of Geophysical Insights, which he founded in 2008 to develop machine learning processes for multiattribute seismic analysis. Smith founded Seismic Micro-Technology in 1984, focused on personal computer-based seismic interpretation. He began his career in 1971 as a processing geophysicist at Chevron Geophysical. Smith is a recipient of the Society of Exploration Geophysicists’ Enterprise Award, Iowa State University’s Distinguished Alumni Award and the University of Houston’s Distinguished Alumni Award for Natural Sciences and Mathematics. He holds a B.S. and an M.S. in geology from Iowa State, and a Ph.D. in geophysics from the University of Houston.