Machine Learning Essentials for Seismic Interpretation: an e-Course by Dr. Tom Smith

Machine learning is foundational to the digital transformation of the oil & gas industry and will have a dramatic impact on the exploration and production of hydrocarbons.  Dr. Tom Smith, the founder and CEO of Geophysical Insights, conducts a comprehensive survey of machine learning technology and its applications in this 24-part series.  The course will benefit geoscientists, engineers, and data analysts at all experience levels, from data analysts who want to better understand applications of machine learning to geoscience, to senior geophysicists with deep experience in the field.

Aspects of supervised learning, unsupervised learning, classification and reclassification are introduced to illustrate how they work on seismic data.  Machine learning is presented not as an end-all-be-all, but as a new set of tools that enables interpretation of seismic data at a new, higher level of abstraction, one that promises to reduce risk and identify features that might otherwise be missed.

The following major topics are covered:

  • Operation  – supervised and unsupervised learning; buzzwords; examples
  • Foundation  – seismic processing for ML; attribute selection list objectives; principal component analysis
  • Practice  – geobodies; below-tuning; fluid contacts; making predictions
  • Prediction – the best well; the best seismic processing; over-fitting; cross-validation; who makes the best predictions?

This course can be taken for certification, or for informational purposes only (without certification). 

Enroll today for this valuable e-course from Geophysical Insights!

The Holy Grail of Machine Learning in Seismic Interpretation

A few years ago, we had geophysics and geology – two distinct disciplines that were well defined. Then, geoscience came along, and it was an amalgam of geology and geophysics.  Many people started calling themselves geoscientists as opposed to “geologist” or “geophysicist”. But the changes weren’t quite finished. Along came a qualifying adjective, “unconventional”, as in unconventional resource development or unconventional exploration. We understand how to do exploration; unconventional work is about understanding shale and finding sweet spots, but it is still a type of exploration.  By joining unconventional and resource development, we broaden what we do as professionals.  However, the mindset of unconventional geophysics is really closer to mining geophysics than it is to conventional exploration.

So, today’s topic has to do with the “holy grail” of machine learning in seismic interpretation.  We’re trying to tie this to seismic interpretation only.  Even if that’s a pretty big topic, we’re going to focus on a few highlights.  I can’t even summarize machine learning for seismic interpretation.  It’s already too big!  Nearly every company is investigating or applying machine learning these days.  So, for this talk I’m just going to have to focus on this narrow topic of machine learning in seismic interpretation and hit a few highlights.

Let’s start at 50,000 feet – way up at the top.  If you’ve been intimidated by this machine learning stuff, let’s define terms.  Machine learning is an engine.  It’s an algorithm that learns without explicit programming. That’s really fundamental. What does that mean? It means an algorithm that’s going to learn from the data. Given one set of data, it’s going to come up with an answer, but with a different set of data, it will come up with a different answer.

The whole field of artificial intelligence is broken up into Strong AI and Narrow AI.  Strong AI is coming up with a robot that looks and behaves like a person. Narrow AI attempts to duplicate the brain’s neurological processes that have been perfected over millions of years of biological development. A self-organizing map, or SOM, is a type of neural network that adjusts to training data.  However, it makes no assumptions about the characteristics of the data.  So, if we look at the whole field of artificial intelligence, and then at machine learning as a subset of that, there are two parts: unsupervised neural networks and supervised neural networks.  Unsupervised is where you feed it the data and say “you go figure it out.”  In supervised neural networks, you give it both the data and the right answer. Some examples of supervised neural networks would be convolutional neural networks and deep learning algorithms.  Convolutional is a more classical type of supervised neural network, where for every data sample, we know the answer.  So, a data sample might be ‘we have x, y, and z properties, and by the way, we know what the classification is a priori.’

A classical example of a supervised neural network would be this: Your uncle just passed away and left you the canning operation in Cordova, Alaska.  You go to the plant to see what you’ve inherited.
Let’s say you’ve got all these people standing at a beltline manually sorting fish, and they’ve got buckets for eels, buckets for flounder, etc. Being a great geoscientist, you recognize this as an opportunity to apply machine learning and possibly re-assign those people to more productive tasks. As the fish come along, you weigh them, you take a picture of them, you see what the scales are like and the general texture, and you get some idea of their general shape.  What I’ve described are three properties, or attributes. Perhaps you add more attributes and are up to four or five. Now we have five attributes that define each type of fish, so in mathematical terms, we’re dealing with a five-dimensional problem. We call this ‘Attribute Space’. Pretty soon, you run through all the eels and you get measurements for each eel.  So, you get the neural network trained on eels. And then you run through all the flounder. And guess what – there are going to be variations, of course, but in attribute space, those four or five measurements that we made for each type of fish are going to wind up in a different cluster. And that’s how we tell the difference between eels and flounder, or whatever else you’ve got.  Everything else that you can’t classify very well goes into a bucket that is labeled ‘unclassified’. (More on this later in the presentation.) And you put that into your algorithm.  So that’s basically the difference between supervised neural networks and unsupervised neural networks. Deep learning is a category of neural networks that can operate in both supervised and unsupervised discovery.
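
To make the fish example concrete, here is a minimal sketch of supervised classification in attribute space. It is not a neural network: it swaps in a much simpler nearest-centroid rule, purely to show how labeled samples form clusters and how a distance limit produces the ‘unclassified’ bucket. The attribute names, the numbers, and the `max_dist` limit are all made up for illustration.

```python
import math

def centroid(rows):
    """Mean attribute vector of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labeled):
    """Supervised step: we know the answer for every sample, so we
    can summarize each known class by one centroid in attribute space."""
    return {name: centroid(rows) for name, rows in labeled.items()}

def classify(sample, centroids, max_dist):
    """Assign the nearest class, or 'unclassified' if nothing is close."""
    name, dist = min(
        ((n, math.dist(sample, c)) for n, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_dist else "unclassified"

# Hypothetical attributes per fish: weight (kg), length (m), body depth (m)
training = {
    "eel": [[1.0, 0.80, 0.05], [1.2, 0.90, 0.06], [0.9, 0.75, 0.05]],
    "flounder": [[0.6, 0.35, 0.20], [0.7, 0.40, 0.22], [0.5, 0.30, 0.18]],
}
centroids = train(training)

for fish in ([1.1, 0.85, 0.05], [0.6, 0.33, 0.21], [5.0, 1.50, 0.40]):
    print(fish, "->", classify(fish, centroids, max_dist=0.5))
```

In a real problem the attributes would normally be scaled to comparable ranges first, so that no single attribute dominates the distance.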

Now, before we get deeper into our subject today, I’d like to draw your attention to some of the terms, starting with the concept of Big Data.  If you remember, a few years ago, if you wanted to survive in the oil and gas business, finding large fields was the objective. Well, we have another big thing today – Big Data. Our industry is looking at ways to apply the concepts of Big Data analytics. We hear senior management of E&P companies talking about Big Data and launching Data Analytics teams. So, what is Big Data or Data Analytics? It’s access to large volumes of disparate kinds of oil and gas data that are analyzed by machine learning algorithms to discover unknown relationships, those that were not identified previously. The other key point about Big Data is that it involves disparate kinds of data. So if you say “I’m doing Big Data analytics with my seismic data” – that’s not really an appropriate choice of terms. If you say “I’m going to throw in all my seismic data, along with associated wells, and my production data” – now you’re starting to talk about real Big Data operations.  And the opportunities are huge. Finally, there’s IoT – the Internet of Things – which you’ve probably heard or read about.  I predict that IoT will have a larger impact on our industry than machine learning; however, the two are related.  And why is that?  Almost EVERYTHING we use can be wired to the internet. In seismic acquisition, for instance, we’re looking at smart geophones being hooked up that sense the direction of the boat and can send and receive data. In fact, when the geophones get planted, each one has a GPS in it, so that when it’s pulled up and thrown in the back of a pickup truck, it can report its location in real time.  There are countless other examples of how IoT will change our industry.

Let’s consider wirelines as a starting point of interpretation and figuring out the depositional environment using wireline classifications. If we pick a horizon, then based on that auto-picked horizon, we have a wavelet at every bin. We pull that wavelet out. In this auto-picked horizon, we may have a million samples, and we have a million wavelets because we have a wavelet for each sample. (Some early neural learning tools were based on this concept of classifying wavelets.) Machine learning analyzes and trains on those million wavelets, finding, say, the seven that are most significantly different. Then, using these different classes, we go back and classify all of them. And so we have this cut shown here, across the channel, and the wavelet closest to the center is discovered to be tied to that channel. So there’s the channel wavelet, and now we have overbank wavelets, some splay wavelets – several different wavelets. And from this, a nice colormap can be produced indicating the type of wavelet.

Horizon attributes look at the properties of the wavelet in the vicinity of the horizon, at, say, frequencies of 25 to 80 hertz, with attributes like instantaneous phase. So we now have a collection of information about that pick using horizon attributes. Using volume attributes, we’ll look at a pair of horizons and integrate the seismic attributes between the horizons. This will result in a number, such as the average amplitude or average envelope value, that represents a sum of seismic samples in a time or depth interval. However, when considering machine learning, the method of analysis is fundamentally different. We have one seismic sample, and associated with that sample we have multiple seismic attributes. This produces a multi-attribute sample vector that is the subject of the machine learning process.
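
The contrast between a volume attribute and a multi-attribute sample vector can be sketched in a few lines. The attribute values and the function names below are hypothetical, chosen only to show the two data shapes: one number per interval versus one vector per sample.

```python
def interval_average(trace, top, base):
    """Volume-attribute style: collapse the samples between two horizon
    picks (indices top..base-1) into a single number."""
    window = trace[top:base]
    return sum(window) / len(window)

def sample_vectors(attribute_traces):
    """Machine-learning style: keep every sample, and attach to it the
    value of each attribute at that sample (one vector per sample)."""
    return [list(values) for values in zip(*attribute_traces)]

# Three made-up attribute traces at the same five sample positions
amplitude = [0.2, 0.9, -0.4, -1.0, 0.3]
envelope = [0.5, 1.0, 0.8, 1.1, 0.4]
inst_freq = [30.0, 28.0, 35.0, 40.0, 33.0]

avg = interval_average(envelope, top=1, base=4)
vectors = sample_vectors([amplitude, envelope, inst_freq])

print("average envelope between horizons:", round(avg, 3))
print("vector for sample 1:", vectors[1])
```

With three attributes, each of the five samples becomes a point in a three-dimensional attribute space; nothing is averaged away.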

Ok, so let’s take a look at some of the results. This is a self-organizing map (SOM) analysis of a wedge using only two attributes. We’ve got three cases – low, medium, and high levels of noise – and in the box over here you can see tuning thickness is right here, and everything to the right of that arrow is below tuning. Now, the SOM works on multi-attribute samples, and in this case we are keeping things very simple since we only have two attributes. If you have only two attributes, you can plot them on a piece of paper – x axis, y axis. However, the classification process works just fine for two dimensions or twenty dimensions.  It’s a machine learning algorithm. In two dimensions, we can look at it and decide “did it do a good job or did it not?” For this example, we’ve used the amplitude and the Hilbert Transform because we know they’re orthogonal to each other. We can plot those as individual points on paper. Every sample is a point on that scatter plot. When we put it through a SOM analysis, the first stage is SOM training, which tries to locate natural clusters in attribute space. In the second stage, once those neurons have gone through the training process, we take the results and classify ALL the samples. So, we have here the results – every single sample is classified. Low noise, medium noise, high noise, and here are the results.  If you go to tuning thickness, we are tracking events way below tuning thickness with SOM analysis.  And the fact that there’s the top of the wedge or … this one right here is where things get below tuning thickness. Eventually tip the corresponding trace right over there.  Now, there’s a certain bias.  For this analysis we are using a two-dimensional topology, and the connectivity between these neurons is hexagonal, which is made use of during the training process.  And there’s a certain bias here because this is a smooth colormap.
By the way, these are colormaps as opposed to colorbars.  Right? Colormaps, not colorbars. In a colormap, you can have four points of connectivity, and then it’s just like a grid.  Or six points of connectivity, and then it’s hexagonal.  That helps us understand the training that was used. Well, there’s a certain bias about having smooth colors. In this process there are 8 rows and 8 columns, and every single one of those neurons has gone looking for a natural cluster in attribute space.  Although it’s only two dimensions, there is still a hunting process. Each of these 64 neurons, during the training process, is trying to zero in on a natural cluster. And there’s a certain bias in using smooth colors, because neighboring neurons get similar colors, like the yellows and greens here and the blues and reds there. Here’s a random colormap – and you can see the results.  Even if we use random colors, we are still tracking events way below tuning thickness using the SOM classification.

We are demonstrating the resolution well below tuning.  There’s no magic.  We use only two attributes – the real part and the imaginary part, which is the Hilbert Transform, and we are demonstrating the SOM characteristics of training using only two attributes.
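
As a rough check of the two-attribute setup, the sketch below builds the imaginary part (the Hilbert transform) of a synthetic trace and confirms that it is orthogonal to the real part. The trace is a made-up tapered cosine, not the wedge model from the talk, and a slow textbook DFT is used so the example needs only the standard library; in practice a library FFT would be used instead.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (fine for short traces)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def analytic_signal(trace):
    """Complex trace: real part is the input, imaginary part is its
    Hilbert transform. Assumes an even number of samples."""
    n = len(trace)
    spectrum = dft(trace)
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    gain = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)

n = 64
trace = [math.exp(-((t - 32) / 8.0) ** 2) * math.cos(2 * math.pi * 6 * t / n)
         for t in range(n)]

z = analytic_signal(trace)
hilbert = [v.imag for v in z]          # second attribute
envelope = [abs(v) for v in z]         # instantaneous amplitude
samples = list(zip(trace, hilbert))    # two-attribute sample vectors

dot = sum(a * b for a, b in zip(trace, hilbert))
print(f"dot product of real and imaginary parts: {dot:.2e}")
```

The dot product comes out at rounding-error level, which is the sense in which the two attributes are orthogonal, and `samples` is exactly the kind of two-dimensional scatter that the SOM is trained on.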

The self-organizing map (SOM) training algorithm is modeled on the discovery of natural clusters in attribute space, using training rules based upon the human visual cortex.  Conceptually, this is a simple but powerful idea. We can see examples in nature of simple rules that lead to profound results.

So, the whole idea behind self-organizing assemblages is the following:  Snow geese and fish are both examples of self-organizing assemblages. Individuals follow a simple rule.  The individual goose is basically following a very simple rule: follow the goose in front of me, just a few feet behind and either left or right. It’s as simple as that.  That’s an example of a self-organizing assemblage, and yet some of its properties are pretty profound, because once the geese get up to altitude, they can go for a long time and long distances using the slipstream properties of that “V” formation.  The basic rule for a schooling fish is ‘swim close to your buddies.  Not so close that you’ll bump into them, and not so far away that the group no longer looks like a school of fish.’ When the shark swims by, the school needs to look like one big fish. If those individual fish were too far apart, the shark would see the smaller isolated fish as easy prey. So, there’s even a simple rule here of an optimum distance from one to the other. These are just two examples of where simple rules produce complex results when applied at scale.

Unsupervised neural networks, which classify the data, also work on simple rules, but they operate on large volumes of seismic samples in attribute space.
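
The two stages described earlier (train the neurons, then classify every sample) can be sketched with a toy self-organizing map. This is a generic textbook SOM, not the implementation behind the results in this talk: it uses a rectangular grid (four-point connectivity) rather than hexagonal, and the Gaussian neighborhood, the linear decay schedule, the grid size, and the synthetic three-cluster data are all assumptions made for illustration.

```python
import math
import random

def winner(x, weights):
    """Grid position of the neuron closest to sample x in attribute space."""
    return min(weights, key=lambda pos: math.dist(x, weights[pos]))

def train_som(samples, rows, cols, epochs=20, seed=0):
    """Stage 1: neurons hunt for natural clusters. Returns {(r, c): weights}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    # Start every neuron near the data mean with a little jitter.
    weights = {(r, c): [m + rng.uniform(-0.1, 0.1) for m in mean]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            rate = 0.5 * (1.0 - frac)                           # learning rate decays
            radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - frac)  # neighborhood shrinks
            wr, wc = winner(x, weights)
            for (r, c), w in weights.items():
                grid_d2 = (r - wr) ** 2 + (c - wc) ** 2
                pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (x[i] - w[i])  # simple rule: move toward the sample
            step += 1
    return weights

def quantization_error(samples, weights):
    """Average distance from each sample to its winning neuron."""
    return sum(math.dist(x, weights[winner(x, weights)])
               for x in samples) / len(samples)

# Synthetic two-attribute data with three natural clusters
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
samples = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
           for cx, cy in centers for _ in range(50)]

som = train_som(samples, rows=4, cols=4)
classes = [winner(x, som) for x in samples]   # Stage 2: classify ALL samples

print("distinct winning neurons:", len(set(classes)))
print("quantization error:", round(quantization_error(samples, som), 3))
```

Note that the samples never move; only the neuron weights are updated, and each sample ends up labeled with the grid position of its winning neuron.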

The first example is the Eagle Ford case study. Patricia Santagrossi published these results last year.  This is a 3D survey of a little over 200 square miles. The SOM analysis was run between the Buda and the Austin Chalk, and the Eagle Ford is right above the Buda in this little region right there.  The Eagle Ford shale layer was 108′ thick, which is only 14 ms.  Now, both the Buda and the Austin Chalk are known, strong peak events. So, if you count how many cycles we go through here: peak, trough, kind of a doublet, trough, peak. The good stuff here is basically all the bed from one peak to one trough. This is conventional seismic data. Here’s the Eagle Ford shale as measured right at the Buda break well there.  We have both a horizontal and a vertical well right here. And that trough is associated with the Eagle Ford Shale.  That trough and that peak. So, this is the SOM result with an 8x8 set of neurons used for the training. Look at the amount of visible detail here, not just between the Buda and the Austin Chalk; you can actually see how things are changing, even along the formation, within the run of the horizontal well, because every change in color here corresponds to a change in neuron.

These results were computed by machine learning using seismic attributes alone. We did not tie the results to any of the wells. The SOM analysis was run on seismic samples with multiple attributes values. The key idea here is simultaneous multi-attribute analysis using machine learning. Now, let’s look further at this Eagle Ford case study.

These are results computed by machine learning using seismic attributes.  We did not skew the results and tie them to any of the wells.  They were not forced to fit the wells or anything else. The SOM analysis was run strictly on seismic data and the multi-attribute seismic samples.  Again, the right term is simultaneous multi-attribute analysis. Multi-attribute, meaning it’s a vector. In our analysis every single sample is being used simultaneously to classify the data.  So although this area is 200 square miles in areal extent, between the Buda and the Austin Chalk we’re looking at every single sample – not just wavelets. By simple inspection, we can see that the well logs corroborate the machine learning results, but there has been no force fitting of the data. These arrows are referring to the SOM winning neurons. If we look in detail, here is Well #8, a vertical well in the Eagle Ford shale. The high resistivity zone is right in here. That could be tied into the red stuff. So, here again we’re dealing with seismic data on a sample-by-sample basis.

The SOM winning neurons identified 24 geobodies, autopicked in 150 feet of vertical section at Well #8 in the Eagle Ford borehole. Some of the geobodies, though not all of them, track the underwells and extend over the entire 200 sq. mile 3D survey.

This is to zero in a little bit more, so I can give you some associations here. The high resistivity zone correlates with winning neurons 54, 60, and 53 in this zone right in here. There’s the Eagle Ford Ash, which is identified with neurons 63 and 64. And Patricia even found a tie with this marker right here – this is neuron 55.

And this well, by the way, well #8, was 372 Mboe. SOM classification neurons are associated with specific wireline lithofacies units. That’s really hard to argue against.  We have evidence, in this case up here for example, of an unconformity where we lost a neuron right through here and then we picked it up again over there.  And, there is evidence in the Marl of slumping of some kind.  So, we’re starting to understand what’s happening geologically using machine learning. We’re seeing finer detail – more than we would have using conventional seismic data and single attributes.

Tricia found a generalized cross-section of the Cretaceous in Texas, running northwest to southeast towards the Gulf. The Eagle Ford shale fits in here below the Marl, and there’s an unconformity between those two – she was able to see some evidence of that.

The well that we just looked at was well #8, and it ties in with the winning neuron.  Let’s take a look at another well, say for example, well #3, a vertical well with some x-ray diffraction associated with it. We can truly nail this stuff with the real lithology, so not only do we have a wireline result, but we also have X-ray diffraction results to corroborate the classification results.

So, of the 64 neurons, over 41,000 were classified as “the good stuff.” Not on a sample basis, so you can integrate that – you can tally all that stuff up and start to come up with estimates.

So, specific geobodies relate to winning neuron that we’re tracking – #12 – that’s the bottom line. And from that we were able to develop a whole Wheeler diagram for the Eagle Ford group for the survey.  And the good stuff are the winning neurons 58 and 57. They end up on the neuron topology map here, so those two were associated with the wireline lithofacies footstep – the high resistivity part of the Eagle Ford shale. But she was able to work out additional things, such as more clastics and carbonates in the west and clastics in the southeast. And, she was able to work out not only Debris Apron, but the ashy beds and how they tie in.  Altogether, these were the neurons associated with the Eagle Ford shale. These were the neurons – 1, 9, and 10, that’s the basal clay shale.  And the Marls were associated with these neurons.

So, the autopicked geobodies across the survey are the basis on which we’re developing the depositional environment of the Eagle Ford, and that compares favorably with the well logs, using seismic data alone. One of our associates received feedback to the effect that “seismic is only good in conventionals, just for the big structural picture.” Man, what a sad conclusion that is.  There’s a heck of a lot more to get out of it: this high resistivity pay zone was associated with two specific neurons, demonstrating that this machine learning technology is equally applicable to unconventionals.

The second case study here is the Gulf of Mexico, by my distinguished associate, Mr. Rocky Roden. This is not deepwater – only approximately 300 feet. Here’s a north fault amplitude buildup. Here, these are time contours, and the amplitude conformance to structure is pretty good. In this crossline – 3183 – going from west to east is the distribution of the histogram of the values. You can see here in the dotted portion, this is just the amplitude display, and the box right here is a blowup of the edge right there of that reservoir. What you can see here is the SOM classification using colors.  Red is associated with the gas-over-oil contact and oil-over-water contact. A single sample.  So here we have the use of machine learning to help us find fluid contacts, which are very difficult to see.  This is all without having bandwidth, frequency range, point source, point receivers – it isn’t a case of everything dialed in just the right way. The rest of the story is just the use of machine learning. However, it’s machine learning not on samples of single numbers, but on each sample as a combination of attributes, as a vector. Using that choice of attributes, we’re able to identify fluid contacts. For easier viewing, we make all the others transparent and only show those that you can see here, what has been estimated using the classifier of the fluid contacts and also the hills.  In addition, look at the edges. The ability to define the edge of the reservoirs and come up with volumetrics is pretty clearly superior. Over here on the left, Rocky’s taken the “goodness of fit”, which is an estimate of the probability of how well each of these samples fits the winning neuron, and by lowering the probability limit, and saying “I just want to look at the anomalies”, that edge of the amplitude conformance to structure, I think, is clearly better than what you would have using amplitude alone.
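
The idea of thresholding a goodness-of-fit value can be illustrated as follows. The talk does not give the formula behind “goodness of fit”, so the Gaussian-style score below is purely an assumption for illustration, as are the `scale` and `limit` parameters and the toy neurons and samples.

```python
import math

def fit_score(sample, neuron, scale):
    """Hypothetical fit score in (0, 1]: 1.0 when the sample sits exactly
    on the neuron, falling toward 0 as it moves away in attribute space."""
    d = math.dist(sample, neuron)
    return math.exp(-0.5 * (d / scale) ** 2)

def anomalies(samples, neurons, scale, limit):
    """Indices of samples whose fit to their winning (nearest) neuron
    falls below the chosen limit."""
    flagged = []
    for idx, s in enumerate(samples):
        best = max(fit_score(s, n, scale) for n in neurons)
        if best < limit:
            flagged.append(idx)
    return flagged

# Two trained neurons and four samples in a two-attribute space (made up)
neurons = [[0.0, 0.0], [5.0, 5.0]]
samples = [[0.1, 0.0], [5.0, 4.8], [2.5, 2.5], [9.0, 9.0]]

flagged = anomalies(samples, neurons, scale=1.0, limit=0.1)
print("anomalous sample indices:", flagged)
```

Samples that sit close to a neuron score near 1 and are left alone; the two samples far from both neurons are the ones displayed as anomalies.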

So, new machine learning technology using simultaneous multi-attribute analysis is resolving much finer reservoir detail than we’ve had in the past, and the geobodies that fit the reservoirs are revealed in detail that, frankly, was previously not available.

In general, this is what our “Earth to Earth” model looks like.  We start here with the 3D survey, and from the 3D survey we decide on a set of attributes.  We take all our samples, which are vectors because of our choice of attributes, and then, literally, plot them in attribute space. If you have 5 attributes, it’s 5-dimensional space.  If you have 8 attributes, it’s 8-dimensional space. And your choice of attributes is going to illuminate different properties of the reservoir. So, the attributes that Rocky used to zero in on those fluid contacts would not be the ones he would use to illuminate the volume properties or the absorption properties, for example.  Once the attribute volumes are in attribute space, we use a machine learning classifier to analyze and look for natural clusters of information in attribute space. Once those are classified in attribute space, the results are then presented back in a virtual model, if you will, of the earth itself. So, our job here is picking geobodies, some of which have geologic significance and some of which don’t.   The real power is in the natural clusters of information in attribute space.  If you have a channel and you’ve got the attributes selected to illuminate channel properties, then every single point that is associated with the channel, no matter where it is, is going to concentrate in the same place in attribute space.  Natural clusters of information in attribute space are all stacking.  The neurons are hunting, looking for natural clusters, or higher density, in attribute space.  They do this using very simple rules.  The mathematics behind this process was published by us in the November 2015 edition of the Interpretation journal, so if you would like to dig into the details, I invite you to read that paper, which is available on our website.

Two keys are: 1. Attribute selection list. Think about your choice of attributes as an illumination function. What you are trying to do with your choice of attributes is to illuminate the real geobodies in the earth so that they end up as natural clusters in attribute space. And that’s the key.  2. Neurons search for clusters of information in attribute space. Remember the movie, The Matrix? The humans had to be still and hide from the machines that went crazy and hunted them. That’s not too unlike what’s going on in attribute space. It’s like The Matrix because the data samples themselves don’t move. They’re just waiting there. It’s the neurons that are running around in attribute space, looking for clusters of information. The natural cluster is an image of one or more geobodies in the earth, but it’s been illuminated in attribute space, totally depending on the illumination list.  It stacks in a common place in attribute space – that’s the key.

Seismic stratigraphy is broken up into two levels here. First is seismic sequence analysis, where you look at your seismic data, organize it, and break it up into packets of concordant reflections. It’s pretty straightforward stuff – chaotic depositional patterns.  Then, after you have developed a sequence analysis, you can categorize the different sequences. You have a facies analysis trying to infer the depositional setting. Is the sea level rising? Is it falling? Is it stationary? All this naturally falls in because the seismic reflections are revealing geology on a very broad basis.

Well, the attribute – it’s hunting geobodies as well. Multi-attribute geobodies are also components of seismic stratigraphy. We define it this way: a simple geobody has been auto-picked by machine learning in attribute space. That’s all it is – we’re defining a simple geobody. We all know how to run an auto-picker. In 15 minutes, you can be taught how to run an auto-picker in attribute space. Complex geobodies are interpreted by you and me. We look at the simple geobodies and we composite those just the way we saw in that Wheeler diagram. We combine those to make complex geobodies.  We give it a name, some kind of texture, some kind of surface – all those things are interpreted geobodies, and the construction of these complex geobodies can be sped up by some geologic rule-making.
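
One plausible reading of a “simple geobody” is a set of touching samples that all share the same winning neuron. The sketch below auto-picks such bodies from a tiny classified volume using a breadth-first flood fill; the six-neighbor connectivity rule and the toy volume are assumptions for illustration, not a description of any particular auto-picker.

```python
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def pick_geobodies(volume, neuron):
    """Return a list of geobodies for one winning neuron.
    volume[i][j][k] is the winning-neuron number of each sample;
    each geobody is a list of connected (i, j, k) sample positions."""
    ni, nj, nk = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    bodies = []
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                if volume[i][j][k] != neuron or (i, j, k) in seen:
                    continue
                # New seed: flood-fill everything connected to it.
                body, queue = [], deque([(i, j, k)])
                seen.add((i, j, k))
                while queue:
                    a, b, c = queue.popleft()
                    body.append((a, b, c))
                    for da, db, dc in NEIGHBORS:
                        p = (a + da, b + db, c + dc)
                        inside = 0 <= p[0] < ni and 0 <= p[1] < nj and 0 <= p[2] < nk
                        if inside and p not in seen and volume[p[0]][p[1]][p[2]] == neuron:
                            seen.add(p)
                            queue.append(p)
                bodies.append(body)
    return bodies

# Toy classified volume: 2 inlines x 3 crosslines x 4 samples
volume = [
    [[57, 57, 3, 3], [57, 3, 3, 58], [3, 3, 58, 58]],
    [[57, 3, 3, 3], [3, 3, 3, 3], [3, 57, 3, 58]],
]

for neuron in (57, 58):
    sizes = [len(b) for b in pick_geobodies(volume, neuron)]
    print(f"neuron {neuron}: geobody sizes in samples = {sizes}")
```

Because each geobody is just a list of samples, sample counts can be tallied per body and turned into rough volume estimates once the bin size and sample interval are known.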

Now the mathematical foundation we published in 2015 ties this all together pretty nicely. You see, machine learning isn’t magic.  It depends on the noise level of the seismic data. Random noise broadens natural clusters in attribute space. What that means, then, is that attenuating noise through optimum acquisition and data processing delivers natural clusters with the greatest separation. In other words, nice, tight clusters in attribute space are much easier for the machine learning algorithm to identify and separate. So, acquisition and data processing matter.
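
The effect of random noise on cluster separation is easy to see numerically. The sketch below is a toy experiment, not the published mathematics: two synthetic clusters sit a fixed distance apart in a two-attribute space, Gaussian noise of increasing strength is added, and a simple separation ratio (distance between centroids divided by average cluster spread) is reported. The ratio itself is an illustrative measure chosen here.

```python
import math
import random

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def spread(points):
    """Average distance of the points from their own centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def separation(a, b):
    """Distance between cluster centroids relative to their mean spread."""
    return math.dist(centroid(a), centroid(b)) / (0.5 * (spread(a) + spread(b)))

def noisy_cluster(center, sigma, n, rng):
    """n samples around a center with random (Gaussian) noise of size sigma."""
    return [[c + rng.gauss(0.0, sigma) for c in center] for _ in range(n)]

rng = random.Random(42)
results = {}
for sigma in (0.1, 0.5, 2.0):
    a = noisy_cluster((0.0, 0.0), sigma, 200, rng)
    b = noisy_cluster((4.0, 0.0), sigma, 200, rng)
    results[sigma] = separation(a, b)
    print(f"noise sigma {sigma}: separation ratio {results[sigma]:.1f}")
```

As the noise grows, the clusters broaden and the ratio drops, which is the situation in which neurons have a harder time locking onto distinct natural clusters.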

However, this isn’t talking about coherent noise. Coherent noise is something else. With coherent noise, you may have an acquisition footprint, and that forms a cluster in attribute space. One of those neurons is going to go after it just as well, because it’s an increase in information density in attribute space, and voila – you have a handful of neurons that are associated with an acquisition footprint. Coherent noise can be detected by the classification process where the processor has merged two surveys.

Second thing: better wavelet processing leads to narrower, more compact natural clusters, which leads to better geobody resolution because geobodies are derived from natural clusters.

Last but not least, larger neural networks produce greater geobody detail. You run 6x6, 8x8, and 10x10 2D colormaps, and you eventually get to the point where you’re just swamped with details and you just can’t figure this thing out. We see that again and again.  So, it’s better to look at the situation from 40,000 feet, and then 20,000, and then 10,000. Usually, we just go ahead and run all three SOM runs at once to get them all done, and then examine them in increasing levels of detail.

I’d like to now switch gears to something entirely different.  Put the SOM box here aside for a minute, and let’s revisit the work Rocky Roden did in the Gulf of Mexico. Rocky came up with an important way of thinking about the application of this new tool.

In terms of using multi-attribute seismic interpretation, think of it as a process, and what’s really important is starting with the geologic question you want to answer. For example: we’re trying to illuminate channels. Ok, so there is a certain set of attributes that would be good.  So, what we have here is: ask the question first. Firmly have that in your mind for this multi-attribute seismic interpretation process.

There’s a certain set of attributes for the geologic question, and the terminology for that set is the “attribute selection list”. When you do an interpretation like this, you really need to be aware of the current attributes being used when looking at the data. Depending on the question, we then take the discipline and we say “well, if this is the question you’re asking”, this attribute selection list is appropriate. Remember, the attribute selection list is an illumination function.

Once you have the geologic question, the next step is the attribute selection list, and then you classify simple geobodies, which is auto-picking your data in attribute space and looking at the results.

Now, this just doesn’t happen in back and it just doesn’t happen at once – it’s an iterative process. So, interpreting complex geobodies is basically more than one SOM run, and more than one geologic question. And interpreting these results at different levels – how many neurons, that sort of thing, this is a whole seismic interpretation process. Interpreting these complex geobodies is the next step.

We’re looking at results and constructing geologic models. Decide which is the final geologic model, and then our last step is making property predictions.

So, in the world of multiple geologic models, or multiple statistical models, it really doesn’t make any difference. We select the model, we test the model, we select a bunch of models, we test those models, and we choose one! Why? Because we want to make some predictions. There’s got to be one final model that we decide, as professionals, is the most reliable, and we’re going to use it. Whether it’s exploration, exploitation, or even appraisal, it’s the same methodology for geologic models and statistical models.

The point here boils down to something pretty fundamental.  As exploration geophysicists, we’re in the business of prediction. That’s our business. The boss wants to know “where do you want to drill, and how deep? And what should we expect on the way down? Do we have any surprises here?” They want answers! And we’re in the business of prediction.

So how good you are as a geoscientist depends, fundamentally, on how good the predictions of your final model are. That’s what we do. Whether you want to think about it like that or not, that’s really the bottom line.

So this is really about model building for multi-attribute interpretation – that’s the first step. Then we’re going to test the model and choose the model. Ok, so, should that model-building be shipped out as a data processing project? Or through our geo-processing people?  Or is that really something that should be part of interpretation? Do you really trust that the right models have been built from geoprocessing? Maybe. Maybe not.  If it takes 3 months, you sure hope you have the right model from a data processing company. And foolish, foolish, foolish if you think there’s only one run.  That’s really dangerous.  That’s a kiss and a prayer, and oh, after three months, this is what you’re going to build your model on. 

So, as an aside, if you decide that building models is a data processing job, where’s the spontaneity? And I ask you – where’s the sense of adventure? Where’s the sense of the hunt? That’s what interpretation is all about – the hunt. Do you trust that the right questions have been asked before the models are built? And my final point here is that there are hundreds of reasons just to follow procedure. Stay on the path and follow procedure. Unfortunately, nobody wants to argue. The truth here is what we’re looking for. And truth, invariably – that path has twists and turns. That’s exploration. That’s what we’re doing here. That’s fun stuff. That’s what keeps our juices going… finding those little twists and turns and zeroing in on the truth.

Now model testing and final selection have begun when models are built and you decide which is the right one. For example, you generate 3 SOMs – an 8x8, 12x12, 4x4, and you look at results and the boss says “ok, you’ve been monkeying around long enough, what’s the answer? Give me the answer”… “Well…hmm…” you respond. “I like this one. I think 8x8 is the right one.”  Now, you could do that, but you might not want to admit it to the boss! One quantitative way of comparing models would be to look at your residual errors.  The only trouble with that is it’s not very robust. However, a quantitative assessment – comparing models – is a good way to go. 

So, there is a better methodology – better than just comparing residual errors – and that is the whole field of cross-validation. I’m not going to go into that here, but cross-validation tools such as bootstrapping, bagging, and even Bayesian statistics help us compare models and figure out which model is robust and, in the face of new data, is going to give us a good strong answer – NOT the answer that fits the data the best.

Think about the old problem of fitting a least squares line through some data. You write your algorithm in Python or whatever tool, and it kind of fits through the data, and the boss goes “I don’t know why you’re monkeying around with lines. I think this is an exponential curve because this is production data.” So, you make an exponential curve. Now, on this business of cross-validation, think about fitting a polynomial to the data: two terms, a line; three terms, a parabola; four terms … and so on up to n. We could make n equal 15 and by golly there’s no possibility of error – we crank that thing down. The trouble is, we have over-fit the data. It fits this data perfectly, but some new data comes in and it’s a terrible model because the errors are going to be really high. It’s not robust. So, this whole cross-validation methodology is really very important. The question for the future is, “who’s going to be making the prediction – you, or the machine?” I maintain that to make good decisions, it’s going to be us! We’re the ones that will be making the right characterizations – because we’ll leverage machine learning.
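
To make the over-fitting point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the ten made-up data points (a straight line plus a small fixed wiggle standing in for noise), the helper names fit_polynomial, predict, training_error and leave_one_out_error, and the use of leave-one-out as the cross-validation scheme. It fits polynomials with 2, 3 and 9 terms and compares the error on the data used for fitting with the error on points held out of the fit.

```python
def fit_polynomial(xs, ys, n_terms):
    """Least-squares polynomial with n_terms coefficients (2 = line, 3 = parabola)."""
    m = n_terms
    # Normal equations (A^T A) c = A^T y, where column i of A holds x**i.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, m):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, m):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(ata[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aty[i] - tail) / ata[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def training_error(xs, ys, n_terms):
    """Mean squared error on the same points used for the fit."""
    coeffs = fit_polynomial(xs, ys, n_terms)
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def leave_one_out_error(xs, ys, n_terms):
    """Mean squared error when each point is predicted from a fit to the others."""
    total = 0.0
    for i in range(len(xs)):
        coeffs = fit_polynomial(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], n_terms)
        total += (predict(coeffs, xs[i]) - ys[i]) ** 2
    return total / len(xs)


# Hypothetical data: the line y = 2x + 1 plus a small fixed perturbation.
xs = [-1 + 2 * k / 9 for k in range(10)]
wiggle = [0.2, -0.1, 0.3, -0.3, 0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
ys = [2 * x + 1 + w for x, w in zip(xs, wiggle)]

results = {}
for n_terms in (2, 3, 9):
    results[n_terms] = (training_error(xs, ys, n_terms),
                        leave_one_out_error(xs, ys, n_terms))
    print(n_terms, "terms: train", results[n_terms][0], "held-out", results[n_terms][1])
```

On this data the 9-term polynomial has the lowest training error and by far the highest held-out error, while the 2-term line stays modest on both. That gap between fitting error and held-out error is what cross-validation is designed to expose.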

Let’s take a look at Machine Learning. Our company vision is the following: 

“There’s no reason why we cannot expect to query our seismic data for information with learning machines, just as effortlessly and with as much reliability as we query the web for the nearest gas station.” 

Now this statement of where our company is going is not a statement of “get rid of the interpreters”. It’s a statement, in my way of thinking and for all of us at our company, of a way forward. Because truly, this use of machine learning is a whole new way of doing seismic interpretation. It’s using it as a tool – it’s not replacing anybody. Deep learning, which is important for seismic evaluation, might be a holy grail, but its roots are in image processing, not in the physics of wave motion. Be very careful with that.

Image processing is very good at telling the difference between Glen and me from pictures of us. Or if you have kitties and you have little doggies, image processing can classify those, even right down to those where you’re not real certain whether it’s a dog or a cat. So, deep learning is focused on image processing and also on the subtle distinctions between what is the essence of a dog and what is the essence of a cat, irrespective of whether the cat is lying there or standing there or climbing up a tree. That’s the real power of this sort of thing.

Here’s a comparison of SOM and Deep Learning in terms of all of its properties, and there’s good and bad things about each one of these.  There’s no magic about any one of these. Not to say one’s better than the other.

I would like to point out that unsupervised machine learning trains by discovering natural clusters in attribute space. Once those natural clusters have been identified, attribute space is carved up: any sample that falls in this region of attribute space corresponds to this winning neuron, and over here to that winning neuron. Your data is auto-picked and put back in 3-dimensional space in a virtual 3D survey. That’s the essence of what’s available today.

Supervised machine learning trains on patterns that it discovers on amplitude data alone. Now there are two deep learning algorithms that are popular today. One is called the Convolutional Neural Network, which learns from visual patterns such as faces (sometimes called eigenfaces, which use PCA). And then there are fully convolutional networks, which use sample-size patches and full connections between the network layers.

Here’s a little cartoon showing you this business about layers. This is the picture, and in trying to identify the little features of it, you can’t say that this is a robot, as opposed to a cat or a dog, until it goes through this analysis. Using patching and feature maps, using different features for different things, it goes from one patch to the next to the next, until finally – your outputs here – well, it must be robot, dog, or kitty. It’s a classifier using the properties it has discovered in a single image. The algorithm has discovered its own attributes. You might say “that’s pretty cool”. And indeed it is, but it’s only using the information seen in that picture. So, it’s association – it’s the texture features of that image.

Here’s an example from one of our associates – Tao Zhao – who has been working in the area of fully convolutional networks. This example is where he’s done some training – training lines A – clinoforms here, chaotic deposition here, maybe some salt down there, and then some concordant reflections up top. Here’s an example of the results of the FCN. And then here is the classification of salt down here. So, the displays here are examples of fully convolutional networks.

One final point and then I’ll sit down: Data is more important than the algorithms. The training rules are very simple. Remember the snow geese? Remember the fish? If you were a fish or if you were a snow goose, the rules are pretty simple. There’s a fanny – I’m gonna be about 3 feet behind it, and I’m not gonna be right behind the snow goose ahead of me – I want to be either to the left or the right. Simple rule. You’re a fish, you want to have another fish around you of a certain distance. Simple rules. What’s important here is data is more important than the algorithms.

Here is an example taken from E&P Magazine this month (January). For several years this company called Solution Seekers has been training on production data using a variety of different data and looking for patterns to develop best practice drilling recommendations. Kind of a cool big-picture kind of a concept.

So machine learning training rules are simple – the real value is the classification of results; it’s the data that builds the complexity. My question to you is: Does this really address the right questions? If it does, it’s extremely valuable stuff. If it misses the direction of where we’re going – the geologic question – it’s not that useful.

Introduction to Self-Organizing Maps in Multi-Attribute Seismic Data

By Tom Smith and Sven Treitel
Published with permission: Geophysical Society of Houston
January 2011

An unsupervised neural network searches multi-dimensional data for natural clusters. Neurons are attracted to areas of higher information density. Further work aims to relate the patterns found by SOM analysis to subsurface geometry and rock properties, and to relate the multi-attribute seismic properties at the wells, which correlate to rock lithologies, with those away from the wells.

Computers that think like a human are well beyond our current capabilities but computers that learn are not. They are around us every day. Pocket cameras identify faces in a live digital image and automatically adjust the focus when the shutter is pressed. Post offices scan the mail and route the documents appropriately. Offices scan documents as bitmaps and convert them to text documents for editing. Web documents are indexed for content, while search engines deliver these documents through key word searches in unprecedented detail and with extraordinary speed.

We have seen tremendous growth in the size of 3D seismic survey data volumes, and it is common today for both 2D and 3D seismic surveys to be integrated into the interpretation. Moreover, the primary survey of reflection amplitude is interpreted along with derived surveys of perhaps 5 to 25 attributes. The attributes of both 2D and 3D surveys represent multidimensional data. The problem is to keep all this data in one’s head while trying to find oil and gas. Much interpretation effort is devoted to building a geologic framework from the seismic data, identifying key reflecting intervals where oil and gas might be found and finding an interesting anomaly. At this point attributes are the framework in which we evaluate the anomaly. But this is the point where we can easily mislead ourselves. It is quite easy to build a plausible model for a prospect using only those attributes which fit our model and ignore the rest. This is bad enough, but there is an even greater crime. Lurking in the data may be combinations of attributes which are legitimate anomalies but which are never found at all.

Learning machines are artificial neural networks which can construct an experience data base from multidimensional data such as multi-attribute seismic surveys. There are two main classes of neural networks – supervised and unsupervised. With supervised neural networks, a network classifies data into groups sharing given characteristics that have already been classified by an expert. After careful processing, synthetic seismograms that are prepared at well sites serve as the expert’s data. Then the neural network is trained to classify these data at the wells. After training, the neural network literally roams the seismic data to classify areas which might be similar in some given sense to models developed at the well locations.

Alternatively, an unsupervised neural network searches multidimensional data for natural clusters. Neurons are attracted to areas of higher information density. The most popular unsupervised neural network, the self-organizing map (SOM), was introduced by Teuvo Kohonen in 1981 [1]. SOM was successfully applied to seismic facies analysis by Poupon, Azbel and Ingram in 1999 (Stratimagic) [2]. We preface recent efforts to bring SOM to bear on multi-attribute seismic interpretation with a simple SOM example used by Kohonen to illustrate some of its basic features.

Quality of Life

An early problem considered by Kohonen and his research team was to identify natural clusters as they relate to quality of life factors based on World Bank data. The study included 126 countries and considered a total of 39 measurements describing the level of poverty found in each country. While the data matrix was somewhat limited by incomplete reporting, the SOM results are still quite interesting. Shown in Figure 1 is the SOM which resulted from the learning process. Canada (CAN) and the United States of America (USA) clustered at the same neuron location shown at the 6th row of the 1st column in the figure. Ethiopia (ETH) is found on the right edge at column 13, row 5. Other country abbreviations and further details are in [3].



Figure 1: Self-organizing map (SOM) of World Bank quality of life data.

The reason that countries of similar quality of life cluster in similar neuron areas has to do with learning principles that are built into SOM. In this study, every country is a sample and that sample is a column vector of 39 elements. In other words, there are 39 attributes in this problem. Countries of similar characteristics (a natural cluster) plot in about the same place in attribute space. At the beginning of the learning process, neurons of 39 dimensions are assigned random numbers. During the learning process, the neurons move toward natural clusters. The data points never move. The mathematics of SOM learning define both competitive and cooperative learning. For a given data sample, the Euclidean distance is computed between the sample and each neuron. The neuron which is “nearest” to the data sample is declared the “winning” neuron and allowed to advance a fraction of the distance toward the data sample. The neuron movement is the essence of machine learning. Competitive learning is embodied in the strategy that the winning neuron moves toward the data sample.
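
The competitive step described above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names euclidean and competitive_step, the two-dimensional toy neurons, and the learning fraction of 0.5 are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points in attribute space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def competitive_step(sample, neurons, rate):
    """Find the winning neuron and move it a fraction `rate` toward the sample.

    The sample itself is never modified; only the neuron moves.
    Returns the index of the winning neuron.
    """
    winner = min(range(len(neurons)), key=lambda k: euclidean(sample, neurons[k]))
    neurons[winner] = [n + rate * (s - n) for n, s in zip(neurons[winner], sample)]
    return winner


# Toy example with 2 attributes instead of 39 (hypothetical values).
neurons = [[0.0, 0.0], [10.0, 10.0]]
winner = competitive_step([8.0, 8.0], neurons, rate=0.5)
print(winner, neurons)  # neuron 1 wins and moves halfway, to [9.0, 9.0]
```

The losing neuron at the origin is untouched in this step; it moves only when the neighborhood (cooperative) rule is added.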

Cooperative learning is related to the layout of the neural network. In SOM learning, the neural network is commonly a 2D hexagonal grid. This constitutes the neuron topology; the reason for choosing a hexagonal grid rather than a rectangular grid will be apparent shortly. When a winning neuron has been found, cooperative learning takes place because the neurons in the vicinity of the winning neuron (the neighborhood) are also allowed to move toward the data sample, but by an amount less than the winning neuron. In fact, the farther a neighboring neuron is from the winning neuron, the less it is allowed to move. Hexagonal grids move more neurons than rectangular grids because each neuron has 6 points of contact with its immediate neighbors instead of 4. Learning continues as winning and neighborhood neurons move toward each sample in turn until the entire set of samples has been processed. At this point, one epoch of learning has been completed. The event is marked as one time step in the learning process. For each subsequent epoch the distance a winning neuron may move toward a data sample is reduced slightly and the size of the neighborhood is also reduced. The learning process terminates when there is no further appreciable movement of the neurons. Often the number of such epochs can be in the hundreds or thousands.
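
A compact sketch of the full learning loop, combining the competitive and cooperative rules over several epochs, might look as follows. Several simplifications are assumed and are not taken from the article: a rectangular grid is used in place of the hexagonal one, the neighborhood weighting is a Gaussian of grid distance, the step size and neighborhood radius shrink linearly with the epoch number, and training stops after a fixed number of epochs instead of testing for neuron movement. The names train_som and classify are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(samples, rows, cols, epochs, rate0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    radius0 = max(rows, cols) / 2.0
    # Neurons start at random positions in attribute space.
    neurons = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Assumed linear decay of step size and neighborhood size per epoch.
        shrink = 1.0 - epoch / epochs
        rate = rate0 * shrink
        radius = max(radius0 * shrink, 0.5)
        for sample in samples:  # the data points never move
            # Competitive learning: the nearest neuron wins.
            winner = min(neurons, key=lambda k: sq_dist(sample, neurons[k]))
            # Cooperative learning: grid neighbors move too, but by less.
            for key in list(neurons):
                grid_d2 = (key[0] - winner[0]) ** 2 + (key[1] - winner[1]) ** 2
                pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                neurons[key] = [w + pull * (s - w)
                                for w, s in zip(neurons[key], sample)]
    return neurons


def classify(samples, neurons):
    """Assign each sample the grid position of its winning neuron."""
    return [min(neurons, key=lambda k: sq_dist(s, neurons[k])) for s in samples]


# Hypothetical two-attribute data forming two loose clusters.
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
neurons = train_som(samples, rows=3, cols=3, epochs=50)
labels = classify(samples, neurons)
print(labels)
```

Because every update moves a neuron part of the way toward a data sample, the neurons stay within the range spanned by their starting positions and the data. The classify step corresponds to labeling each sample with its winning neuron once learning is complete.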

As demonstrated in Figure 1, natural clustering of like-quality of life countries arises from both competitive and cooperative learning. But one may ask how is SOM learning unsupervised when the SOM map displays country labels? The answer is that in the steps just described, there is no need to order the sequence of samples in the SOM learning process. The Ethiopia sample may be processed between samples for Canada and USA with no effect on the outcome. The sample order of countries may be scrambled randomly.


Figure 2: Classification of quality of life data


Figure 3: Gulf of Mexico Salt Dome

In Kohonen’s analysis of the World Bank data, the names of countries are known, however. When the SOM learning process is completed, the neuron which is closest to each country sample is labeled by the country label as shown in Figure 1. The neuron colors are arbitrary. Figure 2 is a world map in which each country is colored with the color scale used in Figure 1. Countries with similar quality of life are therefore colored similarly. Several countries which did not contribute data for the report are colored gray (Russia, Iceland, Cuba and several others). Figure 2 illustrates how the results of neural network analysis are used to classify the data. We shall see in the next section how SOM analysis and classification is an important addition to seismic interpretation.

Gulf of Mexico Salt Dome Survey

A SOM analysis was conducted on a 3D survey in the Gulf of Mexico provided by FairfieldNodal. See [4] for a description of SOM theory and a discussion of the processing steps. The introduction of a so-called curvature measure and the harvesting process are particularly relevant. Figure 3 is a vertical amplitude section across the center of the salt. Figure 4 shows the SOM analysis of 13 attributes across the same location. The SOM map is a 2D colorbar based on an 8 x 8 hexagonal grid. There are 100 epochs in the present analysis. It is readily apparent that the SOM classification is tracking seismic reflections.

Figure 4: SOM classification and map. Red horizontal line marks the time of Figure 5.

Figure 5: Time slice. Red line marks the location of Figure 4.

Shown in Figure 4 are white portions in which data have been “declassified”, a concept which we now explain. After the SOM analysis is completed, every sample in the survey is associated with a winning neuron. This implies that every neuron is associated with a given set of samples.

For any particular neuron, some samples are nearby in attribute space and others are far away. This means that there is a statistical population of distances on which to declassify what we shall call “outliers”. When a neuron is near a data sample, the probability that the sample is correctly classified is high. If a neuron and sample coincide, the probability is 100%. In Figure 4, those samples for which the probability is less than 10% are not assigned any classification. We identify such outliers as SOM anomalies. SOM anomalies are scattered about the section, with several which are larger and more compact. The horizontal red line marks the time of the time slice shown in Figure 5.
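
The declassification step can be illustrated with a short Python sketch. The article does not state how the probability is computed, so the formula below is purely an assumption: for each neuron, a sample's probability is taken as the fraction of that neuron's samples lying at least as far away as the sample itself. Under this choice a sample that coincides with its neuron scores 100%, as described above. The function name declassify and the toy distances are hypothetical.

```python
from collections import defaultdict


def declassify(winners, distances, cutoff=0.10):
    """Return each sample's neuron, or None where the sample is an outlier.

    winners[i]   -- winning neuron of sample i
    distances[i] -- attribute-space distance from sample i to that neuron
    """
    # Gather the population of distances belonging to each neuron.
    population = defaultdict(list)
    for w, d in zip(winners, distances):
        population[w].append(d)

    labels = []
    for w, d in zip(winners, distances):
        dists = population[w]
        # Assumed probability: share of this neuron's samples at least as far away.
        prob = sum(1 for other in dists if other >= d) / len(dists)
        labels.append(w if prob >= cutoff else None)
    return labels


# Toy data: 20 samples won by neuron 0 and 5 samples won by neuron 1.
winners = [0] * 20 + [1] * 5
distances = [0.5 * i for i in range(20)] + [0.0, 1.0, 2.0, 3.0, 4.0]
labels = declassify(winners, distances)
print(labels.count(None), "sample(s) declassified")
```

With a 10% cut-off, only the farthest of the 20 samples attached to neuron 0 is declassified; neuron 1 has too few samples for any of them to fall below the cut-off under this definition.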

The horizontal line in Figure 5 marks the location of the section in Figure 4. Notice the white area to the right of the salt dome crossed by the red line in Figure 4 is identified as the same white area right of the salt dome and crossed by the red line in Figure 5. We note that the SOM anomaly is a discrete geobody which appears to be related to the upturned beds flanking the salt. By geobody, we mean a contiguous region of samples in the survey which share some characteristic.
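
The definition of a geobody as a contiguous region of samples lends itself to a simple connected-component search. The sketch below is one way to do this, under assumptions of its own: the volume is a small nested list indexed as flags[inline][crossline][time], samples count as contiguous when they share a face (6-connectivity), and the function name find_geobodies is hypothetical.

```python
from collections import deque


def find_geobodies(flags):
    """Group flagged samples of a 3D volume into face-connected geobodies.

    flags[i][j][k] is truthy where the sample has the shared characteristic
    (for example, where it is a SOM anomaly). Returns a list of sets of (i, j, k).
    """
    ni, nj, nk = len(flags), len(flags[0]), len(flags[0][0])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    bodies = []
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                if not flags[i][j][k] or (i, j, k) in seen:
                    continue
                # Breadth-first flood fill from this unvisited flagged sample.
                body = set()
                queue = deque([(i, j, k)])
                seen.add((i, j, k))
                while queue:
                    a, b, c = queue.popleft()
                    body.add((a, b, c))
                    for da, db, dc in steps:
                        p = (a + da, b + db, c + dc)
                        inside = 0 <= p[0] < ni and 0 <= p[1] < nj and 0 <= p[2] < nk
                        if inside and flags[p[0]][p[1]][p[2]] and p not in seen:
                            seen.add(p)
                            queue.append(p)
                bodies.append(body)
    return bodies


# Tiny hypothetical volume: 2 inlines x 3 crosslines x 3 time samples.
flags = [
    [[1, 1, 0], [0, 0, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
]
bodies = find_geobodies(flags)
print(sorted(len(b) for b in bodies))  # two geobodies, of 1 and 3 samples
```

The three flagged samples near the origin touch face to face and form one geobody; the isolated flagged sample forms a second, single-sample geobody.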

Figure 6: Arbitrary line through 3D survey passing through gas-show well (left) and producing gas well (right)

Figure 7: SOM classification of 3D survey. Red horizontal line marks the time for the time slice of Figure 8

Wharton County Survey

A SOM analysis was also conducted on a 3D survey in Wharton County, Texas provided by Auburn Energy. Details of this study are found in [5]. An arbitrary line through the survey between two wells is shown in Figure 6. The well at the left presented a gas show while the well at the right developed a single-well gas field. Note the association of gas with faults F1, F2 and F3.

Figure 7 shows a portion of the results of a SOM classification run designed in the same way as in the previous example, namely by use of the same 13 attributes, an 8 x 8 hexagonal topology of neurons and a probability cut-off of 10%. Notice that this selection of attributes did not delineate the faults very well, yet SOM anomalies are found near both wells. The time slice of Figure 8 confirms that the SOM anomaly to the left of the gas-show well (left) is a geobody. A smaller second SOM anomaly is shown right of the F2 fault. Figure 9 is a time slice through the lower SOM anomaly near the gas well (right) of Figure 7. Notice that it too is a geobody. Prior to the present SOM analysis, an earlier thorough interpretation had been conducted with all available geophysical and geological data. A large set of attributes was used, including AVO gathers, offset stacks, advanced processing as well as some proprietary attributes. As a result, four wells were drilled. Two wells had no gas shows and are not marked here. No SOM anomaly was found at or near either of the two dry wells.

Figure 8: Time slice through the upper SOM anomaly of Figure 7. The red line marks the location of the arbitrary line.

Figure 9: Time slice through the lower SOM anomaly of Figure 7.

Further Work

The next step in this work is to gain a better understanding of how the patterns obtained with SOM analysis relate to subsurface geometry and its rock properties. Research is currently underway in an attempt to answer questions of this kind. It is also important to further address the relationship between multi-attribute seismic properties at the wells, which correlate to rock lithologies, and those away from the wells.


Computerized information management has become an indispensable tool for organizing and presenting geophysical and geological data for seismic interpretation. Databases provide the underlying environment to achieve this goal. Machine learning is another area in which computers may one day offer an indispensable tool as well. The point is particularly germane in light of successes achieved by machine learning in other fields. The engines to help us reach this objective could well be neural networks that adapt to the data and present its various structures in a way that is meaningful to the interpreter. We believe that neural networks offer many advantages which our industry is just now recognizing.


1. Kohonen, T., 2001, Self-Organizing Maps, 3rd edition: Springer

2. Poupon, M., Azbel, K. and Ingram, J., 1999, Integrating seismic facies and petro-acoustic modeling: World Oil Magazine, June

3. accessed 10 November, 2010

4. Smith, T. and Treitel, S., 2010, Self-organizing artificial neural nets for automatic anomaly identification: SEG International Convention (Denver) Extended Abstracts

5. Smith, T., 2010, Unsupervised neural networks – disruptive technology for seismic interpretation: Oil & Gas Journal, Oct. 4, 2010

THOMAS A. SMITH

Tom Smith received BS and MS degrees in Geology from Iowa State University. In 1971, he joined Chevron Geophysical as a processing geophysicist. In 1980 he left to pursue doctoral studies in Geophysics at the University of Houston. Dr. Smith founded Seismic Micro-Technology in 1984 and there led the development of the KINGDOM software suite for seismic interpretation. In 2007, he sold the majority position in the company but retained a position on the Board of Directors. SMT is in the process of being acquired by IHS. On completion, the SMT Board will be dissolved. In 2008, he founded Geophysical Insights where he and several other geophysicists are developing advanced technologies for fundamental geophysical problems.

The SEG awarded Tom the SEG Enterprise Award in 2000, and in 2010, GSH awarded him the Honorary Membership Award. Iowa State University awarded him the Distinguished Alumnus Lecturer Award in 1996 and the Citation of Merit for National and International Recognition in 2002. Seismic Micro-Technology received a GSH Corporate Star Award in 2005. Dr. Smith has been a member of the SEG since 1967 and is also a member of the HGS, EAGE, SIPES, AAPG, GSH, Sigma Xi, SSA, and AGU.