A few years ago, we had geophysics and geology - two distinct disciplines that were well defined. Then geoscience came along, an amalgam of geology and geophysics, and many people started calling themselves geoscientists as opposed to "geologists" or "geophysicists". But the changes weren't quite finished. Along came a qualifying adjective, having to do with unconventional resource development, or unconventional exploration. We understand how to do exploration; unconventional exploration has to do with understanding shale and finding sweet spots, but it is still a type of exploration. By joining unconventional and resource development, we broaden what we do as professionals. However, the mindset of unconventional geophysics is really closer to mining geophysics than it is to conventional exploration.
So, today's topic has to do with the "holy grail" of machine learning in seismic interpretation. We're tying this to seismic interpretation only, and even that is a pretty big topic. I can't summarize all of machine learning for seismic interpretation - it's already too big, and nearly every company is investigating or applying machine learning these days. So, for this talk, I'm going to focus on that narrow topic and hit a few highlights.
Let's start at 50,000 feet - way up at the top. If you've been intimidated by this machine learning stuff, let's define terms. Machine learning is an engine: an algorithm that learns without explicit programming. That's really fundamental. What does that mean? It means an algorithm that learns from the data. Given one set of data, it will come up with an answer; given a different set of data, it will come up with a different answer. The whole field of artificial intelligence is broken up into strong AI and narrow AI. Strong AI aims at a robot that looks and behaves like a person. Narrow AI attempts to duplicate the brain's neurological processes that have been perfected over millions of years of biological development.
A self-organizing map, or SOM, is a type of neural network that adjusts to training data. However, it makes no assumptions about the characteristics of the data. So, if we look at the whole field of artificial intelligence, and machine learning as a subset of that, there are two parts: unsupervised neural networks and supervised neural networks. Unsupervised is where you feed it the data and say, "you go figure it out." In supervised neural networks, you give it both the data and the right answer. Examples of supervised neural networks would be convolutional neural networks and deep learning algorithms. Convolutional is a more classical type of supervised neural network, where for every data sample, we know the answer. So, a data sample might be "we have x, y, and z properties, and by the way, we know what the classification is a priori."
A classical example of a supervised neural network would be this: your uncle just passed away and left you the canning operations in Cordova, Alaska. You go to the plant to see what you've inherited. Let's say you've got all these people standing at a beltline manually sorting fish, with buckets for eels, buckets for flounder, etc. Being a great geoscientist, you recognize this as an opportunity to apply machine learning and possibly re-assign those people to more productive tasks. As the fish come along, you weigh them, you take a picture of them, you look at the scales and the general texture, and you get some idea of their general shape. What I've described are a few properties, or attributes. Perhaps you add more attributes and are up to four or five. Now we have five attributes that define each type of fish, so in mathematical terms, we're dealing with a five-dimensional problem. We call this 'attribute space'.
Pretty soon, you run through all the eels and get measurements for each one, so the neural network gets trained on eels. Then you run through all the flounder. And guess what - there will be variations, of course, but in attribute space, the four or five measurements that we made for each type of fish are going to wind up in a different cluster. That's how we tell the difference between eels and flounder, or whatever else you've got. And everything that can't be classified very well goes into a bucket labeled 'unclassified'. (More on this later in the presentation.) You put that into your algorithm. So that's basically the difference between supervised and unsupervised neural networks. Deep learning is a category of neural networks that can operate in both supervised and unsupervised discovery.
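The fish-sorting idea can be sketched in a few lines of code. This is only a toy illustration - the attribute values, the nearest-centroid "training," and the distance cutoff for the unclassified bucket are all hypothetical numbers I've made up, not any particular commercial algorithm:

```python
import numpy as np

# Hypothetical attribute vectors: [weight, length, scale size, texture, shape].
# Each row is one fish; the labels are known a priori (supervised learning).
eels = np.array([[1.0, 80, 0.20, 0.10, 0.90],
                 [1.2, 85, 0.25, 0.15, 0.95]])
flounder = np.array([[2.0, 40, 0.50, 0.60, 0.20],
                     [2.2, 45, 0.55, 0.65, 0.25]])

# "Training": compute the center of each labeled cluster in attribute space.
centroids = {"eel": eels.mean(axis=0), "flounder": flounder.mean(axis=0)}

def classify(fish, max_dist=30.0):
    """Assign a fish to the nearest cluster center, or 'unclassified'
    if it falls too far from every known cluster."""
    name, dist = min(((k, np.linalg.norm(fish - c)) for k, c in centroids.items()),
                     key=lambda kv: kv[1])
    return name if dist <= max_dist else "unclassified"
```

A real supervised network would learn a more flexible decision boundary than a nearest centroid, but the essential idea is the same: known answers define clusters in attribute space, and anything far from every cluster lands in the unclassified bucket.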
Now, before we get deeper into our subject today, I'd like to draw your attention to another term: Big Data. A few years ago, if you wanted to survive in the oil and gas business, finding large fields was the objective. Well, we have another big thing today - Big Data. Our industry is looking at ways to apply the concepts of Big Data analytics, and we hear senior management of E&P companies talking about Big Data and launching data analytics teams. So, what is Big Data, or data analytics? It's access to large volumes of disparate kinds of oil and gas data, analyzed by machine learning algorithms to discover relationships that were not identified previously. The key point is that it involves disparate kinds of data. So if you say, "I'm doing Big Data analytics with my seismic data," that's not really an appropriate choice of terms. If you say, "I'm going to throw in all my seismic data, along with the associated wells and my production data," now you're starting to talk about real Big Data operations. And the opportunities are huge.
Finally, there's IoT - the Internet of Things - which you've probably heard or read about. I predict that IoT will have a larger impact on our industry than machine learning; however, the two are related. And why is that? Almost EVERYTHING we use can be wired to the internet. In seismic acquisition, for instance, we're looking at smart geophones that sense the direction of the boat and can send and receive data. In fact, when the geophones get planted, each one has a GPS, so that when it's pulled up and thrown in the back of a pickup truck, the geophone can report its location in real time. There are countless other examples of how IoT will change our industry.
Let's consider wavelet classification as a starting point of interpretation - figuring out the depositional environment using wavelet classes. If we pick a horizon, then based on that auto-picked horizon, we have a wavelet at every bin. We pull that wavelet out. An auto-picked horizon may have a million samples, so we have a million wavelets - one for each sample. (Some early neural learning tools were based on this concept of classifying wavelets.) Machine learning analyzes and trains on those million wavelets, finding, say, the seven most significantly different classes, and then we go back and classify all of them. So we have this cut shown here across the channel, and the wavelet closest to the center is discovered to be tied to that channel. So there's the channel wavelet, and then we have overbank wavelets, some splay wavelets - several different wavelets. And from this, a nice colormap can be produced indicating the type of wavelet.
Horizon attributes look at properties of the wavelet in the vicinity of the horizon - say the frequency from 25 to 80 hertz, or attributes like instantaneous phase - so we now have a collection of information about that pick. Using volume attributes, we look at a pair of horizons and integrate the seismic attributes between them. This results in a number, such as the average amplitude or average envelope value, that represents a sum of seismic samples in a time or depth interval. When considering machine learning, however, the method of analysis is fundamentally different. We have one seismic sample, and associated with that sample we have multiple seismic attributes. This produces a multi-attribute sample vector that is the subject of the machine learning process.
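For a concrete picture of what a multi-attribute sample vector is, here is a minimal sketch. The trace is synthetic, and the attribute list (amplitude, Hilbert transform, envelope) is just an illustrative choice:

```python
import numpy as np
from scipy.signal import hilbert

# A synthetic seismic trace stands in for real data here.
t = np.linspace(0.0, 1.0, 500)
trace = np.sin(2 * np.pi * 25 * t) * np.exp(-((t - 0.5) ** 2) / 0.02)

# The analytic signal gives the Hilbert transform and envelope directly.
analytic = hilbert(trace)
amplitude = trace
hilb = np.imag(analytic)
envelope = np.abs(analytic)

# One multi-attribute sample vector per seismic sample:
samples = np.column_stack([amplitude, hilb, envelope])
print(samples.shape)  # (500, 3): 500 samples, each a 3-attribute vector
```

Each row of `samples` is one of the multi-attribute sample vectors described above - the unit of data the machine learning process works on.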
Ok, so let's take a look at some of the results. This is a self-organizing map analysis of a wedge using only 2 attributes. We've got three cases - low, medium, and high levels of noise - and in the box over here you can see tuning thickness right here; everything to the right of that arrow is below tuning. Now, the SOM operates on multi-attribute samples, and in this case we are keeping things very simple since we only have two attributes.
If you have only two attributes, you can plot them on a piece of paper - x axis, y axis. However, the classification process works just fine for two dimensions or twenty dimensions. It's a machine learning algorithm. In two dimensions, we can look at it and decide "did it do a good job or did it not?" For this example, we’ve used the amplitude and the Hilbert Transform because we know they're orthogonal to each other. We can plot those as individual points on paper. Every sample is a point on that scatter plot.
However, if we put it through a SOM analysis, the first stage is SOM training, which tries to locate natural clusters in attribute space; the second phase, once those neurons have gone through the training process, is to take the results and classify ALL the samples. So here are the results - every single sample is classified: low noise, medium noise, high noise. If you go to tuning thickness, we are tracking events with the SOM analysis way below tuning thickness. There's the top of the wedge, this point right here is where things get below tuning thickness, and it eventually tips out at the corresponding trace right over there.
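The two phases just described - training neurons toward natural clusters, then classifying every sample by its winning neuron - can be sketched with a toy SOM. This is a from-scratch illustration on synthetic two-attribute data, not the algorithm of any particular product; the chain topology, cluster positions, learning rate, and neighborhood schedule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic clusters in a 2-attribute space (standing in for
# amplitude / Hilbert-transform sample pairs).
cluster_a = rng.normal([0.0, 0.0], 0.1, size=(200, 2))
cluster_b = rng.normal([1.0, 1.0], 0.1, size=(200, 2))
data = np.vstack([cluster_a, cluster_b])

# A tiny 1-D SOM: 4 neurons on a chain, positions initialized randomly.
n_neurons = 4
grid = np.arange(n_neurons)                    # neuron coordinates on the chain
neurons = rng.uniform(-0.5, 1.5, size=(n_neurons, 2))

# Phase 1 - training: the winning neuron and its grid neighbors are pulled
# toward each sample; learning rate and neighborhood width shrink over time.
n_epochs = 30
for epoch in range(n_epochs):
    lr = 0.5 * (1 - epoch / n_epochs)
    sigma = 2.0 * (1 - epoch / n_epochs) + 0.1
    for x in data[rng.permutation(len(data))]:
        w = np.argmin(np.linalg.norm(neurons - x, axis=1))    # winning neuron
        h = np.exp(-((grid - w) ** 2) / (2 * sigma ** 2))     # neighborhood
        neurons += lr * h[:, None] * (x - neurons)

# Phase 2 - classification: every sample is assigned to its winning neuron.
labels = np.argmin(np.linalg.norm(data[:, None] - neurons[None], axis=2), axis=1)
```

After training, the two natural clusters end up claimed by different neurons, which is exactly the mechanism that lets a SOM classification separate events even when the wavelets overlap below tuning.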
Now, there's a certain bias here. For this analysis we are using a two-dimensional topology - two dimensions, with hexagonal connectivity between the neurons, which is made use of during the training process. By the way, these are colormaps as opposed to colorbars. Right? Color maps, not colorbars. In a colormap, you can have four points of connectivity, and then it's just like a grid, or six points of connectivity, and then it's hexagonal. That helps us understand the training that was used. In this process there are 8 rows and 8 columns, and every single one of those neurons has gone looking for a natural cluster in attribute space. Although it's only two dimensions, there is still a hunting process: each of these 64 neurons, after the training process, is trying to zero in on a natural cluster. And there's a certain bias in using a smooth colormap, because adjacent neurons get similar colors - yellows and greens here, blues and reds there. Here's a random colormap, and you can see the results. Even with random colors, we are still tracking events way below tuning thickness using the SOM classification.
We are demonstrating resolution well below tuning. There's no magic. We use only two attributes - the real part and the imaginary part, which is the Hilbert transform - and we are demonstrating the SOM characteristics of training using only those two attributes.
The self-organizing map (SOM) training algorithm is modeled on the discovery of natural clusters in attribute space, using training rules based upon the human visual cortex. Conceptually, this is a simple but powerful idea. We can see examples in nature of simple rules that lead to profound results.
So, the whole idea behind self-organizing assemblages is the following:
Snow geese and fish are both examples of self-organizing assemblages, where individuals follow a simple rule. The individual goose is basically following a very simple rule: follow the goose in front of me, just a few feet behind and either left or right. It's as simple as that. That's an example of a self-organizing assemblage, and yet some of its properties are pretty profound, because once they get up to altitude, they can go for a long time and long distances using the slipstream properties of that "v" formation. The basic rule for a schooling fish is 'swim close to your buddies - not so close that you'll bump into them, and not so far away that the school stops looking like a school of fish.' When the shark swims by, the school needs to look like one big fish. If the individual fish were too far apart, the shark would see the smaller isolated fish as easy prey. So there's even a simple rule here of an optimum distance from one fish to the other.
These are just two examples of where simple rules produce complex results when applied at scale.
Unsupervised neural networks, which classify the data, also work on simple rules, but operate on large volumes of seismic samples in attribute space.
The first example is the Eagle Ford case study. Patricia Santagrossi published these results last year. This is a 3D survey of a little over 200 square miles. The SOM analysis was run between the Buda and the Austin Chalk; the Eagle Ford sits right above the Buda in this little region right there. The Eagle Ford shale layer was 108 feet thick, which is only 14 ms.
Now, both the Buda and Austin Chalk are known, strong peak events. So, count how many cycles we go through here: peak, trough - kind of a doublet - trough, peak. The good stuff here is basically all the beds from one peak to one trough. That's conventional seismic data. Here's the Eagle Ford shale as measured right at the Buda at the well there - we have both a horizontal and a vertical well right here. That trough and that peak are associated with the Eagle Ford shale. So, this is the SOM result with an 8x8 set of neurons used for the training. Look at the visible amount of detail - not just between the Buda and the Austin Chalk; you can actually see how things are changing even along the formation, within the run of the horizontal well, because every change in color here corresponds to a change in neuron.
These results were computed by machine learning using seismic attributes alone. We did not tie the results to any of the wells. The SOM analysis was run on seismic samples with multiple attribute values. The key idea here is simultaneous multi-attribute analysis using machine learning. Now, let's look further at this Eagle Ford case study.
These are results computed by machine learning using seismic attributes. We did not skew the results to tie them to any of the wells; they were not forced to fit the wells or anything else. The SOM analysis was run strictly on the seismic data and the multi-attribute seismic samples. Again, the right term is simultaneous multi-attribute analysis. Multi-attribute means each sample is a vector, and in our analysis every single sample is used simultaneously to classify the data into a solution. So although this area is 200 square miles in aerial view, between the Buda and the Austin Chalk we're looking at every single sample - not just wavelets. By simple inspection, we can see that the machine learning results corroborate the well logs, but there has been no force-fitting of the data.
These arrows refer to the SOM winning neurons. If we look in detail, here is Well #8, a vertical well in the Eagle Ford shale. The high resistivity zone is right in here - that can be tied to the red stuff. So, here again, we're dealing with seismic data on a sample-by-sample basis.
The SOM winning neurons identified 24 geobodies, autopicked in 150 feet of vertical section at well #8 in the Eagle Ford borehole. Some of the geobodies - not all of them - tracked between the wells and extended over the entire 200 sq. mile 3D survey.
Let me zero in a little bit more so I can give you some associations here. The high resistivity zone correlates with winning neurons 54, 60, and 53 in this zone right in here. There's the Eagle Ford Ash, identified with neurons 63 and 64. And Patricia even found a tie-in with this marker right here - this is neuron 55.
And this well, by the way, well #8, was 372 Mboe. SOM classification neurons are associated with specific wireline lithofacies units. That's really hard to argue against.
We have evidence, in this case up here for example, of an unconformity, where we lost a neuron right through here and then picked it up again over there. And there is evidence in the Marl of slumping of some kind. So, we're starting to understand what's happening geologically using machine learning. We're seeing finer detail - more than we would have using conventional seismic data and single attributes.
Tricia found a generalized cross section of the Cretaceous in Texas, running northwest to southeast toward the Gulf. The Eagle Ford shale fits in here below the Marl, and there's an unconformity between the two - she was able to see some evidence of that.
The well that we just looked at was well #8, and it ties in with the winning neurons. Let's take a look at another well - say well #3, a vertical well with some X-ray diffraction data associated with it. We can truly nail the real lithology here: not only do we have a wireline result, but we also have X-ray diffraction results to corroborate the classification.
So, of the 64 neurons, over 41,000 samples were classified as "the good stuff." Now, on a sample basis, you can integrate that - tally all of it up and start to come up with estimates.
So, specific geobodies relate to the winning neuron we're tracking - #12 - that's the bottom line. From that, we were able to develop a whole Wheeler diagram for the Eagle Ford group across the survey. The good stuff corresponds to winning neurons 58 and 57 - you can see where they end up on the neuron topology map here - and those two were associated with the wireline lithofacies of the high resistivity part of the Eagle Ford shale. But she was able to work out additional things, such as more clastics and carbonates in the west and clastics in the southeast. And she was able to work out not only the Debris Apron, but the ashy beds and how they tie in. Altogether, these were the neurons associated with the Eagle Ford shale; neurons 1, 9, and 10 are the basal clay shale; and the Marls were associated with these neurons.
So, the autopicked geobodies across the survey form the basis for developing the depositional environment of the Eagle Ford, and they compare favorably with the well logs - using seismic data alone. One of our associates received feedback to the effect that "seismic is only good in conventionals, just for the big structural picture." Man, what a sad conclusion that is. There's a heck of a lot more here: the high resistivity zone pay was associated with two specific neurons, demonstrating that this machine learning technology is equally applicable to unconventionals.
The second case study is from the Gulf of Mexico, by my distinguished associate, Mr. Rocky Roden. This is not deepwater - only approximately 300 feet. Here's a north-fault amplitude buildup; these are time contours, and the amplitude conformance to structure is pretty good. This crossline - 3183 - runs from west to east. You can see here, in the dotted portion, just the amplitude display, and the box right here is a blowup of the edge of that reservoir. What you see there is the SOM classification using colors. Red is associated with the gas-over-oil contact and the oil-over-water contact - a single sample.
So here we have the use of machine learning to help us find fluid contacts, which are very difficult to see. This is without special bandwidth, frequency range, point sources, or point receivers - it isn't a case of everything being dialed in just the right way. The rest of the story is just the use of machine learning. However, it's machine learning not just on samples of single numbers, but on each sample as a combination of attributes - as a vector.
Using that choice of attributes, we're able to identify fluid contacts. For easier viewing, we make all the other classifications transparent and show only those you can see here - what the classifier has estimated to be the fluid contacts, and also the highs. In addition, look at the edges: the ability to define the edge of the reservoir and come up with volumetrics is clearly superior. Over here on the left, Rocky has taken the "goodness of fit" - an estimate of the probability of how well each sample fits its winning neuron - and by lowering the probability limit and saying, "I just want to look at the anomalies," that edge of the amplitude conformance to structure is, I think, clearly better than what you would have using amplitude alone.
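The "goodness of fit" idea can be sketched numerically. The probability model here (a Gaussian falloff with distance from the winning neuron) and all the numbers are hypothetical stand-ins for whatever measure a real classifier uses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Classified samples and their winning-neuron position (hypothetical values).
samples = rng.normal(0.0, 1.0, size=(1000, 2))
winner = np.zeros(2)               # suppose every sample's winner sits at the origin

# "Goodness of fit": how well each sample matches its winning neuron,
# expressed here as a Gaussian probability of the sample-to-neuron distance.
dist = np.linalg.norm(samples - winner, axis=1)
fit = np.exp(-0.5 * dist ** 2)     # 1.0 = perfect fit, falling toward 0 with distance

# Lowering the probability cutoff keeps only the anomalies -
# the samples that fit their winning neuron poorly.
anomalies = samples[fit < 0.1]
```

Raising or lowering that cutoff is the knob being described above: a low cutoff strips away the well-fit samples and leaves only the anomalous ones.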
So, this new machine learning technology using simultaneous multi-attribute analysis is resolving much finer reservoir detail than we've had in the past, and the geobodies that fit the reservoirs are revealed in detail, frankly, previously not available.
In general, this is what our "Earth to Earth" model looks like. We start with the 3D survey, and from the 3D survey we decide on a set of attributes. We take all our samples - which are vectors, because of our choice of attributes - and literally plot them in attribute space. If you have 5 attributes, it's 5-dimensional space; if you have 8 attributes, it's 8-dimensional space. And your choice of attributes is going to illuminate different properties of the reservoir. So, the attributes Rocky used to zero in on those fluid contacts would not be the ones he would use to illuminate the volume properties or the absorption properties, for example. Once the attribute volumes are in attribute space, we use a machine learning classifier to look for natural clusters of information. Once those are classified in attribute space, the results are presented back in a virtual model, if you will, of the earth itself. So, our job here is picking geobodies, some of which have geologic significance and some of which don't.
The real power is in the natural clusters of information in attribute space. If you have a channel, and you've selected the attributes to illuminate channel properties, then every single point associated with the channel, no matter where it is, is going to concentrate in the same place in attribute space. Natural clusters of information in attribute space are all stacking. The neurons hunt, looking for natural clusters - regions of higher density - in attribute space, and they do this using very simple rules. The mathematics behind this process were published by us in the November 2015 edition of the Interpretation journal, so if you would like to dig into the details, I invite you to read that paper, which is available on our website.
Two keys are:
1. Attribute selection list. Think of your choice of attributes as an illumination function. What you are trying to do with your choice of attributes is illuminate the real geobodies in the earth so that they end up as natural clusters in attribute space. That's the key.
2. Neurons search for clusters of information in attribute space. Remember the movie The Matrix? The humans had to be still and hide from the machines that went crazy and hunted them. That's not too unlike what's going on in attribute space. It's like The Matrix because the data samples themselves don't move - they're just waiting there. It's the neurons that are running around in attribute space, looking for clusters of information. A natural cluster is an image of one or more geobodies in the earth, illuminated in attribute space and totally dependent on the illumination list. It stacks in a common place in attribute space - that's the key.
Seismic stratigraphy is broken up into two levels. First is seismic sequence analysis, where you look at your seismic data and organize it, breaking it up into packets of concordant reflections separated from chaotic depositional patterns - pretty straightforward stuff. Then, after you have developed a sequence analysis, you can categorize the different sequences with a facies analysis, trying to infer the depositional setting. Is the sea level rising? Is it falling? Is it stationary? All this falls out naturally, because the seismic reflections are revealing geology on a very broad basis.
Well, with attributes we're hunting geobodies as well - multi-attribute geobodies are also components of seismic stratigraphy. We define it this way: a simple geobody is one that has been auto-picked by machine learning in attribute space. That's all it is. We all know how to run an auto-picker; in 15 minutes, you can be taught how to run an auto-picker in attribute space. Complex geobodies are interpreted by you and me. We look at the simple geobodies and composite them, just the way we saw in that Wheeler diagram, to make complex geobodies. We give them a name, some kind of texture, some kind of surface - all these are interpreted geobodies, and the construction of complex geobodies can be sped up by some geologic rule-making.
Now, the mathematical foundation we published in 2015 ties this all together pretty nicely. You see, machine learning isn't magic - it depends on the noise level of the seismic data. Random noise broadens natural clusters in attribute space. What that means, then, is that attenuating noise through optimum acquisition and data processing delivers natural clusters with the greatest separation. In other words, nice, tight clusters in attribute space are much easier for the machine learning algorithm to identify. So, acquisition and data processing matter.
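A quick numerical sanity check of the claim that random noise broadens natural clusters - the cluster position and noise levels are synthetic, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# An ideal natural cluster in a 2-attribute space ...
clean = rng.normal([1.0, 1.0], 0.05, size=(500, 2))
# ... and the same cluster after random noise is added to the data.
noisy = clean + rng.normal(0.0, 0.2, size=clean.shape)

# Random noise broadens the cluster: its spread in attribute space grows.
spread_clean = clean.std(axis=0).mean()
spread_noisy = noisy.std(axis=0).mean()
print(spread_clean < spread_noisy)  # True: the noisy cluster is wider
```

The broader the cluster, the more it overlaps its neighbors, and the harder the neurons have to work to separate them - which is why acquisition and processing quality show up directly in the classification.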
However, this isn't talking about coherent noise - coherent noise is something else. With coherent noise, you may have an acquisition footprint, but that forms a cluster in attribute space, and one of those neurons is going to go after it just as well, because it's an increase in information density in attribute space - and voila, you have a handful of neurons associated with an acquisition footprint. Coherent noise can even be detected by the classification process where the processor has merged two surveys.
Second thing: better wavelet processing leads to narrower, more compact natural clusters, and more compact natural clusters lead to better geobody resolution, because geobodies are derived from natural clusters.
Last but not least, larger neural networks produce greater geobody detail. If you run a 6x6, an 8x8, and a 10x10 2D colormap, you eventually get to the point where you're just swamped with details and can't figure the thing out. We see that again and again. So, it's better to look at the situation from 40,000 feet, then 20, then 10. Usually, we just go ahead and run all three SOM runs at once, and then examine them in increasing levels of detail.
I'd like to now switch gears to something entirely different. Put the SOM box aside for a minute, and let's revisit the work Rocky Roden did in the Gulf of Mexico. Rocky came up with an important way of thinking about the application of this new tool.
Think of multi-attribute seismic interpretation as a process, and what's really important is starting with the geologic question you want to answer. For example: we're trying to illuminate channels - ok, so there is a certain set of attributes that would be good for that. So, ask the question first, and have it firmly in your mind for this multi-attribute seismic interpretation process.
There's a certain set of attributes for each geologic question, and the terminology for that set is the "attribute selection list." When you do an interpretation like this, you really need to be aware of the current attributes being used when looking at the data. Depending on the question, we apply the discipline and say, "if this is the question you're asking, this attribute selection list is appropriate." Remember: the attribute selection list is an illumination function.
Once you have the geologic question, the next step is the attribute selection list, and then classifying simple geobodies - auto-picking your data in attribute space and looking at the results.
Now, this doesn't happen in the background, and it doesn't happen all at once - it's an iterative process. Interpreting complex geobodies typically means more than one SOM run, and more than one geologic question, and interpreting the results at different levels - how many neurons, that sort of thing. This is a whole seismic interpretation process, and interpreting these complex geobodies is the next step.
We're looking at results and constructing geologic models. We decide which is the final geologic model, and then our last step is making property predictions.
So, in the world of multiple geologic models, or multiple statistical models, it really doesn't make any difference: we select a model, we test the model, we select a bunch of models, we test those models, and we choose one! Why? Because we want to make predictions. There has to be one final model that we decide, as professionals, is most reliable and is the one we're going to use. Whether it's exploration, exploitation, or even appraisal, it's the same methodology for geologic models and statistical models.
The point here boils down to something pretty fundamental. As exploration geophysicists, we're in the business of prediction. That's our business. The boss wants to know "where do you want to drill, and how deep? And what should we expect on the way down? Do we have any surprises here?” They want answers! And we're in the business of prediction.
So how good you are as a geoscientist depends, fundamentally, on how good the predictions of your final model are. That's what we do. Whether you want to think about it like that or not, that's really the bottom line.
So this is really about model building for multi-attribute interpretation - that's the first step. Then we test the model and choose the model. Ok, so should that model-building be shipped out as a data processing project, or through our geo-processing people? Or is it really something that should be part of interpretation? Do you really trust that the right models have been built from geoprocessing? Maybe, maybe not. If it takes 3 months, you sure hope you got the right model from the data processing company. And it's foolish, foolish, foolish to think there's only one run. That's really dangerous - that's a kiss and a prayer, and, oh, after three months, this is what you're going to build your model on.
So, as an aside: if you decide that building models is a data processing job, where's the spontaneity? And I ask you - where's the sense of adventure? Where's the sense of the hunt? That's what interpretation is all about - the hunt. Do you trust that the right questions have been asked before the models are built? And my final point here is that there are hundreds of reasons to just follow procedure - stay on the path, follow procedure - and nobody wants to argue. But truth is what we're looking for, and the path to truth invariably has twists and turns. That's exploration. That's what we're doing here. That's the fun stuff - that's what keeps our juices going: finding those little twists and turns and zeroing in on truth.
Now, model testing and final selection begin once the models are built and you have to decide which is the right one. For example, you generate 3 SOMs - an 8x8, a 12x12, and a 4x4 - you look at the results, and the boss says, "Ok, you've been monkeying around long enough, what's the answer? Give me the answer"... "Well... hmm..." you respond. "I like this one. I think the 8x8 is the right one." Now, you could do that, but you might not want to admit it to the boss! One quantitative way of comparing models is to look at your residual errors. The only trouble with that is that it's not very robust. Still, a quantitative assessment - comparing models - is a good way to go.
So, there is a better methodology - better than just comparing residual errors - and that is the whole field of cross-validation. I'm not going to go into that stuff right here, but cross-validation tools such as bootstrapping, bagging, and even Bayesian statistics help us compare models and identify the one that is robust - the one that, in the face of new data, is going to give us a good, strong answer, NOT the one that merely fits the current data best.
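To make the bootstrapping idea concrete, here is a minimal sketch. The residual values and the two "models" are hypothetical, purely for illustration: the point is that resampling the residuals with replacement shows not just a model's average error but how stable that error estimate is - a model whose error estimate jumps around across resamples is less trustworthy.

```python
import random

def bootstrap_rms_error(residuals, n_boot=1000, seed=42):
    """Bootstrap the RMS residual error: resample the residuals with
    replacement many times and report the mean RMS and its spread."""
    rng = random.Random(seed)
    n = len(residuals)
    rms_values = []
    for _ in range(n_boot):
        sample = [rng.choice(residuals) for _ in range(n)]
        rms_values.append((sum(r * r for r in sample) / n) ** 0.5)
    mean_rms = sum(rms_values) / n_boot
    spread = (sum((v - mean_rms) ** 2 for v in rms_values) / n_boot) ** 0.5
    return mean_rms, spread

# Hypothetical residuals from two competing models (say, two SOM sizes).
# Model B looks ok on average but hides a couple of large misfits.
model_a = [0.1, -0.2, 0.15, -0.1, 0.05, 0.12, -0.08, 0.2]
model_b = [0.01, -0.02, 0.9, -0.03, 0.02, -0.8, 0.04, 0.01]

rms_a, spread_a = bootstrap_rms_error(model_a)
rms_b, spread_b = bootstrap_rms_error(model_b)
```

The bootstrap spread for model B comes out much larger than for model A: its error estimate depends heavily on whether those two big residuals land in a given resample. That instability is exactly what a single residual-error number hides.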
Think about the old problem of fitting a least-squares line through some data. You write your algorithm in Python or whatever tool you like, the line kind of fits through the data, and the boss goes, "I don't know why you're monkeying around with lines. I think this is an exponential curve, because this is production data." So you fit an exponential curve. Now, this business of cross-validation - think about fitting a polynomial to the data: two terms is a line, three terms a parabola, four terms, and so on up to n terms. We could make n equal to 15 and, by golly, drive the error right down to nothing - we crank that thing down. The trouble is, we have over-fit the data. The model fits this data perfectly, but some new data comes in and it's a terrible model, because the errors are going to be really high. It's not robust. So this all comes back to cross-validation methodology, which is really very important. The question for the future is, "Who's going to be making the prediction - you, or the machine?" I maintain that, to make good decisions, it's going to be us! We're the ones who will be making the right calls - because we'll leverage machine learning.
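That over-fitting story is easy to demonstrate. The sketch below uses synthetic data (a straight line plus noise, with made-up coefficients) and held-out points that the fit never sees. A 10-term polynomial through 10 training points fits them essentially perfectly, while the 2-term line does not - but on the held-out data the situation reverses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "production" data: an underlying straight line plus noise.
x_train = np.linspace(0.0, 10.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 1.0, x_train.size)

# Held-out points interleaved between the training locations.
x_test = x_train[:-1] + 0.5
y_test = 2.0 * x_test + 1.0 + rng.normal(0.0, 1.0, x_test.size)

def rms(model, x, y):
    """RMS misfit of a polynomial model on data (x, y)."""
    return float(np.sqrt(np.mean((np.polyval(model, x) - y) ** 2)))

line = np.polyfit(x_train, y_train, 1)    # 2-term model: a line
wiggle = np.polyfit(x_train, y_train, 9)  # 10-term model: interpolates

train_line, train_wiggle = rms(line, x_train, y_train), rms(wiggle, x_train, y_train)
test_line, test_wiggle = rms(line, x_test, y_test), rms(wiggle, x_test, y_test)
```

The 10-term polynomial "wins" on the training data (near-zero residual) and loses badly on the data it has never seen, because it has wrapped itself around the noise. That gap between training error and held-out error is precisely what cross-validation measures.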
Let's take a look at Machine Learning. Our company vision is the following:
“There's no reason why we cannot expect to query our seismic data for information with learning machines, just as effortlessly and with as much reliability as we query the web for the nearest gas station.”
Now, this statement of where our company is going is not a statement of "get rid of the interpreters." In my way of thinking, and for all of us in our operations, it's a statement of a way forward. Because truly, this use of machine learning is a whole new way of doing seismic interpretation. It's a tool - it's not replacing anybody. Deep learning, which is important for seismic evaluation, might be a holy grail, but its roots are in image processing, not in the physics of wave motion. Be very careful with that.
Image processing is very good at telling the difference between Glen and me from pictures of us. Or if you have kitties and you have little doggies, image processing can classify those, even right down to the ones where you're not real certain whether it's a dog or a cat. So, deep learning is focused on image processing, and on the subtle distinctions between what is the essence of a dog and what is the essence of a cat, irrespective of whether the cat is lying there or standing there or climbing up a tree. That's the real power of this sort of thing.
Here's a comparison of SOM and deep learning in terms of their properties, and there are good and bad things about each one. There's no magic in either one, and that's not to say one is better than the other.
I would like to point out that unsupervised machine learning trains by discovering natural clusters in attribute space. Once those natural clusters have been identified, attribute space is carved up: any sample falling in this region right here corresponds to this winning neuron, and any sample over there to that winning neuron. Your data is auto-picked and placed back in 3-dimensional space as a virtual 3D survey. That's the essence of what's available today.
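Here is a minimal sketch of that carving-up of attribute space. K-means clustering stands in for the SOM (the real SOM adds a neuron topology, which is omitted here), and the two-attribute samples and starting centers are hypothetical: each cluster center plays the role of a "winning neuron," and every sample is auto-picked to its nearest center.

```python
def nearest(sample, centers):
    """Index of the closest center -- the sample's 'winning neuron'."""
    d2 = [sum((a - c) ** 2 for a, c in zip(sample, ctr)) for ctr in centers]
    return d2.index(min(d2))

def classify(samples, centers, iters=10):
    """Alternate between assigning samples to their winning neuron and
    moving each neuron to the mean of the samples it won."""
    for _ in range(iters):
        labels = [nearest(s, samples and centers) for s in samples]
        labels = [nearest(s, centers) for s in samples]
        for j in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Hypothetical 2-attribute vectors forming two natural clusters.
samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),   # one facies response
           (0.9, 0.8), (0.85, 0.95), (1.0, 0.9)]   # a different response
labels, centers = classify(samples, centers=[(0.0, 0.0), (1.0, 1.0)])
```

Each entry of `labels` is the winning-neuron index for a sample; carrying those indices back to the samples' original (x, y, time) positions is what turns the classification into a virtual 3D survey.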
Supervised machine learning trains on patterns that it discovers in the amplitude data alone. Now, there are two deep learning algorithms that are popular today. One is the convolutional neural network (CNN), which learns from visual patterns - faces, sometimes called eigenfaces, using PCA. Then there are fully convolutional networks (FCNs), which use sample-sized patches and full connections between the network layers.
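The basic building block both of these share is the convolution: slide a small kernel along the data and take a dot product at every position, producing a feature map. The sketch below does this on a toy 1-D trace with a hand-set edge-detector kernel; in a real CNN or FCN the kernel weights are not hand-set but learned from labelled training data.

```python
def conv1d(trace, kernel):
    """Valid-mode 1-D convolution (really cross-correlation, as in
    deep-learning frameworks): dot the kernel against the trace at
    each position to produce a feature map."""
    n = len(trace) - len(kernel) + 1
    return [sum(trace[i + j] * k for j, k in enumerate(kernel))
            for i in range(n)]

# A toy step-like trace and a hand-set "edge detector" kernel.
# A trained network would learn many such kernels from its data.
trace = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
feature_map = conv1d(trace, kernel=[1.0, -1.0])
# feature_map -> [0.0, 0.0, -1.0, 0.0, 0.0, 1.0, 0.0]
```

The feature map is zero everywhere except where the trace steps up or down - the kernel has "detected" a local pattern. Stacking many layers of learned kernels, with nonlinearities in between, is what lets a deep network build up from edges to textures to whole objects.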
Here's a little cartoon showing you this business about layers. This is the picture, and in trying to identify its little features, you can't say whether this is a robot, as opposed to a cat or a dog, until it goes through this analysis. Using patches and feature maps - different features for different things - it goes from one layer to the next to the next until, finally, at your outputs here: well, it must be a robot, a dog, or a kitty. It's a classifier using the properties it has discovered in a single image. The algorithm has discovered its own attributes. You might say, "That's pretty cool," and indeed it is, but it's only using the information seen in that picture. So it's association - the texture features of that image.
Here's an example from one of our associates, Tao Zhao, who has been working in the area of fully convolutional networks. In this example he's done some training - training lines with clinoforms here, chaotic deposition here, maybe some salt down there, and then some concordant reflections up top. Here's an example of the results of the FCN, and then here is the classification of salt down here. The displays here are examples of fully convolutional networks at work.
One final point and then I'll sit down: data is more important than the algorithms. The training rules are very simple. Remember the snow geese? Remember the fish? If you were a fish, or if you were a snow goose, the rules are pretty simple. There's a fanny - I'm gonna be about 3 feet behind it, and I'm not gonna be directly behind the snow goose ahead of me - I want to be either to the left or to the right. Simple rule. If you're a fish, you want to have another fish around you at a certain distance. Simple rules. What's important here is that the data is more important than the algorithms.
Here is an example taken from this month's (January's) E&P Magazine. For several years, a company called Solution Seekers has been training on production data, using a variety of different data types and looking for patterns, to develop best-practice drilling recommendations. Kind of a cool, big-picture concept.
So machine learning training rules are simple; the real value is in the classification results, and it's the data that builds the complexity. My question to you is: does this really address the right questions? If it does, it's extremely valuable stuff. If it misses the direction of where we're going - the geologic question - it's not that useful.
So, here are your takeaways. What does machine learning bring to seismic interpretation? It brings patterns that were previously unattainable. It's working in an attribute space of far higher dimension than we can operate in. Three dimensions is ok... maybe we can make it to four, because we can color up our points. Ok, so that's four... but when you really get right down to it, you have a whole 3D survey, you've got 20 million sample points, and each one is a vector... let the machine try to figure this stuff out.
So, patterns are what machine learning helps us with in our seismic interpretation. And the second thing here - the holy grail, if there is a holy grail - comes in two parts. First, it's a new way to conduct seismic interpretation. This is a wave that's on the way. I can state that with great certainty. How can I state that with great certainty? Because your boss thinks so. All the bosses have bought into this - data mining, deep learning - gotta have some of that. What are our geologist and geophysicist staff doing? Aren't they using any of that stuff? They should be down there trying to find oil and gas and discovering relationships that they've never seen before.
Let's just be careful about this sort of thing, because our data is fundamentally different from pictures of kitties and dogs and newborn babies. The web is filled with free data that's already been classified. With our data, by contrast, we're still feeling around - just learning the properties of those natural clusters. Where we stand today, our understanding of our seismic data and multi-attributes is very primitive compared to what we'll have two years from now. We'll have a much better appreciation of what to look for, and supervised neural networks will make a whole lot more sense. So today, unsupervised machine learning on multi-attribute seismic samples is the new way of doing things - another tool for interpretation. Tomorrow it will probably be deep learning in one form or another.
Second point - it brings a more professional discipline to our business, in terms of thinking about building models, assessing the risk of those models, figuring out which model best answers particular questions, and then making predictions. This whole process - model building, then choosing a model, then ultimately making a prediction - develops the very center of what we do in our business. Because we clearly are in the business of predicting. Thank you.