Abstract:
Over the last few years, because of the increase in low-cost computer power, individuals and companies have stepped up investigations into the use of machine learning in many areas of E&P. For the geosciences, the emphasis has been on reservoir characterization, seismic data processing and, most recently, interpretation.
By using statistical tools such as Attribute Selection, which uses Principal Component Analysis (PCA), and Multi-Attribute Classification using Self-Organizing Maps (SOM), a multi-attribute 3D seismic volume can be “classified.” PCA reduces a large set of seismic attributes to those that are the most meaningful. The output of the PCA serves as the input to the SOM, a form of unsupervised neural network, which, when combined with a 2D color map, facilitates the identification of clustering within the data volume.
The application of SOM and PCA in Paradise will be highlighted through a case study of the Niobrara unconventional reservoir. One hundred square miles from Phase 5 of Geophysical Pursuit, Inc. and Fairfield Geotechnologies’ multiclient library were analyzed for stratigraphic resolution of the Niobrara chalk reservoirs within a 60 millisecond two-way time window. Thirty wells from the COGCC public database were available to corroborate the SOM results with log data. Several SOM topologies were generated and extracted within Paradise at well locations. These were exported and run through a statistical analysis program to visualize the neuron-to-reservoir correlations via histograms. Chi-squared independence tests also validated a relationship between SOM neuron numbers and the presence of reservoir for all chalk benches within the Niobrara.
Carrie Laudon
Transcript
Carrie Laudon:
Good afternoon. Those of you… I’m not all that used to talking into my screen, but thank you for attending my talk today.
I am presenting a case study for an unconventional reservoir from the US which illustrates how self-organizing maps can push the limit of seismic resolution to a single sample, and how we can take these results and use our wells to validate the machine learning results.
So, the outline of my talk today. We’re going to start with a little bit of discussion around why use machine learning: what does it bring to us that we didn’t have previously? I’ll take you through some examples of how attributes, sampling, and self-organizing maps can resolve geology below tuning. Then, an introduction to the Niobrara petroleum system in the Denver-Julesburg basin, followed by our multi-attribute machine learning interpretation workflow, which is comprised of two pieces: principal component analysis followed by self-organizing maps. Then I’ll present the results of SOM on the Niobrara seismic survey. We’ll look at those results visually, then look at how I took those results at the well locations and calibrated our SOM, and finish up with some well-to-SOM statistics.
There we go. Why do we want to consider seismic multi-attribute analysis? Seismic attributes have been around for decades and they’ve always been very challenging to interpret, because they respond to a complex subsurface that’s a combination of lithology, fluids, pressure, stress, faults, and fractures. And there’s never really a single attribute that tells us, “this is it, we can interpret this one attribute.” So, we’ve always been trying to interpret multiple attributes.
And what does machine learning bring to our task of interpreting seismic attributes? It helps us to address the overwhelming task of interpreting dozens or hundreds of attributes. Seismic data has always been a big data problem in our industry, and machine learning addresses the human inability to visualize attribute relationships beyond four dimensions. As humans, we can easily visualize 2D or 3D cross plots, and we can add color. But beyond that, we cannot see relationships in four dimensions. However, a computer does not have that limitation.
And then, over the last few years, computer power has continued to grow exponentially, along with new visualization techniques. For example, in Paradise, we have a 2D color map, so there are additional visualization techniques that have come to the forefront as well. But ultimately, our motivation in considering multi-attribute analysis is to produce better subsurface interpretations, reducing our risk, and hopefully for our clients, allowing them to make money, whether that’s drilling better wells or leasing better acreage.
So, today we’re going to look at unsupervised learning. But if you were to Google machine learning, you’d see these four main categories of machine learning. Let’s take a brief stroll through them. Supervised learning is really the most popular method in our industry. It takes a set of known input data and trains a model that is applied to new data to predict a result. An advantage of supervised learning is that you get directly to a specific answer. For example, you might get a porosity model from pre-stack inversion that’s been trained to well data. Unsupervised learning, however, is quite different, and it does have its own advantages.
Unsupervised learning has no a priori knowledge of the data it’s organizing. The training adapts itself directly from the data. An advantage to this is it finds patterns in the data that might not exist in the training data that’s used for supervised learning. We see this quite often in our unsupervised machine learning projects. Toward the end of the talk, I can point out some of the reasons for this.
Supervised learning can be thought of as a task driven approach. And it’s predicting the next value from a set of inputs. Whereas unsupervised learning is data-driven and is a means of finding patterns within the data. Then our other two types are semi-supervised learning, which is a simple combination of the two. And lastly, reinforcement learning, which is when an algorithm learns over time to maximize returns based on rewards it receives for performing certain actions.
So, the case study today is going to be on unsupervised learning.
The next section is going to go through a little bit of how and why self-organizing maps can help us discriminate or resolve thin beds. SOM classification is simply a method that uses attribute space to classify seismic samples. In this example, we just have a cross plot of two attributes, and the samples have already been marked with symbols showing which cluster they fall into. The SOM process simply finds natural clusters in attribute space by introducing multi-attribute points called neurons. The SOM neurons will seek out the natural clusters of like characteristics in the seismic data, and the neurons learn the characteristics of the data clusters through an iterative process of cooperative followed by competitive training. When the learning is completed, each unique cluster is assigned to a neuron number and the samples are now classified.
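To make that cooperative-then-competitive idea concrete, here is a toy, hand-rolled SOM training loop in Python. It is a generic sketch of the algorithm, not Paradise’s implementation; the grid size, iteration count, learning rate, and neighborhood decay are all arbitrary choices.

```python
import numpy as np

def train_som(samples, rows=8, cols=8, iters=5000, lr0=0.5, sigma0=2.0):
    """Toy SOM: the competitive step picks a winning neuron; the
    cooperative step pulls the winner's grid neighbors toward the sample."""
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(rows, cols, samples.shape[1]))  # one weight vector per neuron
    gy, gx = np.mgrid[0:rows, 0:cols]
    for t in range(iters):
        x = samples[rng.integers(len(samples))]
        # competitive: find the neuron closest to this sample in attribute space
        dist = np.linalg.norm(weights - x, axis=2)
        bi, bj = np.unravel_index(dist.argmin(), dist.shape)
        # cooperative: the winner's neighbors also move, with decaying reach and rate
        frac = 1.0 - t / iters
        lr, sigma = lr0 * frac, sigma0 * frac + 0.5
        h = np.exp(-((gy - bi) ** 2 + (gx - bj) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights  # each (row, col) weight vector is one trained neuron

# e.g. train on a two-attribute point cloud
samples = np.random.default_rng(1).normal(size=(1000, 2))
neurons = train_som(samples)
```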
Another important concept in Paradise is that we use sample-based attributes. So, every sample from every attribute is put into the SOM; we don’t use a wavelet-based approach. This example is just showing you how instantaneous attributes at the sample scale look for about, I think, a hundred milliseconds of data in time. In attribute space, this would consist of seven dimensions.
This is going to take you through a synthetic example that is really, I think, a powerful illustration of thin-bed discrimination by self-organizing maps. In this model, we have an RC series on the left, at, I believe, a two-millisecond sample rate. We have a fairly thick positive acoustic impedance layer, followed by a trough-peak doublet, each event one sample thick, then a peak-trough doublet, again one sample each, and then a slightly thicker trough.
This RC series was convolved with a wavelet to create traces, and then these were duplicated to a hundred traces with noise added. Here on the left, we have simply cross plotted the Hilbert transform versus the regular amplitude, and you can see that clusters are very apparent in this simple cross plot of the data. Now, if we run this through an 8×8 SOM process, you can see each of these clusters is assigned to one of the 64 neurons in the SOM 2D color map.
Now taking this back to our original time series, you can see in the example that each of the events in our original RC series has formed a unique cluster within the SOM. Neuron 57 is our fairly thick, high acoustic impedance layer. The SOM has actually separated the negative amplitude from the positive amplitude in neurons 8 and 55, so there’s one doublet pair, and then 28 and 33 are the second doublet pair. And then our other higher acoustic impedance layer is number 9. I hope this convinces you that the SOM process is able to distinguish very thin layers within a seismic time sequence.
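A rough reconstruction of this kind of model is sketched below; the two-millisecond sample rate matches the talk, but the reflection coefficients, noise level, and 30 Hz Ricker wavelet are assumed values.

```python
import numpy as np
from scipy.signal import hilbert

def ricker(f, dt, n):
    """Zero-phase Ricker wavelet with peak frequency f (Hz)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

dt = 0.002                        # 2 ms sample rate, as in the model
rc = np.zeros(100)                # reflection-coefficient series
rc[30] = 0.3                      # top of the thick high-impedance layer
rc[40], rc[41] = -0.2, 0.2        # trough-peak doublet, one sample each
rc[50], rc[51] = 0.2, -0.2        # peak-trough doublet, one sample each
rc[60] = -0.25                    # slightly thicker trough

trace = np.convolve(rc, ricker(30.0, dt, 64), mode="same")

# duplicate to 100 noisy traces, then build the two crossplot axes
rng = np.random.default_rng(0)
traces = trace + 0.02 * rng.standard_normal((100, trace.size))
quadrature = np.imag(hilbert(traces, axis=1))        # the Hilbert transform
samples = np.column_stack([traces.ravel(), quadrature.ravel()])
```

Running an 8×8 SOM on the two-column `samples` matrix reproduces the cluster-per-event behavior described above.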
How does this work in practice in a 3D cube? I’ll just take you through an imaginary 10-attribute example. We have a 3D amplitude volume in survey space, and we compute or select 10 attributes to go into the SOM. We might have started with 20 or 30 attributes, but we selected 10. Each sample in the 3D cube will then have 10 attribute values associated with its X, Y, and time position. We take those samples from survey space into attribute space; now we’re in 10-dimensional space, and we run that through the SOM classification process. The SOM will perform cooperative and competitive training and, in this case, classify into 64 patterns. Now, that’s not set in stone; that’s up to the interpreter. We offer a very large number of potential SOM topologies, but 64 is one of our favorites.
Once the data are classified in attribute space, a nonlinear process maps those SOM neurons back to a 2D color map, and each of these neurons is also placed back into the 3D volume. So, you end up with a 3D volume back in survey space where each sample is classified as one of these neurons. And once you have that in a 3D volume, you can start to interrogate it and interpret it.
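In code, that round trip from survey space to attribute space and back might look like this sketch, which uses the open-source MiniSom package as a stand-in for Paradise’s SOM engine; the survey dimensions and random attribute values are placeholders.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom; a stand-in, not Paradise's engine

n_il, n_xl, n_t, n_attr = 50, 50, 60, 10           # assumed survey dimensions
volume = np.random.rand(n_il, n_xl, n_t, n_attr)   # stand-in for 10 attribute volumes

# survey space -> attribute space: one row per seismic sample, one column per attribute
X = volume.reshape(-1, n_attr)
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each attribute

som = MiniSom(8, 8, n_attr, sigma=1.5, learning_rate=0.5, random_seed=42)
som.train_random(X, num_iteration=10000)           # cooperative + competitive training

# attribute space -> survey space: each sample gets its winning neuron (1..64)
winners = np.array([som.winner(x) for x in X])     # (row, col) on the 8x8 grid
classified = (winners[:, 0] * 8 + winners[:, 1] + 1).reshape(n_il, n_xl, n_t)
```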
That’s the SOM process in a nutshell. I guess I already said that; I forgot to put the words up.
Okay. Now we’re going to look at an introduction to the case study. Our case study is out of the Denver-Julesburg basin. Here’s Denver, for reference. The Wattenberg field was produced as a basin-centered conventional gas reservoir down in these Dakota sands. Our study area is up here in the northern part, almost near the Wyoming border.
This is an asymmetric foreland basin covering approximately 70,000 square miles over parts of Colorado, Wyoming, Kansas, and Nebraska. The basin has over 47,000 oil and gas wells with a production history dating back to 1881. However, until 2009, it really wasn’t being drilled with highly deviated wells; most of the wells in the basin at that point were vertical, and they were drilling without seismic. Starting around 2009, 11 years ago, the operators in Wattenberg began to drill and complete horizontal wells within the Niobrara. We have a cartoon here showing the strat column. All of these are pay zones, the shales that are source rocks are shown here, and the Niobrara is blown up. Typical depth… I should mention this is from Sonnenberg, a professor at the Colorado School of Mines, taken from one of his publications.
There are these chalk benches within the Niobrara, informally termed the A, B, and C benches. Those are the primary reservoirs for the horizontal drilling, and they’re interbedded with very rich source rocks. However, when the horizontal drilling started, operators found they were hitting a lot of structures and faults that they weren’t seeing in the vertical wells, so they quickly recognized the need for 3D seismic data.
Beginning around 2015 or so (maybe 2010, sorry), GPI and, at the time, Geokinetics undertook a 1,500-square-mile multi-client program in Northern Colorado, the outline of which you can see here. They provided Geophysical Insights with a hundred square miles of this multi-client data to do a proof of concept on whether our self-organizing maps approach could improve the resolution of the Niobrara reservoirs. In addition to the seismic data, we had about 30 wells within the hundred square miles that were available through the public COGCC database. Out of these, we selected seven, shown in red, to have a full petrophysical analysis run on them.
The time structure map for the hundred square miles is shown here, with north to the left. Production is very closely tied to fracturing in the chalks; as you can imagine, it’s a very low porosity reservoir. The insets here are structural elements from a study of the Austin chalk, showing the kind of fracture patterns to expect for various structural elements, most of which we have present in our survey. Here to the west, you can see you’re coming up on a fairly steep fold as you approach the Front Range of the Rocky Mountains. The predominant faulting direction, as you can see from the time structure map, is northeast to southwest, but that’s also overprinted with a northwest-to-southeast fault pattern.
This is the most positive curvature on the top of the Niobrara. Here you can see a lot more detail on the complexity of the fault geometries, the two main fabrics overprinting each other, and how these would actually generate nice fracture sets within the chalks. These attributes are computed from the AASPI consortium library that is provided with Paradise.
Here’s our original data, and it’s very nice, clean seismic data. We selected approximately 60 milliseconds. The vertical scale on the left here is two-way time in seconds, and the trace spacing is about 110 feet. The seismic data we resampled from two milliseconds to one millisecond. The Niobrara is a fairly easy event to follow; the top Niobrara is a strong peak, shown here. The base of our study section was the top of the Greenhorn.
Within the 60 milliseconds, we have our four reservoirs: the benches of the Niobrara, as well as the Codell sandstone, which I forgot to mention on the last slide. The Codell is a sandstone that’s also being produced from these horizontal wells; it’s a fairly thin, highly heterogeneous sandstone that is overlain unconformably by the Niobrara formation. We have a strong trailing trough that sits between the A bench and the B bench. Another thing to notice is how much more internal faulting there is within the Niobrara that you don’t necessarily pick up if you’re just looking at the very shaley part near the top of the Niobrara.
We’re looking to see whether the SOM process can give us more detail on these four reservoirs within the 60-millisecond interval. Selecting the attributes that go into the SOM is another very important step in the analysis because, as I mentioned earlier, we all have our favorite attributes, but it’s hard to put a direct physical interpretation onto a single seismic attribute. So how do we decide which attributes to run in our SOM process? For that, we use principal component analysis. Principal component analysis is another unsupervised machine learning technique, one that helps us with dimensionality reduction. It’s a technique that will take a large set of variables and reduce it to a smaller set that still contains most of the independent variation in the large set. So, PCA helps us find the most prominent and important seismic attributes in our data.
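A generic illustration of this step, using scikit-learn’s PCA on a standardized attribute matrix; the attribute names are simply the ones mentioned later in the talk, and the data are random stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

attr_names = ["sweetness", "envelope", "rel_acoustic_impedance",
              "thin_bed", "inst_frequency"]            # names taken from the talk
attrs = np.random.rand(5000, len(attr_names))          # stand-in attribute samples

X = StandardScaler().fit_transform(attrs)  # attributes carry very different units
pca = PCA().fit(X)

print(pca.explained_variance_ratio_)       # variation carried by each eigenvector
for name, loading in zip(attr_names, pca.components_[0]):
    print(f"{name:24s} {loading:+.3f}")    # attribute contributions to eigenvector 1
```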
Paradise allows us to do a pretty exhaustive PCA. Rather than go through how we do it, we’ll go through some of the results. This is a PCA result from Paradise for the first two eigenvectors of our interval, Niobrara to Greenhorn. In this case, we selected an inline that goes through one of our key wells. But you can look at all of the inlines in your survey, and at suites of inlines, to see how your PCA changes throughout the survey and through various time intervals.
In this case, we ran it over a fairly narrow window, just our Niobrara to Greenhorn. From the first eigenvector, which I’ve blown up here because the chart is hard to read, we had three prominent attributes: sweetness, envelope, and relative acoustic impedance. You can see their relative contributions to the eigenvector; then it drops off quickly for the other attributes that we put into the PCA. Likewise, in eigenvector two, we only had two prominent attributes, and those are thin bed and instantaneous frequency.
These five attributes, along with another four from other eigenvectors, are going to go into our SOM. But you don’t want to just do this blindly; it’s always really important to inspect your attributes. You don’t want to just look at the bar graphs and throw them into the SOM. So, we took the additional step of looking at the sweetness, for example, at our primary reservoir, the Niobrara B. Likewise, we had the thin bed indicator extracted near the Niobrara B, near the primary reservoir. You just want to make sure you don’t have any surprises in those attributes. Ultimately, we selected nine attributes from our instantaneous suite, out of six of our eigenvectors, and these are listed here. That’s what went into the SOM.
Looking at the results. Here we have our original data through one of our key wells, one of the wells that we have the petrophysical evaluation on. And here we have our 8×8 SOM which, after doing a visual QC of the different SOM topologies, was the one I liked the best, the one I tried to tie back into my wells. The B bench is near the center up here on our amplitude section; it’s this kind of faint peak. But in the SOM, it just jumps right out at you as this yellow-red-yellow sequence. And again, you can see there’s a lot more structure on the chalk than there is on the seismic horizon that you’re picking. So, as far as trying to place wells, this kind of detailed image of your reservoir will certainly enable you to place your wells and understand how you’re going to stay in zone.
Zooming into the well, you can again see we have about five milliseconds of data covering the B, which I’ve blown up here. That’s a total of about 30 feet in depth. And our bright red and orange neurons actually correspond to the maximum carbonate content within that B bench, as you can see here. So, the neurons in the middle of the B interval are actually showing you the sweet spot of the reservoir. And here’s a reference for how these logs were calculated, if anybody wants to follow up on that; these authors have done all this from triple-combo wireline logs and come up with lithologies, TOC, et cetera. Here, you can see our clean pay flag as well.
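Those two numbers are mutually consistent if we assume an interval velocity of roughly 12,000 ft/s, a plausible value for these chalks (the velocity itself is not quoted in the talk):

\[
\Delta z \;=\; \frac{v\,\Delta t_{\mathrm{TWT}}}{2} \;=\; \frac{12{,}000\ \mathrm{ft/s} \times 0.005\ \mathrm{s}}{2} \;=\; 30\ \mathrm{ft}
\]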
Now, this is a cross section through three of our wells with petrophysics. There’s a lot more to look at in this cross section, but I’ll show you some of the highlights. We have our B bench here, and you can see the markers. Interestingly, the SOM images the B through some complicated structure. This area right here is very complex, and it’s hard to pick what’s happening structurally with the Niobrara as you take this down to the Greenhorn. Likewise, you can see some areas where perhaps your reservoir has been faulted out. You get a lot more detail out of the SOM that you can use to avoid areas, in addition to finding your sweet spots.
The base of the A bench is well resolved in the navy blue. Our best source rock, the high-TOC zone that sits above the B and below the A bench, is shown in these pink and blue neurons. The base of the high-TOC zone is not resolved by the SOM, but the top of it is. So, all in all, we get a much better picture of the stratigraphy from the SOM than we do just from using the amplitude data.
Everything I’ve shown up to now was work that was done in 2018. I revisited this project after Paradise 3.4 was released, because it came with the ability to extract logs from our SOM results at well locations. Here’s a cross section I built in Paradise through our 8×8 SOM and the seven wells that we have the petrophysical results on. In track one, on the left, we have a Vshale curve and a gamma ray. In track two, I’ve taken the SOM and overlain it with the volume-of-calcite curve from the petrophysics. The third track is our TOC, the fourth track is effective porosity, and the fifth track is resistivity. Because the reservoir is chalk, I think that the volume of calcite is a better indicator of reservoir than Vshale. So, when we move into the SOM statistics, we’re going to use the volume of calcite as a cutoff.
Another thing to note: we have our Fort Hays limestone down here near the base of the interval that we studied, and you can see that the SOM neurons could potentially be repeated. That’s one reason it’s important that you can export these logs as LAS files. We had a summer intern named Yin Kai Wang who built a statistical analysis tool to help us evaluate the extracted logs along with the other well logs in the wells. In this case, I only evaluated down to the base of the Niobrara, so that we weren’t catching this Fort Hays limestone when evaluating the SOM results statistically.
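A sketch of how such exported logs could be handled outside Paradise, using the open-source lasio package; the file name, curve mnemonics (SOM_NEURON, VCALC, PHIE), cutoff values, and depth interval here are all hypothetical.

```python
import lasio  # pip install lasio

las = lasio.read("well_01_som_extraction.las")  # hypothetical exported LAS file
df = las.df()                                   # DataFrame indexed by depth

# flag reservoir with hypothetical volume-of-calcite and effective-porosity cutoffs
df["RESERVOIR"] = ((df["VCALC"] > 0.5) & (df["PHIE"] > 0.04)).astype(int)

# restrict to the Niobrara so the Fort Hays limestone doesn't alias the neurons
nio = df.loc[7100.0:7400.0]                     # assumed top/base Niobrara depths, ft
table = nio.groupby(["SOM_NEURON", "RESERVOIR"]).size().unstack(fill_value=0)
```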
Now, this is from a program that he wrote for us over the summer, based on a paper published by Leal and others. If you want to see more on that, we have Fabian Rada presenting in our booth this week. They built contingency tables and produced statistical measures of the relationship between the flags that they applied to the logs. In this case, we used effective porosity and volume of calcite to flag reservoir versus non-reservoir, just within that Niobrara interval. Personally, I like just looking at the histograms, and from these histograms you can quickly see that we have some neurons in our SOM that are dominated by non-reservoir, some that are clearly reservoir, and some that are mixed.
When you make this contingency table, you get a measure called Chi-squared, which shows you whether there is a relationship between your SOM neurons and the flag that you’ve set, in this case reservoir versus non-reservoir. And in this case, the Chi-squared measure on the 8×8 SOM is saying yes, there is a relationship between SOM neuron and the presence of reservoir or non-reservoir. That’s over the entire Niobrara interval. Going back to Paradise, I looked at these in three dimensions and tried to get a feeling for whether the neurons were present, vertically and laterally, throughout the various benches, through the various reservoirs.
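The test itself is the standard chi-squared test of independence on the neuron-by-flag contingency table, as in this sketch; the counts are made up to mimic the pure and mixed neurons just described.

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows = SOM neurons, cols = [non-reservoir, reservoir] sample counts (toy numbers)
counts = np.array([[52,  3],    # a neuron dominated by non-reservoir
                   [ 4, 61],    # a neuron dominated by reservoir
                   [20, 18]])   # a mixed neuron
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
# a small p-value rejects independence: neuron number carries reservoir information
```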
So, here’s our A bench. Our extracted log tells us these neurons are present in the wells, and it shows us which of these neurons are reservoir. Some of them are almost exclusively reservoir and some are non-reservoir. The Chi-squared measure shows us that there is a relationship. Then, looking in the 3D viewer, you can turn on just those neurons that you found at the wells and look at how they are stacked, both vertically and laterally, in the 3D volume. You can see these neurons are concentrated near the top of the formation.
I actually think 15 and 16 are part of the shale; of course, the time-to-depth conversion can sometimes pick up a stray hit count here or there. Looking at our B bench again, it’s kind of neat to see that your neurons are fairly well clustered in the 2D map. These are all the neurons that are in the wells at the B bench. Here, the Chi-squared test failed, but in this case it’s not because there isn’t a relationship. It’s really because we have very few neurons that are actually sampling non-reservoir; the B bench is almost entirely reservoir. So, in this case, the histogram is probably more helpful than the Chi-squared.
But if you go to my colleague Ivan Merck’s talk tomorrow, you might find some other measures besides these to look at the statistical relationships. Moving on to the C bench: again, we see a different cluster or set of neurons that are sampling the C at the wells, and again we can see that most of the C bench is reservoir as well.
So, this is really helpful, being able to map out in very fine detail, over the three-dimensional volume, the presence of these neurons. That’s one way to calibrate your SOM results to your wells. Another question we looked to address this summer with Yin Kai’s work: as I mentioned, I selected the 8×8 based just on my visual inspection of the SOMs. But he actually found a way to measure which topology is best, since from these contingency tables we can compute not only Chi-squared but also the likelihood ratio, Cramer’s V, and the Bayes factor.
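Two of those measures are straightforward to reproduce from the same kind of table with SciPy: the likelihood-ratio (G) statistic via the lambda_ option, and Cramér’s V by rescaling chi-squared. The counts are the same toy numbers as before; the Bayes factor computation in Yin Kai’s tool is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([[52, 3], [4, 61], [20, 18]])   # same toy table as above

# likelihood-ratio (G) test of independence
g, p_g, dof, _ = chi2_contingency(counts, lambda_="log-likelihood")

# Cramér's V: chi-squared rescaled to [0, 1], comparable across SOM topologies
chi2, _, _, _ = chi2_contingency(counts)
v = np.sqrt(chi2 / (counts.sum() * (min(counts.shape) - 1)))
print(f"G = {g:.1f} (p = {p_g:.3g}), Cramér's V = {v:.2f}")
```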
You can look at all of these different SOM topologies and what’s happening at the wells, and you can see that the highest Chi-squared value is the 8×8. I apologize, this is hard to read, but we have 4×4 on the left and 12×12 on the right. Each one of these measures seems to suggest that the 8×8 topology is the best, at least over that top-to-base Niobrara section. Certainly you could look over the whole SOM, and you could probably apply some different cutoffs, et cetera. One thing I’ve decided about this particular SOM, after looking at the extracted logs and the log curves we had available, is that it’s really discriminating between carbonate and shale. So it’s a lithology-discrimination SOM, and we can use it to map out the chalk benches; we could probably also use it to map out the shales. I didn’t really go into detail on what’s below the Niobrara, but what I’d like to take on next is looking at how it does with the Codell sandstone.
To wrap up, one thing that you do have to be careful of, and this goes for supervised classifications as well: we had seven wells in a hundred square miles, and we extracted seismic data at those wells and produced hit counts of which neurons got sampled at each well location. On the left, from Petrel, is the full histogram for the 3D seismic SOM; this is a 7×7, not the 8×8, showing the statistics of the full 3D volume for the Niobrara B bench. In this case, you see that 45 of the 49 neurons in the SOM have data within the seismic volume. But if I take my seven wells, we’ve actually only sampled 19 of those 49 neurons. So, it’s important to look at the statistics of your full survey in addition to what’s happening at your wells.
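That coverage check amounts to comparing the set of neurons present anywhere in the interval with the set sampled along the well paths; here is a sketch with random stand-in arrays (the real inputs would be the classified volume and the extracted well logs).

```python
import numpy as np

rng = np.random.default_rng(7)
neuron_volume = rng.integers(1, 50, size=(100, 100, 12))     # stand-in 7x7 B-bench volume
neuron_at_wells = rng.choice(np.arange(1, 50, 2), size=150)  # stand-in well extractions

vol_neurons = np.unique(neuron_volume)     # neurons present anywhere in the interval
well_neurons = np.unique(neuron_at_wells)  # neurons the wells actually sample

print(f"{vol_neurons.size} of 49 neurons occur in the volume, "
      f"{well_neurons.size} are sampled at the wells")
unsampled = np.setdiff1d(vol_neurons, well_neurons)  # worth chasing with more wells
```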
It might be that some of these neurons are not really in the zone of interest; there might be something wrong with your horizon or the way you’ve isolated the zone. But when you go back into Paradise and turn on only the 19 of the 49 neurons that our histogram says are the B bench, and then look at that in 3D, you do find holes right smack in the middle of the B bench. It’s definitely worth going back and finding out what those neurons are, how extensive they are in your volume, and maybe looking for additional wells that might sample them. They may show you something different, if you can find a well that has sampled that interval, from the wells you’ve sampled up to that point.
It’s important to consider the volume as well as the well statistics. So, that’s just a watch out.
Thanks for your time. I’ll wrap it up and then we’ll take questions. In summary, seismic multi-attribute analysis is delivering on the promise of improving interpretations via the integration of attributes, which respond to subsurface conditions such as stratigraphy, lithology, faulting, fracturing, fluids, pressure, et cetera. Statistical methods and SOMs enhance the interpretation process and can easily be used to augment traditional interpretation; the SOM utilizes attribute space to simultaneously classify suites of attributes into sample-based, high-dimension clusters.
In the DJ basin, we have used SOM to resolve our primary reservoir targets, the Niobrara chalk benches, which are found within approximately 60 milliseconds of two-way travel time. We’ve resolved our benches to the level of one to five neurons, which in depth would correspond to anywhere from 7 to 35 feet in thickness. The Paradise visualization tools, including 2D color maps and well log extractions, enabled closer ties to our well data, which is bringing a lot more value to the results that we’re producing.
Quick acknowledgements. Really, thanks to GPI and Fairfield for letting us use this data; it’s such a nice, high-quality data set. I have to acknowledge Yin Kai for the program he provided that lets us apply the statistics to our extracted logs. Sarah Stanley and Patricia Santogrossi did a lot of the work on this 3D volume as well. And Digital Formation, for providing the petrophysical analysis.