Convolutional neural networks (CNNs) are a supervised learning technique that can be applied directly to amplitude data for seismic data classification. The high flexibility of CNN architecture enables researchers to design different models for specific problems. In this study, I introduce an encoder-decoder CNN model for seismic facies classification, which classifies all samples in a seismic line simultaneously and provides superior seismic facies quality compared with traditional patch-based CNN methods. I compare the encoder-decoder model with a traditional patch-based model to assess the usability of both CNN architectures.
With the rapid development of GPU computing and the success obtained in the computer vision domain, deep learning techniques, represented by convolutional neural networks (CNNs), have begun to attract seismic interpreters for supervised seismic facies classification. A comprehensive review of deep learning techniques is provided in LeCun et al. (2015). Although still in its infancy, CNN-based seismic classification has been successfully applied to both prestack (Araya-Polo et al., 2017) and poststack (Waldeland and Solberg, 2017; Huang et al., 2017; Lewis and Vigh, 2017) data for fault and salt interpretation, for identifying different wave characteristics (Serfaty et al., 2017), and for estimating velocity models (Araya-Polo et al., 2018).
The main advantages of CNNs over other supervised classification methods are their spatial awareness and automatic feature extraction. For image classification problems, rather than using the intensity values at each pixel individually, a CNN analyzes the patterns among pixels in an image and automatically generates features (in seismic data, attributes) suitable for classification. Because seismic data are 3D tomographic images, we would expect CNNs to be naturally adaptable to seismic data classification. However, some distinct characteristics of seismic classification make it more challenging than other image classification problems. First, classical image classification aims at distinguishing different images, whereas seismic classification aims at distinguishing different geological objects within the same image. Therefore, from an image processing point of view, seismic classification is in fact a segmentation problem (partitioning an image into blocky pixel shapes with a coarser set of colors) rather than a classification problem. Second, training data for seismic classification are much sparser than for classical image classification problems, for which massive data sets are publicly available. Third, in seismic data, all features are represented by different patterns of reflectors, and the boundaries between different features are rarely explicitly defined. In contrast, features in an image from computer artwork or photography are usually well defined. Finally, because of the uncertainty in seismic data and the nature of manual interpretation, the training data in seismic classification are always contaminated by noise.
To address the first challenge, most, if not all, studies on CNN-based seismic facies classification published to date perform classification on small patches of data to infer the class label of the seismic sample at the patch center. In this fashion, seismic facies classification is done by traversing patches centered at every sample in a seismic volume. An alternative approach, although less discussed, is to use CNN models designed for image segmentation tasks (Long et al., 2015; Badrinarayanan et al., 2017; Chen et al., 2018) to obtain sample-level labels for a whole 2D profile (e.g., an inline) simultaneously, and then to traverse all 2D profiles in a volume.
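The patch-based traversal described above can be sketched as follows. This is a minimal 2D illustration, not the implementation used in this study: the names `classify_line_by_patches`, `half`, and `mean_amplitude_classifier` are hypothetical, the toy classifier merely stands in for a trained CNN, and edge samples that cannot form a full patch are assumed to be left unclassified.

```python
from typing import Callable, List

Line = List[List[float]]  # one 2D profile: rows = time samples, columns = traces


def classify_line_by_patches(
    line: Line,
    half: int,
    classify_patch: Callable[[Line], int],
    unclassified: int = -1,
) -> List[List[int]]:
    """Label every sample whose (2 * half + 1)-wide patch fits inside the line."""
    n_rows, n_cols = len(line), len(line[0])
    labels = [[unclassified] * n_cols for _ in range(n_rows)]
    for i in range(half, n_rows - half):
        for j in range(half, n_cols - half):
            # Cut the window centered at (i, j) and classify it as a whole.
            patch = [row[j - half : j + half + 1] for row in line[i - half : i + half + 1]]
            # The patch label is assigned to the sample at the patch center.
            labels[i][j] = classify_patch(patch)
    return labels


def mean_amplitude_classifier(patch: Line) -> int:
    """Toy stand-in for a CNN: class 1 if the mean absolute amplitude exceeds 0.5."""
    values = [abs(v) for row in patch for v in row]
    return int(sum(values) / len(values) > 0.5)


# A 5x5 line with a 3x3 high-amplitude block in the middle.
line = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else 0.0 for j in range(5)] for i in range(5)]
labels = classify_line_by_patches(line, half=1, classify_patch=mean_amplitude_classifier)
```

In the segmentation-style alternative, the double loop disappears: the whole `line` is passed to the network once, and a label for every sample is returned in a single pass.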
In this study, I use an encoder-decoder CNN model as an implementation of the second approach. I apply both the encoder-decoder model and a patch-based model to seismic facies classification using data from the North Sea, with the objective of demonstrating the strengths and weaknesses of the two CNN models. I conclude that the encoder-decoder model provides much better classification quality, whereas the patch-based model is more flexible in its training data requirements, possibly making it easier to use in production.
The Two Convolutional Neural Network (CNN) Models
A basic patch-based model consists of several convolutional layers, pooling (downsampling) layers, and fully connected layers. For an input image (for seismic data, amplitudes in a small 3D window), the model first automatically extracts several high-level abstractions of the image (similar to seismic attributes) using the convolutional and pooling layers, then classifies the extracted attributes using the fully connected layers, which are similar to traditional multilayer perceptron networks. The output from the network is a single value representing the facies label of the seismic sample at the center of the input patch. An example of a patch-based model architecture is provided in Figure 1a. In this example, the network is employed to classify salt versus non-salt from seismic amplitude in the SEAM synthetic data (Fehler and Larner, 2008). One input instance is a small patch of data bounded by the red box, and the corresponding output is a class label for the whole patch, which is then assigned to the sample at the patch center. The sample marked by the red dot is classified as non-salt.
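To make the "patch in, single label out" data flow concrete, the sketch below traces how the size of a patch shrinks along one axis as it passes through convolutional and pooling layers, using the standard output-size formula. The layer stack `LAYERS` is hypothetical and chosen only for illustration; it is not the architecture shown in Figure 1a.

```python
def output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack as (kind, kernel, stride); NOT the network of Figure 1a.
LAYERS = [
    ("conv", 5, 1),
    ("pool", 2, 2),
    ("conv", 3, 1),
    ("pool", 2, 2),
    ("conv", 3, 1),
    ("pool", 2, 2),
]


def trace_sizes(patch_size: int) -> list:
    """Return the size along one axis after each layer, starting with the input."""
    sizes = [patch_size]
    for _kind, kernel, stride in LAYERS:
        sizes.append(output_size(sizes[-1], kernel, stride))
    return sizes


sizes = trace_sizes(65)
# Number of values per channel passed to the fully connected layers for a 3D patch.
flattened_per_channel = sizes[-1] ** 3
```

With these assumed layers, a 65-sample axis shrinks through 61, 30, 28, 14, and 12 to 6 samples, so each channel contributes 6×6×6 = 216 values to the fully connected layers, which then reduce them to one class label for the patch.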
Figure 1. Sketches for CNN architecture of a) 2D patch-based model and b) encoder-decoder model. In the 2D patch-based model, each input data instance is a small 2D patch of seismic amplitude centered at the sample to be classified. The corresponding output is then a class label for the whole 2D patch (in this case, non-salt), which is usually assigned to the sample at the center. In the encoder-decoder model, each input data instance is a whole inline (or crossline/time slice) of seismic amplitude. The corresponding output is a whole line of class labels, so that each sample is assigned a label (in this case, some samples are salt and others are non-salt). Different types of layers are denoted in different colors, with layer types marked at their first appearance in the network. The size of the cuboids approximately represents the output size of each layer.
The encoder-decoder is a popular network structure for tackling image segmentation tasks. Encoder-decoder models share a similar idea: first extract high-level abstractions of the input images using convolutional layers, then recover sample-level class labels by “deconvolution” operations. Chen et al. (2018) introduce a current state-of-the-art encoder-decoder model and concisely review some popular predecessors. An example of an encoder-decoder model architecture is provided in Figure 1b. Similar to the patch-based example, this encoder-decoder network is employed to classify salt versus non-salt from seismic amplitude in the SEAM synthetic data. Unlike the patch-based network, in the encoder-decoder network one input instance is a whole line of seismic amplitude, and the corresponding output is a whole line of class labels, which has the same dimensions as the input data. In this case, all samples in the middle of the line are classified as salt (marked in red), and other samples are classified as non-salt (marked in white), with minimal error.
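The final step of such a network can be illustrated with a small sketch. It assumes, as is common for segmentation networks but not stated for this particular model, that the decoder outputs one score map per class with the same dimensions as the input line; the label of each sample is then the class with the highest score. The function name `labels_from_scores` and the toy scores are hypothetical.

```python
from typing import List

# scores[c][i][j] is the score of class c at sample (i, j) of the line.
ScoreMaps = List[List[List[float]]]


def labels_from_scores(scores: ScoreMaps) -> List[List[int]]:
    """Pick, for every sample, the class with the highest score."""
    n_classes = len(scores)
    n_rows, n_cols = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][i][j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Two classes (0 = non-salt, 1 = salt) on a toy 2x3 line.
scores = [
    [[0.9, 0.2, 0.8], [0.7, 0.1, 0.6]],  # non-salt scores
    [[0.1, 0.8, 0.2], [0.3, 0.9, 0.4]],  # salt scores
]
label_map = labels_from_scores(scores)
```

The resulting `label_map` has one label per input sample, in contrast to the single label per patch produced by the patch-based network.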
Application of the Two CNN Models
For demonstration purposes, I use the F3 seismic survey acquired in the North Sea, offshore Netherlands, which is freely accessible to the geoscience research community. In this study, I am interested in automatically extracting seismic facies that have specific seismic amplitude patterns. To avoid potential disagreement on the geological meaning of the facies to extract, I name the facies purely based on their reflection characteristics. Table 1 provides a list of the extracted facies. There are eight seismic facies with distinct amplitude patterns; another facies (“everything else”) is used for samples not belonging to the eight target facies.
Varies amplitude steeply dipping
Low amplitude deformed
Low amplitude dipping
High amplitude deformed
Moderate amplitude continuous
To generate training data for the seismic facies listed above, different picking scenarios are employed to accommodate the different input data formats required by the two CNN models (small 3D patches versus whole 2D lines). For the patch-based model, 3D patches of seismic amplitude data are extracted around seed points within user-defined polygons. Approximately 400,000 3D patches of size 65×65×65 are generated for the patch-based model, which is a reasonable amount for seismic data of this size. Figure 2a shows an example line on which seed point locations are defined in the co-rendered polygons.
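Cutting a 65×65×65 patch around a seed point requires 32 samples on either side of the seed along each axis. The sketch below shows one way to compute the patch index ranges and to discard seeds that lie too close to the volume boundary. The function `patch_bounds`, the volume shape, and the seed coordinates are hypothetical and do not describe the F3 survey or the picking tool used here; an odd patch size is assumed.

```python
from typing import Optional, Tuple

Point = Tuple[int, int, int]
Bounds = Tuple[Tuple[int, int], ...]


def patch_bounds(seed: Point, size: int, volume_shape: Point) -> Optional[Bounds]:
    """Half-open index ranges of a cubic patch centered at `seed`, or None if it does not fit."""
    half = size // 2  # 32 for a 65-sample patch
    bounds = []
    for center, extent in zip(seed, volume_shape):
        start, stop = center - half, center + half + 1
        if start < 0 or stop > extent:
            return None  # the patch would extend past the edge of the volume
        bounds.append((start, stop))
    return tuple(bounds)


# Hypothetical volume shape and seed points, for illustration only.
volume_shape = (100, 200, 150)
seeds = [(32, 32, 32), (31, 100, 75), (50, 100, 75), (50, 180, 75)]
usable = [s for s in seeds if patch_bounds(s, 65, volume_shape) is not None]
```

Here the second and fourth seeds are rejected because their patches would cross the volume boundary; padding the volume is the usual alternative to skipping such seeds.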
The encoder-decoder model requires much more effort to generate labeled data. I manually interpret the target facies on 40 inlines across the seismic survey and use these to build the network. Although the total number of seismic samples in 40 lines is enormous, the encoder-decoder model treats them as only 40 input instances, which is in fact a very small training set for a CNN. Figure 2b shows an interpreted line that is used in training the network.
In both tests, I randomly select 90% of the generated training data to train the network and use the remaining 10% for testing. On an Nvidia Quadro M5000 GPU with 8 GB of memory, the patch-based model takes about 30 minutes to converge, whereas the encoder-decoder model needs about 500 minutes. Besides training faster, the patch-based model also has a higher test accuracy, at almost 100% (99.9988%, to be exact) versus 94.1% for the encoder-decoder model. However, this accuracy measurement can be misleading. For a patch-based model, when picking the training and testing data, interpreters usually pick the most representative samples of each facies, for which they have the most confidence. This results in high-quality training (and testing) data that are less noisy, and most of the ambiguous samples that are challenging for the classifier are excluded from testing. In contrast, to use an encoder-decoder model, interpreters have to interpret all the target facies in a training line. For example, if the target is faults, one needs to pick all faults in a training line; otherwise, unlabeled faults will be considered as “non-fault” and confuse the classifier. Therefore, interpreters have to make some less confident interpretations when generating training and testing data. Figures 2c and 2d show seismic facies predicted by the two CNN models on the same line shown in Figures 2a and 2b. We observe better defined facies from the encoder-decoder model than from the patch-based model.
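The random 90%/10% split and the accuracy measure can be sketched as below. This is a generic illustration rather than the code used in this study: `split_train_test` and `accuracy` are hypothetical names, the fixed random seed is added only for reproducibility, and treating each interpreted line as one splittable instance for the encoder-decoder model is an assumption.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly hold out `test_fraction` of the labeled instances for testing."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(items) * test_fraction)
    test = [items[i] for i in indices[:n_test]]
    train = [items[i] for i in indices[n_test:]]
    return train, test


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of held-out samples whose predicted label matches the picked label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# 40 interpreted lines (identified here by index) split into 36 training and 4 test lines.
train_lines, test_lines = split_train_test(list(range(40)))
```

Because `accuracy` is evaluated only against picked labels, it inherits any bias in how those labels were picked, which is why the two reported accuracies are not directly comparable.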
Figure 3 shows prediction results from the two networks on a line away from the training lines, and Figure 4 shows prediction results from the two networks on a crossline. Similar to the prediction results on the training line, the encoder-decoder model provides facies as cleaner geobodies than the patch-based model, requiring much less post-editing for regional stratigraphic classification (Figure 5). This can be attributed to the ability of the encoder-decoder model to capture the large-scale spatial arrangement of facies, whereas the patch-based model only senses patterns in small 3D windows. To form such windows, the patch-based model also needs to pad or simply skip samples close to the edge of a 3D seismic volume. Moreover, although training is much faster for the patch-based model, its prediction stage is very computationally intensive, because it processes N×N×N times as much data as the original seismic volume contains (N is the patch size along each dimension). In this study, the patch-based method takes about 400 seconds to predict a line, compared with less than 1 second for the encoder-decoder model.
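The cost argument above amounts to simple arithmetic, worked through in the sketch below for the 65-sample patch used in this study. The function name `patch_data_multiplier` is hypothetical, and the speed-up is only an approximate lower bound because the reported timings are themselves approximate.

```python
def patch_data_multiplier(patch_size: int, n_dims: int = 3) -> int:
    """Samples read per classified sample when one patch is cut around every sample."""
    return patch_size ** n_dims


multiplier = patch_data_multiplier(65)  # 65 * 65 * 65 samples per classified sample

patch_seconds_per_line = 400.0  # reported for the patch-based model (approximate)
encoder_decoder_seconds_per_line = 1.0  # reported upper bound ("less than 1 second")
min_speedup = patch_seconds_per_line / encoder_decoder_seconds_per_line
```

With N = 65, the patch-based prediction reads 274,625 amplitude samples for every sample it classifies, and the reported timings imply that the encoder-decoder model predicts a line roughly 400 times faster or more.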
In this study, I compared two types of CNN models for seismic facies classification. The more commonly used patch-based model requires much less effort to generate labeled data, but its classification result is suboptimal compared with that of the encoder-decoder model, and its prediction stage can be very time consuming. The encoder-decoder model generates superior classification results at near real-time speed, at the expense of more tedious labeled data picking and longer training time.
Figure 2. Example of seismic amplitude co-rendered with training data picked on inline 340 for a) the patch-based model and b) the encoder-decoder model, and the prediction results from c) the patch-based model and d) the encoder-decoder model. Target facies are colored in colder to warmer colors in the order shown in Table 1. Compare Facies 5, 6 and 8.
Figure 3. Prediction results from the two networks on a line away from the training lines. a) Predicted facies from the patch-based model. b) Predicted facies from the encoder-decoder model. Target facies are colored in colder to warmer colors in the order shown in Table 1. The yellow dotted line marks the location of the crossline shown in Figure 4. Compare Facies 1, 5 and 8.
Figure 4. Prediction results from the two networks on a crossline. a) Predicted facies from the patch-based model. b) Predicted facies from the encoder-decoder model. Target facies are colored in colder to warmer colors in the order shown in Table 1. The yellow dotted lines mark the locations of the inlines shown in Figures 2 and 3. Compare Facies 5 and 8.
Figure 5. Volumetric display of the predicted facies from the encoder-decoder model. The facies volume is visually cropped for display purposes. An inline and a crossline of seismic amplitude co-rendered with predicted facies are also displayed to show a broader distribution of the facies. Target facies are colored in colder to warmer colors in the order shown in Table 1.