Originally posted on the Orfeo ToolBox Blog
Deep learning has proven to be an extremely powerful tool in many fields, and particularly in image processing: these approaches are currently subject of great interest in the Computer Vision community. However, while a number of typical CV problems can be directly transposed in Remote Sensing (Semantic labeling, Classification problems, …), some intrinsic properties of satellite and aerial imagery makes difficult to process real world RS images (large size, formats, geospatial characteristics, standards, etc.). That’s why pre-processed, pre-annotated public datasets like UC Merced or Postdam are so damn popular. And conversely, there is currently no operational, scalable, and user-friendly tools for people belonging to the RS community who have no coding knowledge.
In this post, we introduce a new remote module, otbtf, enabling the use of deep learning techniques with real world geospatial data. This remote module uses the high performance numerical computation library TensorFlow to bring the deep learning magic into OTB. The C++API of TensorFlow is used to run tensorflow sessions inside filters that are compliant with the streaming mechanism of ITK and OTB, meaning that there is no limitation on the size of images to be processed. Let’s see quickly what’s in the kit!
The first step to apply deep learning techniques to real world datasets, consists in building the dataset. The existing framework of the Orfeo ToolBox is great and offers tools like PolygoncClassStatistics, SampleSelection, SampleExtraction, and more recently SampleAugmentation. Those are really great for pixel wise or object oriented classification/regression tasks. On the deep learning side, among popular deep nets, the Convolutional Neural Network (CNN) has shown great potential for a number of tasks (segmentation, classification, object detection, image recognition, etc.). However, the CNN are trained over patches of images rather than batches of single pixel. Hence the first application of otbtf targets the patches sampling and is called (by the way) PatchesExtraction. It integrates seamlessly in the existing sampling framework of OTB: typically one can use the PolygoncClassStatistics and SampleSelection applications to select patches centers, then give them to the PatchesExtraction application. As we wanted to keep things simple, sampled patches are stored in one single big image that stacks all patches in rows . There is multiple advantages to this. First, accessing one unique big file is more efficient than working on thousands of separate small files stored in the file system. Secondly, one can visualize/process patches as any image: for instance, we can import the big patches image in QGIS and check that our patches looks good. Thirdly, the interleave of the sampled source is preserved, which, guarantee good performance during data access. Besides, we expect that a deep net use the same kind of input for training than for inference, in term of data interleave (it seems rational to consider that the data shouldn’t need to be reordered, for training or inference!).
Training a deep net
Once patches are extracted in images, one can train a deep net, feeding to the application TensorflowModelTrain some patches for training and patches for validation. We think that there will be more and more deep nets available for RS processing. Regarding TensorFlow, a convenient way to serialize models is done through Google Protobuf (called SavedModel). This enables the reuse of a model, for training or inference. The TensorflowModelTrain loads a SavedModel, then train it. As the model variables can be either loaded or saved, one can train the model from scratch or perform some fine tuning, and save variables at different places allowing to process the same model later with different kernel weights, etc (e.g. in classification task, we can train the model from scratch for an entire country, then performing some fine tuning over multiple ecological region).
Currently, we are working on the integration of something to make TensorflowModelTrain reports to TensorBoard. Also, we would add something in a near future to optimize the streaming of batches during training/validation. We will also create soon a repository somewhere to host various SavedModel implemented in research papers.
Serving a deep net
This is where thing become interesting. With the TensorflowModelServe application, we can use any tensorflow model with any number of input sources, any number of input placeholders (that might as well be some user-specific scalar placeholders, for instance “parameter1=0.2”). Thank to the streaming mechanism, we can process any number of pixels in a seamless way. There is two modes to use a deep net: (1) Patch-based: extract and process patches independently at regular intervals. Patches sizes are equal to the receptive field sizes of inputs. For each input, a tensor with a number of elements equal to the number of patches feds the TensorFlow model. (2) Fully-convolutional: Unlike patch-based mode, it allows the processing of an entire requested region. For each input, a tensor composed of one single element, corresponding to the input requested region, is fed to the TensorFlow model. This mode requires that receptive fields, expression fields (the output space that a model will produce for one input patch) and scale factors (the total stride between the input and the output, i.e. the physical spacing change) are consistent with operators implemented in the TensorFlow model, input images physical spacing and alignment. This second mode enables even “old” hardware to process entire scenes in reasonable processing time, enforcing the OTB philosophy 😉 Last but not least, as the entire pipeline implemented in TensorflowModelServe is fully streamable, the application benefits from the Message Passing Interface (MPI) support of OTB, and can be used on any High Performance Computing clusters that share parallel file system.
Using deep features inside OTB machine learning framework
Recent studies have suggested that deep nets features can be used as input features of algorithms like classification, leading state of the art results. Regarding RS image classification, OTB already implement a number of algorithms in its classification application, e.g. SVM, Random Forests, boost classifier, decision tree classifier, gradient boosted tree classifier, normal Bayes classifier. We provide two composite application, TrainClassifierFromDeepFeatures and ImageClassifierFromDeepFeatures, that reuse TensorflowModelServe as input of, respectively the TrainImagesClassifier and ImagesClassifier applications (official OTB applications dedicated to train a classifier from features and perform a pixel wise image classification). We could also do the same thing for SampleExtractionFromDeepFeatures, TrainVectorClassifier to name a few (future work!).
Now, you don’t have any excuse to apply leading, state of the art classification methods 😉
You can read this paper to have more details about concepts involved in otbtf.