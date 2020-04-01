The U-Net architecture extends the fully convolutional network (FCN) with a series of contracting layers that capture context, followed by a symmetric series of expanding layers that let the model learn precise segmentation boundaries. The contracting layers are called the down-sampler, or encoder, and the expanding layers are called the up-sampler, or decoder.
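
To make the contracting/expanding symmetry concrete, the following Python sketch tracks feature-map shapes through a U-Net-style network. It assumes the common design in which each down-sampling step halves the height and width and doubles the channel count, each up-sampling step does the reverse, and each decoder level concatenates the matching encoder feature map (the skip connections of the original U-Net). The function name `unet_shapes` and the default of 64 base channels are illustrative choices, not details of the model described here.

```python
def unet_shapes(size, depth, base_channels=64):
    """Return (encoder, bottleneck, decoder) feature-map shapes as (H, W, C) tuples.

    Hypothetical helper. Assumes a square input, down-sampling that halves H and W
    while doubling C, and up-sampling that does the reverse before concatenating
    the skip connection from the encoder level with the same resolution.
    """
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2 ** depth")
    encoder = []
    h, c = size, base_channels
    for _ in range(depth):
        encoder.append((h, h, c))  # kept for the skip connection
        h, c = h // 2, c * 2       # contracting path: halve resolution, double channels
    bottleneck = (h, h, c)
    decoder = []
    for skip_h, _, skip_c in reversed(encoder):
        h, c = h * 2, c // 2       # expanding path: double resolution, halve channels
        # Symmetric design: the upsampled map matches its skip map in size,
        # so the two can be concatenated along the channel axis.
        assert h == skip_h
        decoder.append((h, h, c + skip_c))
    return encoder, bottleneck, decoder
```

For example, `unet_shapes(256, 4)` gives a 16×16×1024 bottleneck, and the last decoder level returns to 256×256 resolution; a final 1×1 convolution would then map those features to per-pixel class scores.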

Our initial baseline approach was to perform both the building localization and damage classification tasks in a single U-Net model. Perhaps unsurprisingly, this approach did not perform well, because one model had to handle both the coarse-grained task of separating buildings from the background and the fine-grained task of rating building damage. We therefore split the problem into two subproblems: localization and damage classification.

Localization

For the localization problem, our goal was to classify each pixel in the pre-disaster image as either “building” or “no building.”

Data pipeline

First, we used the tf.data and tf.image APIs to build a data pipeline that loaded each pair of pre- and post-disaster images as two tensors, each of shape (1024, 1024, 3), holding the RGB values of every pixel.
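
Before any decoding, a pipeline like this needs to match each pre-disaster image with its post-disaster counterpart. The sketch below does that with plain Python string handling. The `_pre_disaster.png` / `_post_disaster.png` suffixes are an assumed naming convention, and `pair_image_paths` is a hypothetical helper; both would need to be adapted to however the files are actually named.

```python
import os

# Assumed naming convention (not stated in the post): a scene's two images
# share a prefix and differ only in these suffixes.
PRE_SUFFIX = "_pre_disaster.png"
POST_SUFFIX = "_post_disaster.png"

def pair_image_paths(filenames):
    """Return sorted (pre_path, post_path) tuples for scenes that have both images."""
    pre, post = {}, {}
    for name in filenames:
        base = os.path.basename(name)
        if base.endswith(PRE_SUFFIX):
            pre[base[: -len(PRE_SUFFIX)]] = name    # key by the shared scene prefix
        elif base.endswith(POST_SUFFIX):
            post[base[: -len(POST_SUFFIX)]] = name
    # Keep only scenes that have both a pre- and a post-disaster image.
    return [(pre[k], post[k]) for k in sorted(pre.keys() & post.keys())]
```

The resulting path lists could then be passed to `tf.data.Dataset.from_tensor_slices` and decoded with `tf.io.read_file` and `tf.image.decode_png(..., channels=3)` to produce the two (1024, 1024, 3) tensors per scene.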

Next, the pipeline concatenated the two tensors into a single tensor of shape (1024, 1024, 6). The idea was that even though the post-disaster image can differ dramatically from the pre-disaster image, it still adds information for deciding what is and is not a building. We then applied data augmentation, such as random rotations and reflections, which increased the effective number of distinct training examples seen in each epoch.
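
The concatenation and augmentation steps can be sketched with NumPy (swapped in for tf.image here so the example runs without TensorFlow). The key detail is that a random rotation or reflection must be applied identically to the 6-channel input and to its label mask, or the labels would no longer line up with the pixels. The function names, and the choice of the eight square symmetries (90-degree rotations plus an optional flip), are assumptions for illustration.

```python
import numpy as np

def concat_pre_post(pre, post):
    """Stack two (H, W, 3) RGB images into one (H, W, 6) array."""
    assert pre.shape == post.shape and pre.shape[-1] == 3
    return np.concatenate([pre, post], axis=-1)

def random_dihedral(image, mask, rng):
    """Apply the same random rotation/reflection to the image and its label mask.

    Draws one of the 8 symmetries of a square: k * 90-degree rotations,
    optionally followed by a horizontal flip. (Assumed transform set.)
    """
    k = int(rng.integers(4))
    image = np.rot90(image, k, axes=(0, 1))   # rotate spatial axes only
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    return image, mask
```

Because the images are square, 90-degree rotations keep the (1024, 1024, 6) shape unchanged. Inside a tf.data pipeline, `tf.concat`, `tf.image.rot90`, and `tf.image.flip_left_right` applied in a `Dataset.map` call would play roughly the same roles.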

Model description

The localization model was a single U-Net set up for binary semantic image segmentation. The best-performing model had nine down-sampling layers and nine up-sampling layers. It was trained with cross-entropy as the loss function, and training took approximately five days to complete.
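
For a binary building/no-building target, cross-entropy amounts to pixel-wise binary cross-entropy averaged over the image. The NumPy sketch below shows that computation; it assumes the model outputs a per-pixel building probability (for example, via a sigmoid), and the clipping constant `eps` is an illustrative choice to avoid taking log(0).

```python
import numpy as np

def pixelwise_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    y_true: array of 0/1 labels (0 = "no building", 1 = "building"), any shape.
    y_prob: predicted building probabilities, same shape as y_true.
    """
    p = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    per_pixel = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(per_pixel))
```

In Keras, `tf.keras.losses.BinaryCrossentropy` computes the same quantity. As a side calculation, if each of the nine down-sampling layers halved the spatial resolution (the post does not state the strides), a 1024 × 1024 input would shrink to 1024 / 2^9 = 2 pixels per side at the bottleneck.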

Damage classification

Data pipeline

Once again, we used the tf.data and tf.image APIs to create tensors of shape (1024, 1024, 6) representing the concatenated pre- and post-disaster images. We then used the output of the localization model to mask each tensor so that all non-building pixels were set to zero. Next, the tensors were randomly cropped to shape (256, 256, 6), and only crops in which at least 20% of the pixels were non-zero were used for training.
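
The masking and crop-filtering steps can be sketched in NumPy as follows. It assumes the localization model yields a per-pixel building probability that is thresholded at 0.5, treats a pixel as non-zero if any of its six channels is non-zero, and retries random crops up to a fixed limit; the threshold, the retry limit, and the function names are all hypothetical choices, not details from the post.

```python
import numpy as np

def mask_non_buildings(x, building_prob, threshold=0.5):
    """Zero out every pixel of x (H, W, C) predicted as "no building".

    building_prob is an (H, W) map of building probabilities; the 0.5
    threshold is an assumption.
    """
    keep = building_prob > threshold   # (H, W) boolean building mask
    return x * keep[..., None]         # broadcast the mask over the channel axis

def sample_crop(x, rng, size=256, min_fraction=0.2, max_tries=100):
    """Randomly crop x to (size, size, C), accepting only crops in which at
    least min_fraction of the pixels are non-zero in some channel.

    Returns None if no crop qualifies within max_tries (assumed retry limit).
    """
    h, w = x.shape[:2]
    for _ in range(max_tries):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        crop = x[top:top + size, left:left + size]
        nonzero_fraction = np.any(crop != 0, axis=-1).mean()
        if nonzero_fraction >= min_fraction:
            return crop
    return None
```

The 20% rule rejects crops that are mostly background, so most of the training signal for the damage classifier comes from building pixels. In tf.data, `tf.image.random_crop` and `Dataset.filter` could play the same roles.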

Model description