AdaptSegNet
Learning to Adapt Structured Output Space for Semantic Segmentation
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation studies are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against state-of-the-art methods in terms of accuracy and visual quality.
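To make the output-space adaptation concrete, the sketch below pairs a small fully-convolutional discriminator, applied to softmax segmentation maps, with the adversarial term that pushes target-domain predictions toward the source distribution; in the multi-level variant, a second discriminator of the same form is attached to predictions from an intermediate feature level. This is a minimal illustrative sketch, not the paper's exact architecture: the layer widths and the names FCDiscriminator, adversarial_loss, and pred_target are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FCDiscriminator(nn.Module):
    # Patch-level binary discriminator over (N, C, H, W) softmax maps;
    # channel widths here are illustrative assumptions.
    def __init__(self, num_classes, ndf=64):
        super(FCDiscriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(num_classes, ndf, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, 1, kernel_size=4, stride=2, padding=1))

    def forward(self, x):
        return self.model(x)  # per-patch source/target logits

def adversarial_loss(discriminator, pred_target):
    # pred_target: hypothetical (N, C, H, W) logits from the segmentation
    # network on a target-domain image. The segmentation network is trained
    # to make them look like source predictions to the discriminator.
    d_out = discriminator(F.softmax(pred_target, dim=1))
    source_label = torch.ones_like(d_out)  # "classified as source"
    return F.binary_cross_entropy_with_logits(d_out, source_label)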
Implementations
PyTorch implementation of our method for adapting semantic segmentation from the synthetic dataset (source domain) to the real dataset (target domain). Based on this implementation, our result is ranked 3rd in the VisDA Challenge.
Contact: Yi-Hsuan Tsai (wasidennis at gmail dot com) and Wei-Chih Hung (whung8 at ucmerced dot edu)
import numpy as np
import torch
from torch import nn
from torchvision import models


class Classifier_Module(nn.Module):
    # ASPP-style prediction head: parallel 3x3 convolutions with different
    # dilation rates, whose outputs are summed into one prediction map.
    def __init__(self, dims_in, dilation_series, padding_series, num_classes):
        super(Classifier_Module, self).__init__()
        self.conv2d_list = nn.ModuleList()
        for dilation, padding in zip(dilation_series, padding_series):
            self.conv2d_list.append(
                nn.Conv2d(dims_in, num_classes, kernel_size=3, stride=1,
                          padding=padding, dilation=dilation, bias=True))
        # Initialize every branch from N(0, 0.01)
        for m in self.conv2d_list:
            m.weight.data.normal_(0, 0.01)

    def forward(self, x):
        # Sum the outputs of all parallel dilated branches
        out = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list) - 1):
            out += self.conv2d_list[i + 1](x)
        return out
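As a quick standalone check of the head above (reusing the imports at the top of the file): padding equals dilation for each 3x3 branch, so the spatial size is preserved. The 19-class setting is an assumption matching Cityscapes.

# Hypothetical smoke test for Classifier_Module
head = Classifier_Module(1024, [6, 12, 18, 24], [6, 12, 18, 24], num_classes=19)
out = head(torch.randn(1, 1024, 40, 40))
print(out.shape)  # torch.Size([1, 19, 40, 40])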
class DeeplabVGG(nn.Module):
    # DeepLab with a VGG16 backbone: pool4/pool5 are removed and the conv5
    # block plus fc6/fc7 are dilated instead, giving an output stride of 8.
    def __init__(self, num_classes, vgg16_caffe_path=None, pretrained=False):
        super(DeeplabVGG, self).__init__()
        vgg = models.vgg16()
        if pretrained:
            vgg.load_state_dict(torch.load(vgg16_caffe_path))

        # VGG's fully-connected classifier layers are discarded below
        features, classifier = list(vgg.features.children()), list(vgg.classifier.children())

        # Remove pool4 (index 23) and pool5 (index 30)
        features = nn.Sequential(*(features[i] for i in list(range(23)) + list(range(24, 30))))

        # Dilate the conv5 block (indices have shifted after removing pool4)
        for i in [23, 25, 27]:
            features[i].dilation = (2, 2)
            features[i].padding = (2, 2)

        # fc6/fc7 re-implemented as dilated convolutions
        fc6 = nn.Conv2d(512, 1024, kernel_size=3, padding=4, dilation=4)
        fc7 = nn.Conv2d(1024, 1024, kernel_size=3, padding=4, dilation=4)

        self.features = nn.Sequential(*([features[i] for i in range(len(features))]
                                        + [fc6, nn.ReLU(inplace=True),
                                           fc7, nn.ReLU(inplace=True)]))
        self.classifier = Classifier_Module(1024, [6, 12, 18, 24], [6, 12, 18, 24], num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

    def optim_parameters(self, args):
        # Single parameter group; args is kept for interface compatibility
        return self.parameters()
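A minimal usage sketch for DeeplabVGG, assuming 19 Cityscapes-style classes and random initialization (no Caffe weights loaded). Because pool4/pool5 are replaced by dilation, the output stride is 8:

# Hypothetical forward pass to verify the output resolution
model = DeeplabVGG(num_classes=19)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 320, 320))
print(logits.shape)  # torch.Size([1, 19, 40, 40])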
