PoolNet
PoolNet is one of the state-of-the-art models for salient object detection, and it performs well with different backbones such as VGG or ResNet. Its key operation is pooling, which yields better representations for both deep and shallow features. It achieves top results on 5 benchmark datasets for salient object detection. The network has two main modules: the Global Guidance Module (GGM) and the Feature Aggregation Module (FAM). Both help produce better feature representations of salient regions.
Implementations
PoolNet
The PoolNet model has two main modules in its network. The first aims to extract better representations for deep features; the second is responsible for the features of shallower layers, which carry a lot of detail about salient regions.
Global Guidance Module (GGM)
To address the lack of high-level semantic information in fine-level feature maps along the top-down pathway, the authors introduce a global guidance module, which contains a modified version of the pyramid pooling module (PPM) and a series of global guiding flows (GGFs), to explicitly make the feature maps at each level aware of the locations of the salient objects.
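As a rough illustration of the PPM idea: average-pool the deepest feature map onto several grid sizes, convolve each pooled map, upsample it back, and fuse the results with the original map. The sketch below is a minimal, simplified version for intuition only; the module name SimplePPM, the grid sizes (1, 3, 5), and concatenation-based fusion are my assumptions, not the paper's exact modified PPM.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePPM(nn.Module):
    """Minimal pyramid pooling sketch (hypothetical; not the paper's exact PPM)."""
    def __init__(self, in_ch, out_ch, grid_sizes=(1, 3, 5)):
        super(SimplePPM, self).__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(g),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False),
                          nn.ReLU(inplace=True))
            for g in grid_sizes
        ])
        # Fuse the original map with the upsampled pyramid branches.
        self.fuse = nn.Conv2d(in_ch + len(grid_sizes) * out_ch, out_ch, 3, padding=1, bias=False)

    def forward(self, x):
        h, w = x.size()[2:]
        branches = [x]
        for stage in self.stages:
            # Pool to a coarse grid, convolve, then upsample back to the input size.
            branches.append(F.interpolate(stage(x), (h, w), mode='bilinear', align_corners=True))
        return self.fuse(torch.cat(branches, dim=1))

The GGFs then carry this global map to the decoder by upsampling it to each pyramid level's resolution.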
Feature Aggregation Module (FAM)
The GGM allows global guidance information to be delivered to feature maps at different pyramid levels. However, a new question arises: how can the coarse-level feature maps from the GGM be seamlessly merged with the feature maps at the different scales of the pyramid? The FAM answers it.
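At the tensor level, the merge itself is just bilinear upsampling plus addition; what the FAM adds on top is multi-scale smoothing of the sum (implemented by DeepPoolLayer, shown later). A toy illustration, with made-up shapes:

import torch
import torch.nn.functional as F

# Hypothetical shapes: a coarse guidance map from the GGM and a finer decoder map.
coarse = torch.randn(1, 128, 14, 14)   # global guidance feature
fine   = torch.randn(1, 128, 56, 56)   # feature map at a finer pyramid level

# Naive merge: upsample the coarse map and add it to the fine one.
merged = fine + F.interpolate(coarse, fine.size()[2:], mode='bilinear', align_corners=True)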
PoolNet Architecture
If you take a look at the visualization of the network, you will see that the GGM sits between the encoder and the decoder and has several connections to the shallow layers of the decoder.
The model also supports joint training with edge detection (shown in the top three images of each block), which sharpens the predicted boundaries and makes them crisper.
Results
Images, results, and some descriptions are taken from the PoolNet paper.
Authors: Jiang-Jiang Liu, Qibin Hou, et al.
PyTorch PoolNet
As we have already said, there are several variations of the network, depending on the backbone (ResNet or VGG) and whether edge detection is used.
PyTorch 0.4.1+
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvertLayer(nn.Module):
    """1x1 convolutions that map ResNet feature channels to the decoder's channel counts."""
    def __init__(self, list_k):
        super(ConvertLayer, self).__init__()
        up = []
        for i in range(len(list_k[0])):
            up.append(nn.Sequential(
                nn.Conv2d(list_k[0][i], list_k[1][i], 1, 1, bias=False),
                nn.ReLU(inplace=True)))
        self.convert0 = nn.ModuleList(up)

    def forward(self, list_x):
        # Convert each backbone feature map independently.
        resl = []
        for i in range(len(list_x)):
            resl.append(self.convert0[i](list_x[i]))
        return resl
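ConvertLayer is only used with the ResNet backbone, whose stage channels do not match the decoder's. Here is a quick sanity check; the channel lists below are what I believe config_resnet uses in the repository, so verify them there before relying on them:

# Hypothetical sanity check for ConvertLayer.
list_k = [[64, 256, 512, 1024, 2048],   # ResNet stage output channels (assumed)
          [128, 256, 256, 512, 512]]    # target decoder channels (assumed)
convert = ConvertLayer(list_k)

# Five dummy feature maps with the ResNet channel counts.
feats = [torch.randn(1, c, 56 // (2 ** i), 56 // (2 ** i)) for i, c in enumerate(list_k[0])]
outs = convert(feats)
print([o.size(1) for o in outs])        # -> [128, 256, 256, 512, 512]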
class DeepPoolLayer(nn.Module):
    """Feature Aggregation Module: multi-scale average pooling followed by fusion."""
    def __init__(self, k, k_out, need_x2, need_fuse):
        super(DeepPoolLayer, self).__init__()
        self.pools_sizes = [2, 4, 8]
        self.need_x2 = need_x2        # upsample to the next finer level?
        self.need_fuse = need_fuse    # fuse with the finer feature and the guidance map?
        pools, convs = [], []
        for i in self.pools_sizes:
            pools.append(nn.AvgPool2d(kernel_size=i, stride=i))
            convs.append(nn.Conv2d(k, k, 3, 1, 1, bias=False))
        self.pools = nn.ModuleList(pools)
        self.convs = nn.ModuleList(convs)
        self.relu = nn.ReLU()
        self.conv_sum = nn.Conv2d(k, k_out, 3, 1, 1, bias=False)
        if self.need_fuse:
            self.conv_sum_c = nn.Conv2d(k_out, k_out, 3, 1, 1, bias=False)

    def forward(self, x, x2=None, x3=None):
        x_size = x.size()
        resl = x
        # Pool at several rates, convolve, upsample back, and sum with the input.
        for i in range(len(self.pools_sizes)):
            y = self.convs[i](self.pools[i](x))
            resl = torch.add(resl, F.interpolate(y, x_size[2:], mode='bilinear', align_corners=True))
        resl = self.relu(resl)
        if self.need_x2:
            # Match the spatial size of the next (finer) feature map.
            resl = F.interpolate(resl, x2.size()[2:], mode='bilinear', align_corners=True)
        resl = self.conv_sum(resl)
        if self.need_fuse:
            # Add the finer encoder feature (x2) and the global guidance map (x3).
            resl = self.conv_sum_c(torch.add(torch.add(resl, x2), x3))
        return resl
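A quick shape check of DeepPoolLayer in its fusing configuration (all sizes below are arbitrary): with need_x2 set, the output is first upsampled to x2's resolution, and with need_fuse set, the finer feature x2 and the guidance map x3 (a GGF-upsampled global feature) are added in.

fam = DeepPoolLayer(k=512, k_out=256, need_x2=True, need_fuse=True)
x  = torch.randn(1, 512, 14, 14)    # current merged map
x2 = torch.randn(1, 256, 28, 28)    # next (finer) feature map
x3 = torch.randn(1, 256, 28, 28)    # global guidance, already upsampled
out = fam(x, x2, x3)
print(out.size())                   # -> torch.Size([1, 256, 28, 28])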
class PoolNet(nn.Module):
    def __init__(self, base_model_cfg, base, convert_layers, deep_pool_layers, score_layers):
        super(PoolNet, self).__init__()
        self.base_model_cfg = base_model_cfg
        self.base = base                              # backbone (VGG or ResNet)
        self.deep_pool = nn.ModuleList(deep_pool_layers)
        self.score = score_layers
        if self.base_model_cfg == 'resnet':
            # ResNet channel counts differ from the decoder's, so convert them first.
            self.convert = convert_layers

    def forward(self, x):
        x_size = x.size()
        conv2merge, infos = self.base(x)              # backbone features and GGF guidance maps
        if self.base_model_cfg == 'resnet':
            conv2merge = self.convert(conv2merge)
        conv2merge = conv2merge[::-1]                 # deepest feature first

        # Top-down pathway: each FAM merges the running map, the next
        # finer backbone feature, and the corresponding guidance map.
        merge = self.deep_pool[0](conv2merge[0], conv2merge[1], infos[0])
        for k in range(1, len(conv2merge) - 1):
            merge = self.deep_pool[k](merge, conv2merge[k + 1], infos[k])

        merge = self.deep_pool[-1](merge)             # final FAM without fusion
        merge = self.score(merge, x_size)             # saliency map at input resolution
        return merge
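Building the full network means constructing the backbone, the DeepPoolLayers from a channel configuration, and the score layer, then passing them all to PoolNet. The repository wraps this in a build_model helper; a minimal sketch of how it can be used (the module path and function name are taken from the repository, but verify the exact signature there):

import torch
# Assumes the repository's networks/poolnet.py is on the Python path.
from networks.poolnet import build_model

net = build_model('resnet')           # or 'vgg'
net.eval()

img = torch.randn(1, 3, 224, 224)     # dummy RGB input
with torch.no_grad():
    pred = net(img)                   # saliency logits at input resolution
print(pred.size())                    # -> torch.Size([1, 1, 224, 224])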
The code is taken from the public repository of @backseason on GitHub.
Here is the GitHub repository link: https://github.com/backseason/PoolNet
