PixelNet: Towards a General Pixel-Level Architecture
We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though such approaches are computationally efficient, we point out that they are not statistically efficient during learning, precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that (1) stratified sampling allows us to add diversity during batch updates and (2) sampled multi-scale features allow us to explore more nonlinear predictors (multiple fully connected layers followed by ReLU) that improve overall accuracy. Finally, our objective is to show that a single architecture can achieve performance better than (or comparable to) that of architectures designed for a particular task. Interestingly, our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context, surface normal estimation on the NYUDv2 dataset, and edge detection on BSDS without contextual post-processing.
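The two ingredients named above, sampling pixels across a batch and classifying each sampled pixel from its multi-scale features with a small nonlinear predictor, can be illustrated with a minimal NumPy sketch. It makes several assumptions. Each image is treated as a stratum, and the same number of pixels is drawn from every image. Features are taken by nearest-neighbour lookup in each layer's coarser grid, and the original method may use interpolation instead. The layer shapes, sample counts, the 21-way output, and the function names `sample_pixels`, `sample_multiscale_features`, and `mlp_forward` are hypothetical. Only the forward pass is shown.

```python
import numpy as np

def sample_pixels(batch_size, height, width, per_image, rng):
    """Draw `per_image` distinct pixel locations from every image (each image is a stratum),
    so a batch mixes pixels from many images instead of using all correlated pixels of one."""
    coords = []
    for b in range(batch_size):
        flat = rng.choice(height * width, size=per_image, replace=False)
        ys, xs = np.divmod(flat, width)  # flat index -> (row, column)
        coords.append(np.stack([np.full(per_image, b), ys, xs], axis=1))
    return np.concatenate(coords, axis=0)  # shape: (batch_size * per_image, 3) = (image, y, x)

def sample_multiscale_features(feature_maps, coords, height, width):
    """For each sampled pixel, look up its feature vector in every layer (nearest neighbour,
    an assumption) and concatenate them into one multi-scale descriptor."""
    parts = []
    for fmap in feature_maps:  # fmap shape: (batch, channels, h_l, w_l)
        _, _, h_l, w_l = fmap.shape
        ys = (coords[:, 1] * h_l) // height  # map image rows onto this layer's grid
        xs = (coords[:, 2] * w_l) // width   # map image columns onto this layer's grid
        parts.append(fmap[coords[:, 0], :, ys, xs])  # shape: (n_samples, channels)
    return np.concatenate(parts, axis=1)

def mlp_forward(x, weights, biases):
    """Several fully connected layers with ReLU between them; the last layer stays linear."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU
    return h

# Hypothetical usage: 2 images of 32x32 pixels, three feature layers at decreasing resolution.
rng = np.random.default_rng(0)
B, H, W = 2, 32, 32
feature_maps = [
    rng.standard_normal((B, 8, 32, 32)),
    rng.standard_normal((B, 16, 16, 16)),
    rng.standard_normal((B, 32, 4, 4)),
]
coords = sample_pixels(B, H, W, per_image=10, rng=rng)
feats = sample_multiscale_features(feature_maps, coords, H, W)  # 8 + 16 + 32 = 56 features
dims = [56, 64, 64, 21]  # hypothetical layer widths and number of output classes
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
logits = mlp_forward(feats, weights, biases)  # one prediction per sampled pixel
```

Because only the sampled pixels pass through the fully connected layers, the predictor can be deeper than a per-pixel convolutional head at the same cost, and every batch still draws on several images.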