Research
Our research has been generously supported by ARO, NSF, AFRL, IARPA, BlueHalo, and Salesforce.
2019
Sun, Wei; Wu, Tianfu
Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation Miscellaneous
arXiv preprint, 2019.
@misc{SPAP,
title = {Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation},
author = {Wei Sun and Tianfu Wu},
url = {https://arxiv.org/abs/1901.06322},
year = {2019},
date = {2019-01-01},
journal = {CoRR},
volume = {abs/1901.06322},
abstract = {Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs) and cycle-consistent GANs (CycleGANs), respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), a novel architectural unit that can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates an atrous spatial pyramid pooling (ASPP) module, a proposed cascade attention mechanism, and residual connections. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing the relative importance of both spatial locations (especially multi-scale context) and feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied, and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128 dataset and in CycleGANs on image-to-image translation datasets including Cityscapes, Facades, and Aerial Maps, obtaining better performance in both tasks.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs) and cycle-consistent GANs (CycleGANs), respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), a novel architectural unit that can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates an atrous spatial pyramid pooling (ASPP) module, a proposed cascade attention mechanism, and residual connections. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing the relative importance of both spatial locations (especially multi-scale context) and feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied, and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128 dataset and in CycleGANs on image-to-image translation datasets including Cityscapes, Facades, and Aerial Maps, obtaining better performance in both tasks.
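To make the described unit concrete, below is a minimal PyTorch sketch of an SPAP-style block. It is an illustration under assumptions, not the paper's exact design: the class name SPAPBlock, the dilation rates (1, 2, 4, 8), and the sigmoid per-location attention used to cascade the branches are choices made here; only the overall structure (ASPP branches, cascaded attention, residual connection) follows the abstract.

# Minimal sketch of an SPAP-style block (illustrative assumptions, not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPAPBlock(nn.Module):
    """Fuses multi-scale atrous branches with a cascade of spatial attention
    maps and a residual connection, loosely following the abstract."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Atrous spatial pyramid: parallel dilated 3x3 convolutions at several rates.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        ])
        # One 1x1 convolution per branch predicts a spatial attention map.
        self.attns = nn.ModuleList([nn.Conv2d(channels, 1, 1) for _ in dilations])

    def forward(self, x):
        fused = x
        # Cascade attention: each scale's attention map decides, per location,
        # how much its features overwrite the fusion accumulated so far.
        for branch, attn in zip(self.branches, self.attns):
            feat = F.relu(branch(fused))
            a = torch.sigmoid(attn(feat))         # per-location weight in [0, 1]
            fused = a * feat + (1.0 - a) * fused
        return x + fused                          # residual connection preserves the input

if __name__ == "__main__":
    y = SPAPBlock(64)(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])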
2018
Li, Bo; Wu, Tianfu; Zhang, Lun; Chu, Rufeng
Auto-Context R-CNN Miscellaneous
arXiv preprint, 2018.
@misc{AutoCtxRCNN,
title = {Auto-Context R-CNN},
author = {Bo Li and Tianfu Wu and Lun Zhang and Rufeng Chu},
url = {https://arxiv.org/abs/1807.02842},
year = {2018},
date = {2018-01-01},
journal = {CoRR},
volume = {abs/1807.02842},
abstract = {Region-based convolutional neural networks (R-CNNs) have largely dominated object detection. Operators defined on RoIs (Regions of Interest), such as RoIPooling and RoIAlign, play an important role in R-CNNs. They all utilize only the information inside RoIs for RoI prediction, even with their recent deformable extensions. Although surrounding context is well known for its importance in object detection, it has not yet been integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work and the multi-class object layout work, this paper presents a generic context-mining RoI operator (i.e., RoICtxMining) seamlessly integrated into R-CNNs; the resulting object detection system is termed Auto-Context R-CNN and is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a 3×3 layout to adaptively mine contextual information from the 8 surrounding context regions on the fly. Within each of the 8 context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling / RoIAlign features are concatenated with those of the object-RoI for the final prediction. The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promisingly low vulnerability to adversarial attacks without being adversarially trained. In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including a 6.9% mAP improvement over the R-FCN method on the COCO test-dev set and first place on both the KITTI pedestrian and cyclist detection benchmarks as of this submission).},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Region-based convolutional neural networks (R-CNNs) have largely dominated object detection. Operators defined on RoIs (Regions of Interest), such as RoIPooling and RoIAlign, play an important role in R-CNNs. They all utilize only the information inside RoIs for RoI prediction, even with their recent deformable extensions. Although surrounding context is well known for its importance in object detection, it has not yet been integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work and the multi-class object layout work, this paper presents a generic context-mining RoI operator (i.e., RoICtxMining) seamlessly integrated into R-CNNs; the resulting object detection system is termed Auto-Context R-CNN and is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a 3×3 layout to adaptively mine contextual information from the 8 surrounding context regions on the fly. Within each of the 8 context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling / RoIAlign features are concatenated with those of the object-RoI for the final prediction. The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promisingly low vulnerability to adversarial attacks without being adversarially trained. In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including a 6.9% mAP improvement over the R-FCN method on the COCO test-dev set and first place on both the KITTI pedestrian and cyclist detection benchmarks as of this submission).
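As a rough illustration of the RoICtxMining idea (not the authors' implementation), the sketch below builds the 3×3 context layout around an object-RoI, pools each of the 8 surrounding cells with torchvision's roi_align, and concatenates them with the object-RoI features along the channel dimension. The helper names context_layout and roi_ctx_features are hypothetical, each context cell is assumed to have the same size as the RoI, and the discriminative mining of a context-RoI within each cell is approximated by pooling the full cell.

# Illustrative sketch of the 3x3 context layout and feature concatenation (assumptions noted above).
import torch
from torchvision.ops import roi_align

def context_layout(box):
    """Given one RoI box (x1, y1, x2, y2), return the 8 surrounding cells of a
    3x3 grid of equally sized cells centered on the box."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cells = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the center cell (the object-RoI itself)
            cells.append(torch.stack([x1 + dx * w, y1 + dy * h,
                                      x2 + dx * w, y2 + dy * h]))
    return torch.stack(cells)  # (8, 4)

def roi_ctx_features(feat, box, output_size=7, spatial_scale=1.0):
    """Concatenate object-RoI features with the 8 context-cell features along channels."""
    ctx = context_layout(box)
    rois = torch.cat([box.unsqueeze(0), ctx], dim=0)            # (9, 4)
    batch_idx = torch.zeros(rois.size(0), 1)                    # single image in the batch
    rois = torch.cat([batch_idx, rois], dim=1)                  # (9, 5): [batch, x1, y1, x2, y2]
    pooled = roi_align(feat, rois, output_size, spatial_scale)  # (9, C, 7, 7)
    return pooled.flatten(0, 1).unsqueeze(0)                    # (1, 9*C, 7, 7)

if __name__ == "__main__":
    feat = torch.randn(1, 256, 50, 50)             # backbone feature map
    box = torch.tensor([10.0, 12.0, 20.0, 24.0])   # object-RoI in feature-map coordinates
    print(roi_ctx_features(feat, box).shape)       # torch.Size([1, 2304, 7, 7])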