Research
Our research has been generously supported by ARO, NSF, AFRL, IARPA, BlueHalo and Salesforce.
2021
Sun, Wei; Wu, Tianfu
Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis Journal Article
In: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2021.
@article{LostGANs,
title = {Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis},
author = {Wei Sun and Tianfu Wu},
url = {https://arxiv.org/abs/2003.11571},
doi = {10.1109/TPAMI.2021.3078577},
year = {2021},
date = {2021-05-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)},
abstract = {With the remarkable recent progress on learning deep generative models, it becomes increasingly interesting to develop models for controllable image synthesis from reconfigurable inputs. This paper focuses on a recently emerged task, layout-to-image, to learn generative models that are capable of synthesizing photo-realistic images from spatial layout (i.e., object bounding boxes configured in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors). This paper first proposes an intuitive paradigm for the task, layout-to-mask-to-image, to learn to unfold object masks of given bounding boxes in an input layout to bridge the gap between the input layout and synthesized images. Then, this paper presents a method built on Generative Adversarial Networks for the proposed layout-to-mask-to-image with style control at both image and mask levels. Object masks are learned from the input layout and iteratively refined along stages in the generator network. Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a proposed novel feature normalization scheme, Instance-Sensitive and Layout-Aware Normalization. In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset with state-of-the-art performance obtained.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
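The ISLA-Norm scheme described in the entry above is, at a high level, a normalization whose scale and shift parameters are predicted per object and spread onto the feature map through the object masks. Below is a minimal PyTorch sketch of that idea, assuming per-object style vectors and soft masks as inputs; the class SimpleISLANorm and its two projection layers are illustrative placeholders, not the paper's implementation.

import torch
import torch.nn as nn

class SimpleISLANorm(nn.Module):
    """Simplified sketch of an instance-sensitive, layout-aware normalization.

    Each object contributes per-channel scale/shift parameters predicted from
    its style vector; the parameters are spread onto the feature map through
    soft object masks and applied after (non-affine) batch normalization.
    """
    def __init__(self, num_features, style_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Hypothetical projections from a per-object style code to gamma/beta.
        self.to_gamma = nn.Linear(style_dim, num_features)
        self.to_beta = nn.Linear(style_dim, num_features)

    def forward(self, x, obj_styles, obj_masks):
        # x:          (B, C, H, W) feature map
        # obj_styles: (B, N, style_dim) per-object latent style vectors
        # obj_masks:  (B, N, H, W) soft masks locating each object in the lattice
        gamma = self.to_gamma(obj_styles)                          # (B, N, C)
        beta = self.to_beta(obj_styles)                            # (B, N, C)
        # Spread per-object parameters onto spatial locations via the masks.
        gamma_map = torch.einsum('bnc,bnhw->bchw', gamma, obj_masks)
        beta_map = torch.einsum('bnc,bnhw->bchw', beta, obj_masks)
        return self.bn(x) * (1.0 + gamma_map) + beta_map

In this sketch the masks are expected to form a rough soft partition of the lattice, so each pixel's affine parameters become a mask-weighted mixture over the objects covering it.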
2020
Xing, Xianglei; Wu, Tianfu; Zhu, Song-Chun; Wu, Ying Nian
Towards Interpretable Image Synthesis by Learning Sparsely Connected AND-OR Networks Proceedings Article
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
@inproceedings{iGenerativeM,
title = {Towards Interpretable Image Synthesis by Learning Sparsely Connected AND-OR Networks},
author = {Xianglei Xing and Tianfu Wu and Song-Chun Zhu and Ying Nian Wu},
url = {https://arxiv.org/abs/1909.04324},
year = {2020},
date = {2020-02-23},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020},
abstract = {This paper proposes interpretable image synthesis by learning hierarchical AND-OR networks of sparsely connected semantically meaningful nodes. The proposed method is based on the compositionality and interpretability of the scene-objects-parts-subparts-primitives hierarchy in image representation. A scene has different types (i.e., OR), each of which consists of a number of objects (i.e., AND). This can be recursively formulated across the scene-objects-parts-subparts hierarchy and is terminated at the primitive level (e.g., Gabor wavelets-like basis). To realize this interpretable AND-OR hierarchy in image synthesis, the proposed method consists of two components: (i) Each layer of the hierarchy is represented by an over-complete set of basis functions. The basis functions are instantiated using convolution to be translation covariant. Off-the-shelf convolutional neural architectures are then exploited to implement the hierarchy. (ii) Sparsity-inducing constraints are introduced in end-to-end training, which facilitate a sparsely connected AND-OR network to emerge from initially densely connected convolutional neural networks. A straightforward sparsity-inducing constraint is utilized, that is, to only allow the top-k basis functions to be active at each layer (where k is a hyperparameter). The learned basis functions are also capable of image reconstruction to explain away input images. In experiments, the proposed method is tested on five benchmark datasets. The results show that meaningful and interpretable hierarchical representations are learned with better quality of image synthesis and reconstruction than state-of-the-art baselines.},
howpublished = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
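The sparsity constraint mentioned in the abstract above, keeping only the top-k basis responses active, can be illustrated in a few lines of PyTorch. Whether the selection is applied per spatial location, per map, or per layer is a detail of the paper; the per-location choice below, and the function name, are assumptions made purely for illustration.

import torch

def topk_activation(responses, k):
    """Keep only the k strongest basis responses at each spatial location,
    zeroing the rest (a simple top-k sparsity constraint across channels)."""
    # responses: (B, C, H, W) responses of C basis functions
    B, C, H, W = responses.shape
    flat = responses.permute(0, 2, 3, 1).reshape(-1, C)        # (B*H*W, C)
    topk_vals, topk_idx = flat.topk(k, dim=1)
    sparse = torch.zeros_like(flat).scatter_(1, topk_idx, topk_vals)
    return sparse.reshape(B, H, W, C).permute(0, 3, 1, 2)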
2019
Sun, Wei; Wu, Tianfu
Image Synthesis from Reconfigurable Layout and Style Proceedings Article
In: International Conference on Computer Vision (ICCV), 2019.
@inproceedings{LostGAN,
title = {Image Synthesis from Reconfigurable Layout and Style},
author = {Wei Sun and Tianfu Wu},
url = {https://arxiv.org/abs/1908.07500
https://github.com/iVMCL/LostGANs},
year = {2019},
date = {2019-10-28},
booktitle = {International Conference on Computer Vision (ICCV)},
abstract = {Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models that are capable of synthesizing realistic and sharp images from reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model can preserve the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and is adaptive with respect to perturbations of the layout and style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset with state-of-the-art performance obtained.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
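LostGAN's layout-to-mask step starts from nothing more than boxes and labels on an image lattice. As a point of reference, the sketch below rasterizes normalized boxes into coarse box-shaped layout maps; the function is hypothetical and shows only the trivial initialization that the generator's learned, iteratively refined masks go beyond.

import torch

def boxes_to_layout_maps(boxes, height, width):
    """Rasterize normalized bounding boxes into coarse per-object layout maps.

    boxes: (N, 4) tensor of (x0, y0, x1, y1) in [0, 1] image coordinates.
    Returns an (N, height, width) tensor with 1 inside each box, 0 outside.
    """
    maps = torch.zeros(boxes.shape[0], height, width)
    for i, (x0, y0, x1, y1) in enumerate(boxes.tolist()):
        c0, r0 = int(x0 * width), int(y0 * height)
        c1 = max(int(x1 * width), c0 + 1)   # keep at least one column
        r1 = max(int(y1 * height), r0 + 1)  # keep at least one row
        maps[i, r0:r1, c0:c1] = 1.0
    return maps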
Sun, Wei; Wu, Tianfu
Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation Miscellaneous
arXiv preprint, 2019.
@misc{SPAP,
title = {Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation},
author = {Wei Sun and Tianfu Wu},
url = {https://arxiv.org/abs/1901.06322},
year = {2019},
date = {2019-01-01},
journal = {CoRR},
volume = {abs/1901.06322},
abstract = {Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs) and cycle-consistent GANs (CycleGANs), respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), a novel architectural unit that can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates an Atrous spatial pyramid, a proposed cascade attention mechanism and residual connections. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing the relative importance of both spatial locations (especially multi-scale context) and feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested with GANs on the CelebA-HQ-128 dataset, and with CycleGANs on image-to-image translation datasets including the Cityscapes, Facades and Aerial Maps datasets, obtaining better performance in both settings.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
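The SPAP unit combines the three ingredients named in the abstract above: ASPP-style dilated branches, attention over them, and a residual connection. A deliberately simplified PyTorch sketch of that combination follows; the module name, the fixed dilation rates, and the single per-pixel softmax attention stand in for the paper's cascade attention and are assumptions, not the published architecture.

import torch
import torch.nn as nn

class SimpleSPAP(nn.Module):
    """Simplified sketch in the spirit of Spatial Pyramid Attentive Pooling:
    multi-rate dilated branches (ASPP-like), a soft attention over the
    branches, and a residual connection."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        # Per-pixel attention logits, one per branch.
        self.attn = nn.Conv2d(channels, len(rates), 1)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, R, C, H, W)
        weights = torch.softmax(self.attn(x), dim=1).unsqueeze(2)  # (B, R, 1, H, W)
        fused = (weights * feats).sum(dim=1)                       # (B, C, H, W)
        return x + fused  # residual connection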
2018
Li, Bo; Wu, Tianfu; Zhang, Lun; Chu, Rufeng
Auto-Context R-CNN Miscellaneous
arXiv preprint, 2018.
@misc{AutoCtxRCNN,
title = {Auto-Context R-CNN},
author = {Bo Li and Tianfu Wu and Lun Zhang and Rufeng Chu},
url = {https://arxiv.org/abs/1807.02842},
year = {2018},
date = {2018-01-01},
journal = {CoRR},
volume = {abs/1807.02842},
abstract = {Region-based convolutional neural networks (R-CNN) have largely dominated object detection. Operators defined on RoIs (Regions of Interest) play an important role in R-CNNs, such as RoIPooling and RoIAlign. They all only utilize information inside RoIs for RoI prediction, even with their recent deformable extensions. Although surrounding context is well-known for its importance in object detection, it has yet to be integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work and the multi-class object layout work, this paper presents a generic context-mining RoI operator (i.e., RoICtxMining) seamlessly integrated into R-CNNs, and the resulting object detection system is termed Auto-Context R-CNN, which is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a 3×3 layout to mine contextual information adaptively in the 8 surrounding context regions on-the-fly. Within each of the 8 context regions, a context-RoI is mined in terms of discriminative power and its RoIPooling / RoIAlign features are concatenated with the object-RoI for final prediction. The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promising vulnerability to adversarial attacks without being adversarially trained. In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including 6.9% mAP improvement over the R-FCN method on the COCO test-dev dataset and first place on both KITTI pedestrian and cyclist detection as of this submission).},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
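The 3×3 context layout used by RoICtxMining is easy to picture in code: the object RoI occupies the center cell, and the 8 neighboring cells, each with the object's width and height, are the candidate context regions. The small sketch below (plain Python, hypothetical function name) only enumerates those regions; the paper additionally mines a discriminative context-RoI inside each one.

def surrounding_context_boxes(x0, y0, x1, y1):
    """Enumerate the 8 context regions of a 3x3 layout centered at an object RoI.

    The object box occupies the center cell; each neighboring cell has the same
    width and height as the object box.
    """
    w, h = x1 - x0, y1 - y0
    boxes = []
    for row in (-1, 0, 1):
        for col in (-1, 0, 1):
            if row == 0 and col == 0:
                continue  # skip the center cell (the object RoI itself)
            boxes.append((x0 + col * w, y0 + row * h,
                          x1 + col * w, y1 + row * h))
    return boxes

In practice the surrounding boxes would also be clipped to the image bounds before pooling features from them.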