Research
Our research has been generously supported by ARO, NSF, AFRL, IARPA, BlueHalo, and Salesforce.
2024
Liu, Xianpeng; Zheng, Ce; Qian, Ming; Xue, Nan; Chen, Chen; Zhang, Zhebin; Li, Chen; Wu, Tianfu
Multi-View Attentive Contextualization for Multi-View 3D Object Detection Proceedings Forthcoming
In: CVPR'24, Forthcoming.
@proceedings{mvacon,
title = {Multi-View Attentive Contextualization for Multi-View 3D Object Detection},
author = {Xianpeng Liu and Ce Zheng and Ming Qian and Nan Xue and Chen Chen and Zhebin Zhang and Chen Li and Tianfu Wu},
year = {2024},
date = {2024-06-18},
urldate = {2024-06-18},
abstract = {We present Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress witnessed in the field of query-based MV3D object detection, prior art often suffers either from the lack of exploiting high-resolution 2D features in dense attention-based lifting, due to high computational costs, or from insufficiently dense grounding of 3D queries to multi-scale 2D features in
sparse attention-based lifting. Our proposed MvACon hits two birds with one stone using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments, the proposed MvACon is thoroughly tested on the nuScenes benchmark, using both the BEVFormer and its recent 3D deformable attention (DFA3D) variant, as well as the PETR, showing consistent detection performance improvement, especially in enhancing performance in location, orientation, and velocity prediction. It is also tested on the Waymo-mini benchmark using BEVFormer with similar improvement. We qualitatively and quantitatively show that global cluster-based contexts effectively encode dense scene-level contexts for MV3D object detection. The promising results of our proposed MvACon reinforce the adage in computer vision \textendash “(contextualized) feature matters”.},
howpublished = {In: CVPR'24},
keywords = {},
pubstate = {forthcoming},
tppubtype = {proceedings}
}
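The core mechanism, attentive contextualization against a small set of global clusters, is simple enough to summarize in code. Below is a minimal PyTorch sketch of the idea as described in the abstract; it is not the authors' implementation, and the layer names, cluster count, and residual form are our own assumptions:

import torch
import torch.nn as nn

class ClusterContextualizer(nn.Module):
    """Contextualize flattened multi-view 2D features against K global clusters."""
    def __init__(self, dim=256, num_clusters=8):
        super().__init__()
        self.to_logits = nn.Linear(dim, num_clusters)  # learned soft cluster assignment
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, feats):                    # feats: (B, N, C), N = all view tokens
        a = self.to_logits(feats).softmax(dim=1)              # (B, N, K), sums over N
        centers = torch.einsum('bnk,bnc->bkc', a, feats)      # (B, K, C) cluster contexts
        q = self.q(feats)                                     # (B, N, C)
        k, v = self.kv(centers).chunk(2, dim=-1)              # (B, K, C) each
        attn = (q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5).softmax(dim=-1)  # (B, N, K)
        return feats + attn @ v      # contextualized features for any 2D-to-3D lifting

feats = torch.randn(2, 4096, 256)                # e.g. six camera views, flattened
print(ClusterContextualizer()(feats).shape)      # torch.Size([2, 4096, 256])

Because the contexts are only K cluster vectors, the per-token attention is dense in what it summarizes but sparse in what it computes, which is the trade-off the abstract refers to.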
Xue, Nan; Tan, Bin; Xiao, Yuxi; Dong, Liang; Xia, Gui-Song; Wu, Tianfu; Shen, Yujun
NEAT: Distilling 3D Wireframes from Neural Attraction Fields Proceedings Forthcoming
In: CVPR'24, Forthcoming.
@proceedings{neat,
title = {NEAT: Distilling 3D Wireframes from Neural Attraction Fields},
author = {Nan Xue and Bin Tan and Yuxi Xiao and Liang Dong and Gui-Song Xia and Tianfu Wu and Yujun Shen},
year = {2024},
date = {2024-06-18},
urldate = {2024-06-18},
abstract = {This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions, focusing on the computation of structured boundary geometries of scenes. Instead of leveraging matching-based solutions from 2D wireframes (or line segments) for 3D wireframe reconstruction as done in prior art, we present NEAT, a \textbf{rendering-distilling} formulation using neural fields to represent 3D line segments with 2D observations, and bipartite matching for perceiving and distilling a sparse set of 3D global junctions. The proposed {NEAT} enjoys the joint optimization of the neural fields and the global junctions from scratch, using view-dependent 2D observations without precomputed cross-view feature matching.
Comprehensive experiments on the DTU and BlendedMVS datasets demonstrate our NEAT's superiority over state-of-the-art alternatives for 3D wireframe reconstruction. Moreover, the 3D global junctions distilled by NEAT are a better initialization than SfM points for the recently emerged 3D Gaussian Splatting for high-fidelity novel view synthesis, using about 20 times fewer initial 3D points.},
howpublished = {In: CVPR'24},
keywords = {},
pubstate = {forthcoming},
tppubtype = {proceedings}
}
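The bipartite-matching step for grounding a sparse set of 3D global junctions on 2D junction observations can be illustrated with a few lines of NumPy/SciPy. The sketch below is ours, not the paper's code; the pinhole camera model and plain reprojection cost are simplifying assumptions:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_junctions(junctions_3d, K, R, t, junctions_2d):
    # Pinhole projection of candidate 3D junctions into the current view.
    cam = R @ junctions_3d.T + t[:, None]            # (3, M) camera-frame points
    proj = (K @ cam)[:2] / (K @ cam)[2]              # (2, M) pixel coordinates
    # Pairwise reprojection cost, then one-to-one Hungarian assignment.
    cost = np.linalg.norm(proj.T[:, None, :] - junctions_2d[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)         # matches min(M, N) pairs
    return rows, cols, cost[rows, cols].mean()

M, N = 16, 12
junctions_3d = np.random.randn(M, 3) + [0.0, 0.0, 5.0]   # points in front of camera
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
junctions_2d = np.random.rand(N, 2) * [640, 480]         # detected 2D junctions
rows, cols, err = match_junctions(junctions_3d, K, np.eye(3), np.zeros(3), junctions_2d)
print(rows[:5], cols[:5], round(err, 2))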
2023
Paniagua, Thomas; Grainger, Ryan; Wu, Tianfu
QuadAttacK: A Quadratic Programming Approach to Learning Ordered Top-K Adversarial Attacks Proceedings
In: NeurIPS'23, 2023.
@proceedings{quadattack,
title = {QuadAttacK: A Quadratic Programming Approach to Learning Ordered Top-K Adversarial Attacks},
author = {Thomas Paniagua and Ryan Grainger and Tianfu Wu},
url = {https://arxiv.org/abs/2312.11510},
year = {2023},
date = {2023-12-19},
urldate = {2023-12-19},
abstract = {The adversarial vulnerability of Deep Neural Networks (DNNs) has been well-known and widely concerning, often under the context of learning top-$1$ attacks (e.g., fooling a DNN to classify a cat image as dog). This paper shows that the concern is much more serious by learning significantly more aggressive ordered top-$K$ clear-box~\footnote{This is often referred to as white/black-box attacks in the literature. We choose to adopt neutral terminology, clear/opaque-box attacks in this paper, and omit the prefix clear-box for simplicity.} targeted attacks proposed in~\citep{zhang2020learning}. We propose a novel and rigorous quadratic programming (QP) method of learning ordered top-$K$ attacks with low computing cost, dubbed as \textbf{QuadAttac$K$}. Our QuadAttac$K$ directly solves the QP to satisfy the attack constraint in the feature embedding space (i.e., the input space to the final linear classifier), which thus exploits the semantics of the feature embedding space (i.e., the principle of class coherence). With the optimized feature embedding vector perturbation, it then computes the adversarial perturbation in the data space via the vanilla one-step back-propagation. In experiments, the proposed QuadAttac$K$ is tested in the ImageNet-1k classification using ResNet-50, DenseNet-121, and Vision Transformers (ViT-B and DEiT-S). It successfully pushes the boundary of successful ordered top-$K$ attacks from $K=10$ up to $K=20$ at a cheap budget ($1\times 60$) and further improves attack success rates for $K=5$ for all tested models, while retaining the performance for $K=1$.},
howpublished = {In: NeurIPS'23},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}
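The heart of the method, a QP in the feature embedding space, is easy to state concretely. Here is a hedged cvxpy sketch of what we take the attack constraint to be from the abstract: rank K chosen classes, in order, above all other logits of the final linear classifier, with the smallest feature perturbation. The margin and problem sizes are illustrative assumptions:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
C, D = 20, 64                          # number of classes, feature dimension
W, b = rng.normal(size=(C, D)), rng.normal(size=C)   # final linear classifier
z = rng.normal(size=D)                 # clean feature embedding of the input
targets = [3, 7, 11]                   # desired ordered top-K labels (K = 3)
margin = 0.1                           # illustrative ranking margin

dz = cp.Variable(D)                    # feature-space perturbation to optimize
logits = W @ (z + dz) + b
cons = [logits[targets[i]] >= logits[targets[i + 1]] + margin
        for i in range(len(targets) - 1)]            # enforce order among targets
cons += [logits[targets[-1]] >= logits[c] + margin
         for c in range(C) if c not in targets]      # targets beat all other classes
cp.Problem(cp.Minimize(cp.sum_squares(dz)), cons).solve()
print(np.argsort(W @ (z + dz.value) + b)[::-1][:3])  # prints [ 3  7 11]
# In the full method, dz is mapped back to an image perturbation by one step of
# back-propagation through the feature extractor.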
2022
Grainger, Ryan; Paniagua, Thomas; Song, Xi; Wu, Tianfu
Learning Patch-to-Cluster Attention in Vision Transformer Working paper
arXiv preprint, 2022.
@workingpaper{PaCaViT,
title = {Learning Patch-to-Cluster Attention in Vision Transformer},
author = {Ryan Grainger and Thomas Paniagua and Xi Song and Tianfu Wu},
url = {https://arxiv.org/abs/2203.11987},
year = {2022},
date = {2022-03-23},
abstract = {The Vision Transformer (ViT) model is built on the assumption of treating image patches as "visual tokens" and learning patch-to-patch attention. The patch embedding based tokenizer is a workaround in practice and has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViT models. To address these issues in ViT models, this paper proposes to learn patch-to-cluster attention (PaCa) based ViT models. Queries in our PaCa-ViT are based on patches, while keys and values are based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and realizing joint clustering-for-attention and attention-for-clustering when deployed in ViT models. The quadratic complexity is relaxed to linear complexity. Also, directly visualizing the learned clusters can reveal how a trained ViT model learns to perform a task (e.g., object detection). In experiments, the proposed PaCa-ViT is tested on CIFAR-100 and ImageNet-1000 image classification, and MS-COCO object detection and instance segmentation. Compared with prior arts, it obtains better performance in classification and comparable performance in detection and segmentation. It is significantly more efficient in COCO due to the linear complexity. The learned clusters are also semantically meaningful and shed light on designing more discriminative yet interpretable ViT models.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {workingpaper}
}
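A minimal sketch of patch-to-cluster attention, assuming the formulation in the abstract (queries from patches, keys/values from a small number of end-to-end learned clusters), is given below in PyTorch. Layer sizes and head counts are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

class PaCaAttention(nn.Module):
    def __init__(self, dim=192, num_clusters=49, heads=3):
        super().__init__()
        self.heads = heads
        self.cluster = nn.Linear(dim, num_clusters)    # end-to-end learned clustering
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (B, N, C) patch tokens
        B, N, C = x.shape
        z = self.cluster(x).softmax(dim=1)             # (B, N, M) soft assignments
        c = torch.einsum('bnm,bnc->bmc', z, x)         # (B, M, C) cluster tokens
        q = self.q(x).view(B, N, self.heads, -1).transpose(1, 2)      # (B, h, N, d)
        kv = self.kv(c).view(B, -1, 2, self.heads, C // self.heads)
        k, v = kv.permute(2, 0, 3, 1, 4)                              # (B, h, M, d) each
        attn = (q @ k.transpose(-2, -1) / (C // self.heads) ** 0.5).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)   # O(N*M) cost, not O(N^2)
        return self.proj(out)

print(PaCaAttention()(torch.randn(2, 196, 192)).shape)  # torch.Size([2, 196, 192])

Since the number of clusters M is fixed and small, attention cost grows linearly with the number of patches N, which is the complexity relaxation the abstract claims.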
2020
Zhang, Zekun; Wu, Tianfu
Learning Ordered Top-k Adversarial Attacks via Adversarial Distillation Workshop
CVPRW 2020 Adversarial Machine Learning in Computer Vision, 2020.
@workshop{AdvDistillation,
title = {Learning Ordered Top-k Adversarial Attacks via Adversarial Distillation},
author = {Zekun Zhang and Tianfu Wu},
url = {https://openaccess.thecvf.com/content_CVPRW_2020/papers/w47/Zhang_Learning_Ordered_Top-k_Adversarial_Attacks_via_Adversarial_Distillation_CVPRW_2020_paper.pdf},
year = {2020},
date = {2020-06-14},
booktitle = {CVPRW 2020 Adversarial Machine Learning in Computer Vision},
journal = {CoRR},
volume = {abs/1905.10695},
abstract = {Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, especially white-box targeted attacks. One scheme of learning attacks is to design a proper adversarial objective function that leads to the imperceptible perturbation for any test image (e.g., the Carlini-Wagner (C\&W) method). Most methods address targeted attacks in the Top-1 manner. In this paper, we propose to learn ordered Top-k attacks (k ≥ 1) for image classification tasks, that is, to enforce the Top-k predicted labels of an adversarial example to be the k (randomly) selected and ordered labels (the ground-truth label is excluded). To this end, we present an adversarial distillation framework: First, we compute an adversarial probability distribution for any given ordered Top-k targeted labels with respect to the ground-truth of a test image. Then, we learn adversarial examples by minimizing the Kullback-Leibler (KL) divergence together with the perturbation energy penalty, similar in spirit to the network distillation method. We explore how to leverage label semantic similarities in computing the targeted distributions, leading to knowledge-oriented attacks. In experiments, we thoroughly test Top-1 and Top-5 attacks in the ImageNet-1000 validation dataset using two popular DNNs trained with the clean ImageNet-1000 train dataset, ResNet-50 and DenseNet-121. For both models, our proposed adversarial distillation approach outperforms the C\&W method in the Top-1 setting, as well as other baseline methods. Our approach shows significant improvement in the Top-5 setting against a strong modified C\&W method.},
howpublished = {CVPRW20 Adversarial Machine Learning in Computer Vision},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
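The adversarial-distillation objective described above, KL divergence to a target distribution over ordered Top-k labels plus a perturbation-energy penalty, can be sketched in a few lines of PyTorch. The geometric target distribution, tiny linear model, and step sizes below are our assumptions for illustration; the paper also explores semantics-aware target distributions:

import torch
import torch.nn.functional as F

def adv_distill_step(model, x, delta, targets, lam=1e-2, lr=0.01):
    num_classes = model(x).shape[-1]
    # Target distribution: the ordered Top-k labels receive geometrically decaying
    # probability mass; every other class (incl. the ground truth) gets 1e-4.
    p = torch.full((num_classes,), 1e-4)
    w = torch.tensor([0.5 ** i for i in range(len(targets))])
    p[targets] = (1.0 - 1e-4 * (num_classes - len(targets))) * w / w.sum()
    logits = model(x + delta)
    loss = F.kl_div(F.log_softmax(logits, dim=-1), p.expand_as(logits),
                    reduction='batchmean') + lam * delta.pow(2).sum()
    loss.backward()
    with torch.no_grad():              # plain gradient step on the perturbation
        delta -= lr * delta.grad
        delta.grad.zero_()
    return loss.item()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(100):
    adv_distill_step(model, x, delta, targets=[7, 2, 5])
print(model(x + delta).argsort(descending=True)[0, :3])  # aims at tensor([7, 2, 5])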
2019
Sun, Wei; Bappy, Jawadul H; Yang, Shanglin; Xu, Yi; Wu, Tianfu; Zhou, Hui
Pose Guided Fashion Image Synthesis Using Deep Generative Model Workshop
The fourth international workshop on fashion and KDD, 2019.
@workshop{PoseGuidedSynthesis,
title = {Pose Guided Fashion Image Synthesis Using Deep Generative Model},
author = {Wei Sun and Jawadul H Bappy and Shanglin Yang and Yi Xu and Tianfu Wu and Hui Zhou},
url = {http://arxiv.org/abs/1906.07251},
year = {2019},
date = {2019-08-05},
journal = {The fourth international workshop on fashion and KDD},
abstract = {Generating a photorealistic image with intended human pose is a promising yet challenging research topic for many applications such as smart photo editing, movie making, virtual try-on, and fashion display. In this paper, we present a novel deep generative model to transfer an image of a person from a given pose to a new pose while keeping fashion item consistent. In order to formulate the framework, we employ one generator and two discriminators for image synthesis. The generator includes an image encoder, a pose encoder and a decoder. The two encoders provide good representation of visual and geometrical context which will be utilized by the decoder in order to generate a photorealistic image. Unlike existing pose-guided image generation models, we exploit two discriminators to guide the synthesis process where one discriminator differentiates between generated image and real images (training samples), and another discriminator verifies the consistency of appearance between a target pose and a generated image. We perform end-to-end training of the network to learn the parameters through back-propagation given ground-truth images. The proposed generative model is capable of synthesizing a photorealistic image of a person given a target pose. We have demonstrated our results by conducting rigorous experiments on two data sets, both quantitatively and qualitatively.},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
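The one-generator, two-discriminator layout the abstract describes can be written down as a skeleton. The block below is an illustrative PyTorch sketch only, with assumed channel counts and 18-channel keypoint heatmaps as the pose input; the actual encoders and decoder are much deeper:

import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1), nn.LeakyReLU(0.2))

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_enc = conv_block(3, 64)      # visual context of the source person
        self.pose_enc = conv_block(18, 64)    # 18 keypoint heatmaps for target pose
        self.dec = nn.Sequential(nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())

    def forward(self, img, pose):
        return self.dec(torch.cat([self.img_enc(img), self.pose_enc(pose)], 1))

real_fake_D = nn.Sequential(conv_block(3, 64), nn.Conv2d(64, 1, 1))          # realism
consistency_D = nn.Sequential(conv_block(3 + 18, 64), nn.Conv2d(64, 1, 1))   # pose match

G = Generator()
img, pose = torch.randn(1, 3, 128, 128), torch.randn(1, 18, 128, 128)
fake = G(img, pose)
print(fake.shape, real_fake_D(fake).shape,
      consistency_D(torch.cat([fake, pose], 1)).shape)

The second discriminator sees the generated image concatenated with the target pose maps, which is what lets it verify appearance-pose consistency rather than realism alone.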
2018
Lanka, Sameera; Wu, Tianfu
ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay Workshop
NeurIPS 2018 Deep RL workshop, 2018.
@workshop{ARCHER,
title = {ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay},
author = {Sameera Lanka and Tianfu Wu},
url = {https://arxiv.org/abs/1809.02070},
year = {2018},
date = {2018-01-01},
booktitle = {NeurIPS 2018 Deep RL workshop},
abstract = {Experience replay is an important technique for addressing sample-inefficiency in deep reinforcement learning (RL), but faces difficulty in learning from binary and sparse rewards due to disproportionately few successful experiences in the replay buffer. Hindsight experience replay (HER) was recently proposed to tackle this difficulty by manipulating unsuccessful transitions, but in doing so, HER introduces a significant bias in the replay buffer experiences and therefore achieves a suboptimal improvement in sample-efficiency. In this paper, we present an analysis on the source of bias in HER, and propose a simple and effective method to counter the bias, to most effectively harness the sample-efficiency provided by HER. Our method, motivated by counter-factual reasoning and called ARCHER, extends HER with a trade-off to make rewards calculated for hindsight experiences numerically greater than real rewards. We validate our algorithm on two continuous control environments from DeepMind Control Suite - Reacher and Finger, which simulate manipulation tasks with a robotic arm - in combination with various reward functions, task complexities and goal sampling strategies. Our experiments consistently demonstrate that countering bias using more aggressive hindsight rewards increases sample efficiency, thus establishing the greater benefit of ARCHER in RL applications with limited computing budget.},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
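ARCHER's trade-off sits entirely in the replay-buffer relabeling, so a short NumPy sketch captures it. Under the sparse {-1, 0} reward convention and the 'final' goal-sampling strategy, one simple way to make hindsight rewards numerically greater than real rewards is to shrink their magnitude; the scale factor here is an illustrative assumption:

import numpy as np

def sparse_reward(achieved, goal, eps=0.05):
    # Binary sparse reward: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

def relabel_episode(episode, hindsight_scale=0.5):
    """episode: list of (state, action, achieved_goal, goal); returns replay tuples."""
    final_achieved = episode[-1][2]               # the 'final' goal-sampling strategy
    replay = []
    for state, action, achieved, goal in episode:
        # Real transition with its real reward.
        replay.append((state, action, goal, sparse_reward(achieved, goal)))
        # Hindsight transition: pretend the finally-achieved state was the goal.
        # Shrinking the magnitude of the (negative) hindsight reward makes it
        # numerically greater than the real reward, ARCHER's counter-bias trade-off.
        r_h = hindsight_scale * sparse_reward(achieved, final_achieved)
        replay.append((state, action, final_achieved, r_h))
    return replay

episode = [(np.zeros(4), np.zeros(2), np.random.rand(3), np.ones(3)) for _ in range(5)]
print(len(relabel_episode(episode)))   # 10: one real + one hindsight tuple per step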
2017
Zhao, Bo; Wu, Botong; Wu, Tianfu; Wang, Yizhou
Zero-Shot Learning Posed as a Missing Data Problem Workshop
2017 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2017, Venice, Italy, October 22-29, 2017, 2017.
@workshop{Zhao_ZeroShot,
title = {Zero-Shot Learning Posed as a Missing Data Problem},
author = {Bo Zhao and Botong Wu and Tianfu Wu and Yizhou Wang},
url = {https://arxiv.org/abs/1612.00560},
doi = {10.1109/ICCVW.2017.310},
year = {2017},
date = {2017-01-01},
booktitle = {2017 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2017, Venice, Italy, October 22-29, 2017},
pages = {2616--2622},
abstract = {This paper presents a method of zero-shot learning (ZSL) which poses ZSL as the missing data problem, rather than the missing label problem. While most popular methods in ZSL focus on learning the mapping function from the image feature space to the label embedding space, the proposed method explores a simple yet effective transductive framework in the reverse mapping. Our method estimates data distribution of unseen classes in the image feature space by transferring knowledge from the label embedding space. It assumes that data of each seen and unseen class follow Gaussian distribution in the image feature space and utilizes Gaussian mixture model to model data. The signature is introduced to describe the data distribution of each class. In experiments, our method obtains 87.38% and 61.08% mean accuracies on the Animals with Attributes (AwA) and the Caltech-UCSD Birds-200-2011 (CUB) datasets respectively, which outperforms the runner-up methods significantly by 4.95% and 6.38%. In addition, we also investigate the extension of our method to open-set classification.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
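The transductive pipeline the abstract describes, per-class Gaussians in image-feature space whose unseen-class parameters are transferred from the label embedding space and then refined on unlabeled test data, can be sketched with NumPy/SciPy. The synthetic data, the similarity-weighted transfer rule, and the identity covariance are our assumptions:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
D, n_seen, n_unseen = 16, 5, 2
seen_means = rng.normal(size=(n_seen, D)) * 3.0      # per-class Gaussians (seen)
seen_emb = rng.normal(size=(n_seen, 8))              # label embeddings, e.g. attributes
unseen_emb = rng.normal(size=(n_unseen, 8))

# Transfer: initialize each unseen-class 'signature' (here just the mean) as a
# similarity-weighted combination of seen-class means in image-feature space.
sim = unseen_emb @ seen_emb.T
w = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
unseen_means = w @ seen_means

# Transductive refinement: EM over the unlabeled test pool with a Gaussian mixture
# (identity covariance for simplicity).
true_means = rng.normal(size=(n_unseen, D)) * 3.0
X = np.vstack([m + rng.normal(size=(50, D)) for m in true_means])   # unlabeled pool
for _ in range(10):
    resp = np.stack([multivariate_normal(m, np.eye(D)).pdf(X) for m in unseen_means])
    resp /= resp.sum(axis=0)                          # (n_unseen, n_points)
    unseen_means = (resp @ X) / resp.sum(axis=1, keepdims=True)
print(resp.argmax(axis=0)[:10], resp.argmax(axis=0)[50:60])  # two recovered clusters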