Research
Our research has been generously supported by ARO, NSF, AFRL, IARPA, BlueHalo, and Salesforce.
2022
Grainger, Ryan; Paniagua, Thomas; Song, Xi; Wu, Tianfu
Learning Patch-to-Cluster Attention in Vision Transformer (Working paper)
arXiv preprint, 2022.
@workingpaper{PaCaViT,
title = {Learning Patch-to-Cluster Attention in Vision Transformer},
author = {Ryan Grainger and Thomas Paniagua and Xi Song and Tianfu Wu},
url = {https://arxiv.org/abs/2203.11987},
year = {2022},
date = {2022-03-23},
abstract = {The Vision Transformer (ViT) model is built on the assumption of treating image patches as "visual tokens" and learning patch-to-patch attention. The patch embedding based tokenizer is a workaround in practice and has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViT models. To address these issues in ViT models, this paper proposes to learn patch-to-cluster attention (PaCa) based ViT models. Queries in our PaCaViT are based on patches, while keys and values are based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and realizing joint clustering-for-attention and attention-for-clustering when deployed in ViT models. The quadratic complexity is relaxed to linear complexity. Also, directly visualizing the learned clusters can reveal how a trained ViT model learns to perform a task (e.g., object detection). In experiments, the proposed PaCa-ViT is tested on CIFAR-100 and ImageNet-1000 image classification, and MS-COCO object detection and instance segmentation. Compared with prior arts, it obtains better performance in classification and comparable performance in detection and segmentation. It is significantly more efficient in COCO due to the linear complexity. The learned clusters are also semantically meaningful and shed light on designing more discriminative yet interpretable ViT models.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {workingpaper}
}
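For readers who want to see how the patch-to-cluster idea translates into code, the following is a minimal PyTorch sketch of the attention described in the abstract: queries come from patches, keys and values come from a small set of learned clusters, so the attention map is N x M rather than N x N. The module and argument names (PaCaAttention, num_clusters) and the simple linear clustering head are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class PaCaAttention(nn.Module):
    """Queries from patches; keys/values from a small set of learned cluster tokens."""
    def __init__(self, dim, num_heads=8, num_clusters=49):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Lightweight clustering head: soft assignment of N patches to M clusters.
        self.to_clusters = nn.Linear(dim, num_clusters)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                    # x: (B, N, C) patch tokens
        B, N, C = x.shape
        # Assignments normalized over patches, so each cluster token is a
        # convex combination of patch tokens (learned end-to-end).
        assign = self.to_clusters(x).softmax(dim=1)           # (B, N, M)
        z = assign.transpose(1, 2) @ x                        # (B, M, C) cluster tokens
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(z).chunk(2, dim=-1)
        k = k.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        # Attention is (N x M) per head, i.e., linear in the number of patches N.
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 196 patch tokens of width 256 -> output of the same shape.
# y = PaCaAttention(dim=256)(torch.randn(2, 196, 256))   # y: (2, 196, 256)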
2016
Li, Bo; Wu, Tianfu; Shao, Shuai; Zhang, Lun; Chu, Rufeng
Object Detection via End-to-End Integration of Aspect Ratio and Context Aware Part-based Models and Fully Convolutional Networks
arXiv preprint, 2016.
@misc{ARC-FCN,
title = {Object Detection via End-to-End Integration of Aspect Ratio and Context Aware Part-based Models and Fully Convolutional Networks},
author = {Bo Li and Tianfu Wu and Shuai Shao and Lun Zhang and Rufeng Chu},
url = {https://arxiv.org/abs/1612.00534},
year = {2016},
date = {2016-01-01},
journal = {CoRR},
volume = {abs/1612.00534},
abstract = {This paper presents a framework of integrating a mixture of part-based models and region-based convolutional networks for accurate and efficient object detection. Each mixture component consists of a small number of parts accounting for both object aspect ratio and contextual information explicitly. The mixture is category-agnostic for the simplicity of scaling up in applications. Both object aspect ratio and context have been extensively studied in traditional object detection systems such as the mixture of deformable part-based models [13]. They are, however, largely ignored in deep neural network based detection systems [17, 16, 39, 8]. The proposed method addresses this issue in two-fold: (i) It remedies the wrapping artifact due to the generic RoI (region-of-interest) pooling (e.g., a 3 x 3 grid) by taking into account object aspect ratios. (ii) It models both global (from the whole image) and local (from the surrounding of a bounding box) context for improving performance. The integrated framework is fully convolutional and enjoys end-to-end training, which we call the aspect ratio and context aware fully convolutional network (ARC-FCN). In experiments, ARC-FCN shows very competitive results on the PASCAL VOC datasets, especially, it outperforms both Faster R-CNN [39] and R-FCN [8] with significantly better mean average precision (mAP) using larger value for the intersection-over-union (IoU) threshold (i.e., 0.7 in the experiments). ARC-FCN is still sufficiently efficient with a test-time speed of 380ms per image, faster than the Faster R-CNN but slower than the R-FCN.},
howpublished = {arXiv preprint},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
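To make the aspect-ratio point concrete, here is a small sketch of RoI pooling whose grid depends on the box aspect ratio, so wide boxes get more horizontal bins and tall boxes more vertical bins instead of a single generic 3 x 3 grid. It relies on torchvision.ops.roi_align; the grid choices (3x3, 3x5, 5x3) and thresholds are illustrative assumptions, not the ARC-FCN design, which additionally integrates part-based mixtures and global/local context.

import torch
from torchvision.ops import roi_align

def aspect_ratio_aware_pool(feature_map, boxes, spatial_scale=1.0 / 16):
    """Pool each RoI with a grid chosen from its aspect ratio.

    feature_map: (1, C, H, W) float tensor; boxes: (K, 4) float tensor of
    (x1, y1, x2, y2) in image coordinates for that single image.
    """
    widths = boxes[:, 2] - boxes[:, 0]
    heights = (boxes[:, 3] - boxes[:, 1]).clamp(min=1e-6)
    ratios = widths / heights
    pooled = []
    for box, r in zip(boxes, ratios):
        if r > 1.5:            # wide box: more horizontal bins
            out_size = (3, 5)  # (height bins, width bins)
        elif r < 1.0 / 1.5:    # tall box: more vertical bins
            out_size = (5, 3)
        else:                  # roughly square box: generic grid
            out_size = (3, 3)
        roi = torch.cat([box.new_zeros(1), box]).unsqueeze(0)  # prepend batch index 0
        pooled.append(roi_align(feature_map, roi, output_size=out_size,
                                spatial_scale=spatial_scale, aligned=True))
    return pooled  # list of (1, C, grid_h, grid_w) tensors, one per box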