Home

Welcome to the iVMCL laboratory at NC State, led by Dr. Tianfu (Matt) Wu, an associate professor of Electrical and Computer Engineering (ECE) and an affiliate of the Visual Narrative Initiative. Our long-term research focuses on interpretable Visual Modeling, Computing and Learning, often motivated by the pursuit of a unified framework for AI to ALTER (Ask, Learn, Test, Explain and Refine) in a trustworthy, robust and responsive way for AIGCGT (AI Generated Content and Ground-Truth).

Two papers accepted to CVPR’24

Xianpeng Liu, Ce Zheng, Ming Qian, Nan Xue, Chen Chen, Zhebin Zhang, Chen Li, Tianfu Wu, “Multi-View Attentive Contextualization for Multi-View 3D Object …

Two papers accepted to IEEE TPAMI

Holistically-Attracted Wireframe Parsing: From Supervised Learning to Self-Supervised Learning; NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction…

Our current research focuses mainly on:

(i) Grammar-Guided Interpretable and Robust Representation Learning. This line of research is motivated by Professor David Mumford's view that "thinking of all kinds requires grammars" and that "grammar in language is merely a recent extension of much older grammars that are built into the brains of all intelligent animals to analyze sensory input, to structure their actions and even formulate their thoughts."

Representative work: Neural Architecture Generators [AOGNets v1, AOGNets v2 (AttnNorm), iRCNN], Sparsely Activated and Interpretable Synthesis [AOGenerator], Wireframe Parsing [HAWP (FSL v1 & v2, SSL v3)], Contextual Adaptation [LOGO-CAP], Patch/Token-to-Cluster Attention [PaCa-ViT], Adversarial Attacks [AdvDistill, CGBA]

Other related work includes: AutoContext-RCNN, AFM, SPAP, LostGANs v1

(ii) Deep Consensus Lifelong Learning for Joint Discriminative and Generative Modeling. The world is highly structured, with complex compositional regularities. To facilitate developing a unified AI ALTER framework on top of the research in (i), this line of research addresses a grand challenge in computer vision and machine (deep) learning: modeling and learning the joint distribution of Grammar-like structures and raw data, p(structures, data), in a principled way (sketched below). This typically consists of two tasks: structured output prediction, which aims to learn p(structures | data) (e.g., image semantic segmentation or image parsing), and structured input synthesis, which aims to learn p(data | structures), i.e., controllable and reconfigurable conditional generative learning (e.g., text/layout-to-image synthesis), or, more recently, AIGC. Deep consensus lifelong learning aims to integrate the two in a closed loop for AI ALTERing and AIGCGT (AI Generated Content and Ground-Truth).
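As a minimal sketch of the factorization at play (nothing beyond the product rule of probability, using the notation of the paragraph above), the two tasks correspond to complementary factorizations of one and the same joint distribution:

\begin{align*}
p(\text{structures}, \text{data})
  &= p(\text{structures} \mid \text{data}) \, p(\text{data}) \\  % structured output prediction
  &= p(\text{data} \mid \text{structures}) \, p(\text{structures})  % structured input synthesis
\end{align*}

Since both lines describe the same joint, the prediction and synthesis tasks are two views of one underlying model, which is what integrating them in a closed loop builds on.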

Representative work: Reconfigurable and Controllable Image Synthesis [LostGANs v2], Three-Player Consensus Learning [ DCL v1 ], Resilient Lifelong Learning [Learn-to-Grow (L2G) v1, v2 (ArtiHippo)], Auxiliarily Consensus Learning [MonoCon 3D Object Detection, MonoXiver], Deep Integrative Learning [MCTNets4Stereo]