Welcome to the iVMCL laboratory at NC State, led by Dr. Tianfu (Matt) Wu, an associate professor of Electrical and Computer Engineering (ECE) at NC State who is also affiliated with the Visual Narrative Initiative. Our long-term research focuses on interpretable Visual Modeling, Computing and Learning (iVMCL), often motivated by the pursuit of a unified framework that enables Artificial Intelligence (A.I.) to ALTER (Ask, Learn, Test, Explain and Refine) recursively in a principled way.
“PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers” accepted to CVPR’23
“Structure from Motion on Neural Level Set of Implicit Surfaces” accepted to CVPR’23
Our current research focuses mainly on:
(i) Grammar-Guided Interpretable Representation Learning for Image Analysis and Synthesis. This line of research is motivated by “the belief that thinking of all kinds requires grammars” and by the view that “Grammar in language is merely a recent extension of much older grammars that are built into the brains of all intelligent animals to analyze sensory input, to structure their actions and even formulate their thoughts.” (Professor David Mumford)
Representative work: Neural Architecture Generators [AOGNets v1, AOGNets v2 (AttnNorm), iRCNN], Sparsely Activated and Interpretable Synthesis [AOGenerator], Wireframe Parsing [HAWP (FSL v1 & v2, SSL v3)], Contextual Adaptation [LOGO-CAP], Patch/Token-to-Cluster Attention [PaCa-ViT]
Other related work includes: AutoContext-RCNN, AFM, SPAP, LostGANs v1
(ii) Deep Consensus Lifelong Learning for Joint Discriminative and Generative Modeling of Structured Data. The world is highly structured, with complex compositional regularities. To facilitate developing a unified AI ALTER framework, and building on the research in (i), this line of research addresses a grand challenge in computer vision and machine (deep) learning: modeling and learning the joint distribution of Grammar-like structures and raw data, p(structures, data), in a principled way. It typically consists of two tasks: structured output prediction, which aims to learn p(structures | data) (e.g., image semantic segmentation or image parsing), and structured input synthesis, which aims to learn p(data | structures), i.e., controllable and reconfigurable conditional generative learning (e.g., text/layout-to-image synthesis), a direction that has more recently come to be known as AIGC. Deep consensus lifelong learning aims to integrate the two tasks in a closed loop for AI ALTERing and AIGCGT (AI Generated Content and Ground-Truth).
Representative work: Reconfigurable and Controllable Image Synthesis [LostGANs v2], Three-Player Consensus Learning [DCL v1], Resilient Lifelong Learning [Learn-to-Grow (L2G) v1, v2 (ArtiHippo)], Auxiliarily Consensus Learning [MonoCon 3D Object Detection, MonoXiver], Deep Integrative Learning [MCTNets4Stereo]
Other related work: AdvDistill