North Carolina State University researchers have developed a new framework for building deep neural networks via grammar-guided network generators. In experimental testing, the new networks – called AOGNets – have outperformed existing state-of-the-art frameworks, including the widely used ResNet and DenseNet systems, in visual recognition tasks.
“AOGNets have better prediction accuracy than any of the networks we’ve compared it to,” says Tianfu Wu, an assistant professor of electrical and computer engineering at NC State and corresponding author of a paper on the work. “AOGNets are also more interpretable, meaning users can see how the system reaches its conclusions.”
The new framework uses a compositional grammar approach to system architecture that draws on best practices from previous network systems to more effectively extract useful information from raw data.
“We found that hierarchical and compositional grammar gave us a simple, elegant way to unify the approaches taken by previous system architectures, and to our best knowledge, it is the first work that makes use of grammar for network generation,” Wu says.
To test their new framework, the researchers developed AOGNets and evaluated them on three image classification benchmarks: CIFAR-10, CIFAR-100 and ImageNet-1K.
“AOGNets obtained significantly better performance than all of the state-of-the-art networks under fair comparisons, including ResNets, DenseNets, ResNeXts and DualPathNets,” Wu says. “AOGNets also obtained the best model interpretability score using the network dissection metric in ImageNet. AOGNets further show great potential in adversarial defense and platform-agnostic deployment (mobile vs cloud).”
The researchers also tested the performance of AOGNets in object detection and instance semantic segmentation on the Microsoft COCO benchmark, using the vanilla Mask R-CNN system.
“AOGNets obtained better results than the ResNet and ResNeXt backbones with smaller model sizes and similar or slightly better inference time,” Wu says. “The results show the effectiveness of AOGNets learning better features in object detection and segmentation tasks.”
These tests are relevant because image classification is one of the core tasks in visual recognition, and ImageNet is the standard large-scale classification benchmark. Similarly, object detection and segmentation are two core high-level vision tasks, and MS-COCO is one of the most widely used benchmarks.
“To evaluate new network architectures for deep learning in visual recognition, they are the golden testbeds,” Wu says. “AOGNets are developed under a principled grammar framework and obtain significant improvement in both ImageNet and MS-COCO, thus showing potentially broad and deep impacts for representation learning in numerous practical applications.
“We’re excited about the grammar-guided AOGNet framework, and are exploring its performance in other deep learning applications, such as deep natural language understanding, deep generative learning and deep reinforcement learning,” Wu says.
The paper, “AOGNets: Compositional Grammatical Architectures for Deep Learning,” will be presented at the IEEE Computer Vision and Pattern Recognition Conference, being held June 16-20 in Long Beach, Calif. First author of the paper is Xilai Li, a Ph.D. student at NC State. The paper was co-authored by Xi Song, an independent researcher.
The work was done with support from the U.S. Army Research Office under grants W911NF1810295 and W911NF1810209.
A patent application has been submitted for the work. The authors are interested in collaborating with potential academic and industry partners.
Note to Editors: The study abstract follows.
“AOGNets: Compositional Grammatical Architectures for Deep Learning”
Authors: Xilai Li and Tianfu Wu, North Carolina State University; Xi Song, independent researcher
Presented: June 16-20, IEEE Computer Vision and Pattern Recognition Conference (CVPR) in Long Beach, Calif.
Abstract: Neural architectures are the foundation for improving performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate compositionality and reconfigurability of the former and the capability of learning rich features of the latter in a principled way. We utilize AND-OR Grammar (AOG) as network generator in this paper and call the resulting networks AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along feature channels and then treats it as a sentence of N words. It then jointly realizes a phrase structure grammar and a dependency grammar in bottom-up parsing the “sentence” for better feature exploration and reuse. It provides a unified framework for the best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested in the ImageNet-1K classification benchmark and the MS-COCO object detection and segmentation benchmark. In ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention-based variants such as SENet, DenseNet and DualPathNet. AOGNet also obtains the best model interpretability score using network dissection. AOGNet further shows better potential in adversarial defense. In MS-COCO, AOGNet obtains better performance than the ResNet and ResNeXt backbones in Mask R-CNN.
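For readers who want a concrete picture of the "sentence of N words" idea in the abstract, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' implementation. It assumes the channel groups are contiguous and equal in size. It reads the phrase structure grammar as enumerating every contiguous sub-phrase together with the ways to split it into two shorter sub-phrases. The function names `split_channels` and `enumerate_phrases` are hypothetical. The dependency grammar and all learned operations are left out.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[int, int]  # (start, end) word indices, end exclusive


def split_channels(num_channels: int, n_words: int) -> List[range]:
    """Split channel indices into n_words contiguous, equal-sized groups.

    Assumption for this sketch: groups are contiguous and equal in size.
    """
    if n_words <= 0 or num_channels % n_words != 0:
        raise ValueError("num_channels must be a positive multiple of n_words")
    size = num_channels // n_words
    return [range(i * size, (i + 1) * size) for i in range(n_words)]


def enumerate_phrases(n_words: int) -> Dict[Phrase, List[Tuple[Phrase, Phrase]]]:
    """Map each contiguous sub-phrase to its possible binary splits.

    Phrases are visited shortest first, so every split refers only to
    phrases already in the table (a bottom-up order). Single words have
    no splits. Each split stands for one way of composing a phrase from
    two shorter ones; the list of splits holds the alternatives.
    """
    table: Dict[Phrase, List[Tuple[Phrase, Phrase]]] = {}
    for length in range(1, n_words + 1):
        for start in range(0, n_words - length + 1):
            end = start + length
            table[(start, end)] = [
                ((start, mid), (mid, end)) for mid in range(start + 1, end)
            ]
    return table


if __name__ == "__main__":
    groups = split_channels(num_channels=64, n_words=4)
    print([(g.start, g.stop) for g in groups])
    for phrase, splits in enumerate_phrases(4).items():
        print(phrase, splits)
```

With N = 4, the table holds ten sub-phrases: four single words, three two-word phrases, two three-word phrases and the full sentence. The full sentence can be split in three ways. Sharing each sub-phrase across every larger phrase that contains it is one plausible reading of the "feature exploration and reuse" the abstract mentions.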
This post was originally published in NC State News.