Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Published 2016 · Computer Science

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
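The residual reformulation described in the abstract can be illustrated with a small sketch: a block computes a residual function F(x) and adds the input x back through an identity shortcut. This is a toy illustration under stated assumptions, not the paper's implementation: it uses plain Python lists and fully connected layers instead of convolutions, omits batch normalization, and the names `relu`, `linear`, and `residual_block` are hypothetical.

```python
def relu(v):
    """Element-wise rectifier."""
    return [max(a, 0.0) for a in v]

def linear(v, w):
    """Multiply row vector v by weight matrix w (given as a list of rows)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]

def residual_block(x, w1, w2):
    """Toy two-layer residual block: relu(F(x) + x), with F(x) = relu(x w1) w2."""
    residual = linear(relu(linear(x, w1)), w2)          # F(x): the learned residual
    return relu([r + a for r, a in zip(residual, x)])   # identity shortcut adds x back

x = [0.5, 1.0, 2.0]                        # non-negative toy input (assumption)
zero = [[0.0] * 3 for _ in range(3)]       # all-zero weights, so F(x) = 0
identity_out = residual_block(x, zero, zero)
```

With all-zero weights the residual branch contributes nothing, so the block passes a non-negative input through unchanged. This is the intuition behind the claim that such networks are easier to optimize: a layer only needs to learn a deviation from the identity rather than the whole mapping.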
This paper references
A multigrid tutorial
W. Briggs (1987)
10.2307/2685660
Modern Applied Statistics with S-Plus
Karen Kafadar (1999)
10.1109/CVPR.2015.7298594
Going deeper with convolutions
Christian Szegedy (2015)
Visualizing and Understanding Convolutional Neural Networks
Matthew D. Zeiler (2013)
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan (2015)
Deeply-supervised nets
C.-Y. Lee (2014)
Network In Network
M. Lin (2014)
10.1109/ICCV.2015.135
Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model
Spyros Gidaris (2015)
10.1145/3065386
ImageNet classification with deep convolutional neural networks
A. Krizhevsky (2017)
Deeply-Supervised Nets
Chen-Yu Lee (2015)
10.1109/TPAMI.2016.2577031
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren (2015)
10.3929/ETHZ-A-004263473
Accelerated Gradient Descent by Factor-Centering Decomposition
N. N. Schraudolph (1998)
10.1007/s11263-015-0816-y
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky (2015)
10.1016/S0065-2458(08)60404-0
Neural Networks for Pattern Recognition
S. Kothari (1993)
10.1162/neco.1989.1.4.541
Backpropagation Applied to Handwritten Zip Code Recognition
Y. LeCun (1989)
10.1109/CVPR.2007.383266
Fisher Kernels on Visual Vocabularies for Image Categorization
F. Perronnin (2007)
Pattern Recognition and Neural Networks
Y. LeCun (1995)
10.1109/ICCV.2015.169
Fast R-CNN
Ross B. Girshick (2015)
Deep Residual Learning for Image Recognition Supplementary Materials
Kaiming He (2016)
Maxout Networks
Ian J. Goodfellow (2013)
Understanding the difficulty of training deep feedforward neural networks
Xavier Glorot (2010)
10.1137/1.9780898719505
A multigrid tutorial, Second Edition
W. Briggs (2000)
10.1007/978-3-319-10578-9_23
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Kaiming He (2015)
10.1145/2647868.2654889
Caffe: Convolutional Architecture for Fast Feature Embedding
Y. Jia (2014)
10.1109/TPAMI.2010.57
Product Quantization for Nearest Neighbor Search
H. Jégou (2011)
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton (2012)
10.1162/neco.1997.9.8.1735
Long Short-Term Memory
S. Hochreiter (1997)
10.1007/978-3-642-35289-8_14
Centering Neural Network Gradient Factors
N. N. Schraudolph (2012)
10.1007/s11263-009-0275-4
The Pascal Visual Object Classes (VOC) Challenge
M. Everingham (2009)
Rectified Linear Units Improve Restricted Boltzmann Machines
V. Nair (2010)
10.1109/TPAMI.2016.2601099
Object Detection Networks on Convolutional Feature Maps
Shaoqing Ren (2017)
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Andrew M. Saxe (2014)
10.1007/978-3-642-35289-8_3
Efficient BackProp
Y. LeCun (2012)
Deep Learning Made Easier by Linear Transformations in Perceptrons
T. Raiko (2012)
10.1145/1179352.1142005
Locally adapted hierarchical basis preconditioning
R. Szeliski (2006)
Highway Networks
R. Srivastava (2015)
10.1109/72.279181
Learning long-term dependencies with gradient descent is difficult
Yoshua Bengio (1994)
FitNets: Hints for Thin Deep Nets
A. Romero (2015)
10.1109/CVPR.2015.7299173
Convolutional neural networks at constrained time cost
Kaiming He (2015)
On the Number of Linear Regions of Deep Neural Networks
Guido Montúfar (2014)
10.1109/TPAMI.2016.2572683
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer (2017)
10.1109/34.56188
Fast Surface Interpolation Using Hierarchical Basis Functions
R. Szeliski (1990)
10.1109/TPAMI.2011.235
Aggregating Local Image Descriptors into Compact Codes
H. Jégou (2012)
Untersuchungen zu dynamischen neuronalen Netzen
S. Hochreiter (1991)
Training Very Deep Networks
R. Srivastava (2015)
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
S. Ioffe (2015)
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Pierre Sermanet (2014)
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky (2012)
10.1145/1873951.1874249
Vlfeat: an open and portable library of computer vision algorithms
A. Vedaldi (2010)
10.1007/978-3-642-42054-2_55
Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities
Tommi Vatanen (2013)
10.1109/ICCV.2015.123
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He (2015)
10.1007/978-3-319-10602-1_48
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin (2014)
Learning Multiple Layers of Features from Tiny Images
A. Krizhevsky (2009)
10.1109/CVPR.2014.81
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Ross B. Girshick (2014)
10.5244/C.25.76
The devil is in the details: an evaluation of recent feature encoding methods
K. Chatfield (2011)



This paper is referenced by
10.3390/electronics8090959
Ship Target Detection Algorithm Based on Improved Faster R-CNN
Liang Qi (2019)
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption
X. Sun (2020)
Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
Robin Rombach (2020)
HMCNAS: Neural Architecture Search using Hidden Markov Chains and Bayesian Optimization
Vasco Lopes (2020)
10.1109/CVPRW50498.2020.00174
Robust Semantic Segmentation by Redundant Networks With a Layer-Specific Loss Contribution and Majority Vote
Andreas Bär (2020)
10.1109/HORA49412.2020.9152833
Real-time Inferencing and Training of Artificial Neural Network for Adaptive Latency Negation in Distributed Virtual Environments
Gregory Gutmann (2020)
Implicit Regularization in Deep Learning: A View from Function Space
Aristide Baratin (2020)
Stylized Adversarial Defense
Muzammal Naseer (2020)
10.1109/IGESSC47875.2019.9042394
Deep Convolutional Neural Networks for Shark Behavior Analysis
Wenlu Zhang (2019)
10.1145/3360773.3360880
A Computer Vision Framework for Human User Sensing in Public Open Spaces
Peng Sun (2019)
Corner Proposal Network for Anchor-free, Two-stage Object Detection
Kaiwen Duan (2020)
WaveFuse: A Unified Deep Framework for Image Fusion with Wavelet Transform
Shaolei Liu (2020)
HATNet: An End-to-End Holistic Attention Network for Diagnosis of Breast Biopsy Images
Sachin Mehta (2020)
10.3390/rs12101660
Mixed 2D/3D Convolutional Network for Hyperspectral Image Super-Resolution
Qiang Li (2020)
10.1109/ICASSP40776.2020.9054392
Learning Geometric Features with Dual-stream CNN for 3D Action Recognition
Thien Huynh-The (2020)
10.1109/ICASSP40776.2020.9053893
Detecting Multiple Speech Disfluencies Using a Deep Residual Network with Bidirectional Long Short-Term Memory
Tedd Kourkounakis (2020)
10.1109/ICASSP40776.2020.9053115
Sight to Sound: An End-to-End Approach for Visual Piano Transcription
A. Sophia Koepke (2020)
10.1109/TIP.2020.3009034
Long-Term Action Dependence-Based Hierarchical Deep Association for Multi-Athlete Tracking in Sports Videos
Longteng Kong (2020)
Imaging of the fish embryo model and applications to toxicology. (Imagerie du modèle embryon de poisson : application à la toxicologie du développement)
Diane Genest (2019)
10.1109/TVT.2020.3007894
Complex ResNet Aided DoA Estimation for Near-Field MIMO Systems
Yashuai Cao (2020)
10.1109/TPAMI.2020.3010886
GarNet++: Improving Fast and Accurate Static 3D Cloth Draping by Curvature Loss
Erhan Gundogdu (2020)
XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup
Xingjian Li (2020)
Adaptive Integration of Multiple Fine-tuning Models in Transfer Learning for Image Classification
Yu Wang (2020)
Reinforced External Guidance for Theorem Provers
Michael Rawson (2020)
10.1109/WSAI49636.2020.9143282
Nonlinear Activation in Deep Residual Networks
Qi-jun Zhang (2020)
10.1109/ICASSP40776.2020.9053362
Label Reuse for Efficient Semi-Supervised Learning
Tsung-Hung Hsieh (2020)
10.1109/ACCESS.2020.3007512
Skin Lesion Segmentation Based on Multi-Scale Attention Convolutional Neural Network
Yun Jiang (2020)
10.1109/ICAIBD49809.2020.9137469
The Implementation of A Crop Diseases APP Based on Deep Transfer Learning
Mengji Yang (2020)
10.1109/CSCITA47329.2020.9137802
Encoder-Decoder Architecture for Image Caption Generation
Harshit Parikh (2020)
10.1109/ICCES48766.2020.9137855
Deep learning based Surgical Workflow Recognition from Laparoscopic Videos
Elizebeth Kurian (2020)
Self-Supervised Nuclei Segmentation in Histopathological Images Using Attention
Mihir Sahasrabudhe (2020)
10.1016/j.imavis.2020.103981
An Attention-Based Deep Learning Model for Multiple Pedestrian Attributes Recognition
Ehsan Yaghoubi (2020)