Video classification papers that try to improve on the performance of benchmark models. The papers are connected: they cite one another and compete with one another every year.
-
Carreira and Zisserman (2017), Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Kinetics Dataset
-
-
Used to pre-train models and to test how well transfer learning works on other benchmark datasets
ConvNet + LSTM
Extracts per-frame image features with a state-of-the-art image classification model such as Inception, then feeds the sequence of features to a recurrent neural network such as an LSTM to model temporal information
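The ConvNet + LSTM idea can be sketched as a small recurrent loop over per-frame feature vectors. This is only an illustration under stated assumptions: the frame features are assumed to have been extracted already by an image classifier, the weights are random and untrained, and the names `lstm_step` and `encode_video` are hypothetical helpers, not code from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    # One dense layer: weights is a list of rows, vec is a flat list.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_gate(hidden, inputs, rng):
    # Small random weights and zero bias (illustrative, not trained).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden + inputs)]
               for _ in range(hidden)]
    return weights, [0.0] * hidden


def lstm_step(params, x, h, c):
    # Standard LSTM cell update on the concatenation [h, x].
    hx = h + x
    i = [sigmoid(z) for z in affine(*params["i"], hx)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], hx)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], hx)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], hx)]  # candidate state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_video(frame_features, hidden=4, seed=0):
    # frame_features: one feature vector per frame, in temporal order.
    rng = random.Random(seed)
    inputs = len(frame_features[0])
    params = {name: init_gate(hidden, inputs, rng) for name in "ifog"}
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in frame_features:
        h, c = lstm_step(params, x, h, c)
    return h  # final hidden state summarizes the whole clip


features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
print(encode_video(features))
```

In a real model the final hidden state (or every hidden state) would feed a classification layer, and all weights would be learned.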
Model Evaluation
-
-
Two-Stream Inflated 3D ConvNets (Two-Stream I3D) were the best-performing model across all test scenarios
-
Du Tran et al. (2018), A Closer Look at Spatiotemporal Convolutions for Action Recognition
Mixed 3D 2D ConvNets
MCx
Based on the hypothesis that motion modeling is needed only in the early layers of the network, while deeper layers need only spatial information. Early layers therefore use 3D convolutions and later layers use 2D convolutions
-
-
rMCx
The reverse of MCx: early layers use 2D convolutions and later layers use 3D convolutions, on the alternative hypothesis that temporal modeling is more useful in the deeper layers
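The MCx and rMCx layouts can be written down as a per-group list of convolution types. The sketch below assumes a ResNet-style network with five convolutional groups, and assumes the paper's naming convention is that `x` marks the first group where the convolution type switches; `mixed_layout` is a hypothetical helper introduced only for illustration.

```python
NUM_GROUPS = 5  # assumption: conv1 plus four residual groups


def mixed_layout(x, reversed_variant=False, num_groups=NUM_GROUPS):
    """Return the convolution type of each group, from group 1 upward.

    MCx:  groups before x are 3D, groups x and deeper are 2D.
    rMCx: groups before x are 2D, groups x and deeper are 3D.
    """
    if not 1 <= x <= num_groups + 1:
        raise ValueError("x must lie between 1 and num_groups + 1")
    early, late = ("2D", "3D") if reversed_variant else ("3D", "2D")
    return [early if group < x else late
            for group in range(1, num_groups + 1)]


print("MC3 :", mixed_layout(3))
print("rMC3:", mixed_layout(3, reversed_variant=True))
```

Under these assumptions MC3 keeps 3D convolutions in the first two groups only, and rMC3 is its mirror image.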
-
(2+1)D ConvNets
-
-
Given the same number of layers as a 3D ConvNet (counting one (2+1)D block as a single 3D layer), a (2+1)D ConvNet doubles the number of non-linearities while keeping approximately the same number of parameters
-
Has lower training and testing error than a 3D ConvNet with the same number of layers, which suggests that it is easier to optimize
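The parameter-matching claim can be checked with a little arithmetic. The sketch below assumes a (2+1)D block replaces one t x d x d 3D convolution with M spatial filters of size 1 x d x d followed by temporal filters of size t x 1 x 1, and that M is chosen as floor(t d^2 N_in N_out / (d^2 N_in + t N_out)), which is how I recall the paper defining it. Biases are ignored and the function names are hypothetical.

```python
def params_3d(n_in, n_out, t=3, d=3):
    # Weights in one full 3D convolution with a t x d x d kernel.
    return n_in * n_out * t * d * d


def mid_channels(n_in, n_out, t=3, d=3):
    # Width M of the intermediate space between the 2D and 1D convolutions,
    # chosen so the block roughly matches the 3D parameter count (assumed formula).
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def params_2plus1d(n_in, n_out, t=3, d=3):
    m = mid_channels(n_in, n_out, t, d)
    spatial = m * n_in * d * d   # M filters of size n_in x 1 x d x d
    temporal = n_out * m * t     # n_out filters of size M x t x 1 x 1
    return spatial + temporal


for n_in, n_out in [(64, 64), (64, 128), (256, 512)]:
    print(n_in, n_out, params_3d(n_in, n_out), params_2plus1d(n_in, n_out))
```

Because M is rounded down, the (2+1)D block never exceeds the 3D parameter count, and the extra non-linearity sits between the spatial and temporal convolutions.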
3D ConvNets
The vanilla 3D counterpart of a 2D ResNet: the same architecture, with every 2D convolution replaced by a 3D convolution
-
-
Input preprocessing
Randomly crop clips of size 8 x 112 x 112 (frames x height x width) from each video. This yields more input samples and provides spatial and temporal jittering
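A random 8 x 112 x 112 crop amounts to picking a random start frame and a random top-left corner. The following is a minimal sketch that represents a video as nested lists indexed `[frame][row][column]`; the helper names and the 64-frame, 128 x 171 source size in the demo are assumptions for illustration.

```python
import random


def random_clip_window(num_frames, height, width,
                       clip_len=8, crop=112, rng=random):
    # Pick where the clip starts in time (temporal jitter)
    # and where the crop sits in the frame (spatial jitter).
    if num_frames < clip_len or height < crop or width < crop:
        raise ValueError("video is smaller than the requested clip")
    t0 = rng.randint(0, num_frames - clip_len)
    y0 = rng.randint(0, height - crop)
    x0 = rng.randint(0, width - crop)
    return t0, y0, x0


def crop_clip(video, window, clip_len=8, crop=112):
    # Slice out clip_len frames and a crop x crop region from each.
    t0, y0, x0 = window
    return [[row[x0:x0 + crop] for row in frame[y0:y0 + crop]]
            for frame in video[t0:t0 + clip_len]]


# Hypothetical source video: 64 frames of 128 x 171 pixels.
print(random_clip_window(64, 128, 171, rng=random.Random(0)))
```

Calling `random_clip_window` repeatedly on the same video gives different clips, which is where the extra training samples come from.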
-
Randomly choose five 2-second clips from every video. The video-level classification is obtained by averaging the predictions over these clips
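One common reading of the clip-based evaluation is to average the per-clip class scores and take the highest-scoring class. The sketch below assumes that reading; the frame rate, the score values, and the function names are made up for illustration.

```python
import random


def sample_clip_starts(num_frames, fps, num_clips=5,
                       clip_seconds=2.0, rng=random):
    # Random start frames for num_clips clips of clip_seconds each.
    clip_len = int(round(fps * clip_seconds))
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    return [rng.randint(0, num_frames - clip_len) for _ in range(num_clips)]


def average_clip_scores(clip_scores):
    # clip_scores: one list of class scores per clip.
    n = len(clip_scores)
    num_classes = len(clip_scores[0])
    return [sum(scores[c] for scores in clip_scores) / n
            for c in range(num_classes)]


def predict_video(clip_scores):
    # Video-level label: class with the highest averaged score.
    avg = average_clip_scores(clip_scores)
    return max(range(len(avg)), key=avg.__getitem__)


starts = sample_clip_starts(num_frames=300, fps=30, rng=random.Random(0))
scores = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9], [0.5, 0.5]]
print(starts, predict_video(scores))
```

Averaging scores rather than taking a majority vote lets confident clips outweigh ambiguous ones.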
-
-
Model Training
-
All examined models were trained and evaluated on both the Sports-1M dataset and the Kinetics dataset
-
All models examined in this research are based on the ResNet architecture, as it is the state-of-the-art architecture for image classification