Focal Loss
Loss function
dense object detectors
modulating factor $(1 - p_t)^\gamma$
standard cross entropy criterion
more focus on hard, misclassified examples
less weight on easy background examples
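A minimal sketch of the loss for binary (object vs. background) classification, assuming the usual definition $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, so that $\mathrm{FL}(p_t) = -(1 - p_t)^\gamma \log(p_t)$. The function names and the default `gamma=2.0` are illustrative choices; with `gamma=0` the loss reduces to standard cross entropy.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction (illustrative sketch).

    p: predicted probability of the foreground class, in (0, 1).
    y: ground-truth label, 1 for foreground and 0 for background.
    gamma: focusing parameter; gamma = 0 recovers cross entropy.
    """
    # p_t is the model's probability for the ground-truth class.
    p_t = p if y == 1 else 1.0 - p
    # The modulating factor (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


if __name__ == "__main__":
    # An easy background example (low foreground probability, label 0)
    # versus a hard, misclassified foreground example.
    print(focal_loss(0.05, 0), cross_entropy(0.05, 0))
    print(focal_loss(0.10, 1), cross_entropy(0.10, 1))
```

The easy background example keeps only a tiny fraction of its cross entropy loss, while the hard example keeps most of it.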
object detectors
two-stage approach
sparse set of candidate object locations
one-stage detectors
regular, dense sampling of possible object locations
have the potential to be faster and simpler
have trailed the accuracy of two-stage detectors thus far
extreme foreground-background class imbalance
why do they trail?
the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause
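A back-of-the-envelope illustration of why the imbalance matters. All numbers here are hypothetical assumptions chosen for illustration, not measurements: 100,000 candidate locations per image, 100 of them foreground, easy negatives scored at $p_t = 0.99$, and hard positives at $p_t = 0.3$. Each easy negative contributes little cross entropy loss, but their sheer number lets them dominate the total.

```python
import math

# Hypothetical per-image counts and confidences, chosen only for illustration.
NUM_LOCATIONS = 100_000
NUM_FOREGROUND = 100
EASY_NEG_PT = 0.99  # easy background example, confidently classified
HARD_POS_PT = 0.30  # hard foreground example, poorly classified


def ce(p_t: float) -> float:
    """Cross entropy in terms of p_t, the probability of the true class."""
    return -math.log(p_t)


def total_losses() -> tuple[float, float]:
    """Return (total loss from easy negatives, total loss from hard positives)."""
    num_background = NUM_LOCATIONS - NUM_FOREGROUND
    easy_total = num_background * ce(EASY_NEG_PT)
    hard_total = NUM_FOREGROUND * ce(HARD_POS_PT)
    return easy_total, hard_total


if __name__ == "__main__":
    easy, hard = total_losses()
    print(f"easy negatives: {easy:.1f}, hard positives: {hard:.1f}")
```

With these assumed numbers the easy negatives contribute roughly 1004 versus about 120 for the hard positives, more than eight times as much, so the gradient is driven mostly by examples the detector already handles well.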
Class imbalance
How to address it
reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples
down-weight well-classified examples
focus training on a sparse set of hard examples
prevents the vast number of easy negatives from overwhelming the detector during training
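To see how strongly the reshaped loss down-weights well-classified examples, note that cross entropy and focal loss share the $-\log(p_t)$ term, so their ratio is $1 / (1 - p_t)^\gamma$. The sketch below tabulates this ratio for a few confidences, assuming $\gamma = 2$ as an illustrative value: an example with $p_t = 0.9$ receives 100× less loss, while a hard example with $p_t = 0.1$ loses only about 19% of its loss.

```python
def downweight_factor(p_t: float, gamma: float = 2.0) -> float:
    """How many times smaller the focal loss is than cross entropy at p_t.

    CE(p_t) / FL(p_t) = 1 / (1 - p_t)^gamma, since both share the -log(p_t) term.
    """
    return 1.0 / (1.0 - p_t) ** gamma


if __name__ == "__main__":
    for p_t in (0.1, 0.5, 0.9, 0.99):
        print(f"p_t={p_t:.2f}  CE/FL = {downweight_factor(p_t):.1f}x")
```

The factor grows without bound as $p_t \to 1$, which is what keeps the many easy negatives from swamping the few hard examples.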
Evaluate
To evaluate the effectiveness of the loss, the authors design and train RetinaNet, a simple dense detector
when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors
State of the art
two-stage, proposal-driven mechanism, as popularized in the R-CNN framework
first stage generates a sparse set of candidate object locations
second stage classifies each candidate location as one of the foreground classes or as background using a convolutional neural network
Through a sequence of advances (....papers), this two-stage framework consistently achieves top accuracy on the challenging COCO benchmark
A question
Could a simple one-stage detector achieve similar accuracy?
One-stage detectors are applied over a regular, dense sampling of object locations, scales, and aspect ratios
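A minimal sketch of what regular, dense sampling of locations, scales, and aspect ratios can look like: one candidate box per (grid cell, scale, aspect ratio) on a feature-map grid. The function name, stride, base size, scales, and ratios below are hypothetical illustrative values; real detectors choose their own and typically repeat this for several feature-map resolutions.

```python
import math
from itertools import product


def dense_anchors(
    grid_h: int,
    grid_w: int,
    stride: float,
    base_size: float,
    scales: tuple[float, ...] = (1.0, 2 ** (1 / 3), 2 ** (2 / 3)),
    ratios: tuple[float, ...] = (0.5, 1.0, 2.0),
) -> list[tuple[float, float, float, float]]:
    """Return (cx, cy, w, h) boxes for every grid cell, scale, and aspect ratio.

    ratio is height / width; each box keeps area (base_size * scale) ** 2.
    """
    anchors = []
    for row, col, scale, ratio in product(range(grid_h), range(grid_w), scales, ratios):
        # Center of the grid cell in image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        side = base_size * scale
        w = side / math.sqrt(ratio)
        h = side * math.sqrt(ratio)
        anchors.append((cx, cy, w, h))
    return anchors


if __name__ == "__main__":
    boxes = dense_anchors(grid_h=50, grid_w=50, stride=16, base_size=32)
    print(len(boxes))  # 50 * 50 * 3 * 3 = 22500 candidate boxes from one grid
```

Even a single modest grid yields tens of thousands of candidates, almost all of which cover background.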
Recent work
YOLO
SSD
promising results, yielding faster detectors with accuracy within 10–40% relative to state-of-the-art two-stage methods
This paper pushes the envelope further
presents a one-stage object detector that, for the first time, matches the state-of-the-art COCO AP of more complex two-stage detectors, such as the Feature Pyramid Network (FPN) or Mask R-CNN variants of Faster R-CNN
R-CNN
Feature Pyramid Network (FPN)
Mask R-CNN
Faster R-CNN
To achieve this result, the authors identify class imbalance during training as the main obstacle impeding one-stage detectors from achieving state-of-the-art accuracy and propose a new loss function that eliminates this barrier
How class imbalance is addressed in R-CNN-like detectors
two-stage cascade & sampling heuristics
The proposal stage
DeepMask
RPN
.... rapidly narrows down the number of candidate object locations to a small number (e.g., 1-2k), filtering out most background samples
Selective Search
EdgeBoxes
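The narrowing step can be sketched as keeping only the highest-scoring candidates. This assumes each candidate already carries an objectness score and that the stage simply keeps the top `k` (here a hypothetical `k = 2000`, in the 1-2k range mentioned above). Real proposal methods such as RPN also apply non-maximum suppression, which this sketch omits; `top_k_proposals` is an illustrative name, not an API of any detector.

```python
import heapq
import random


def top_k_proposals(
    candidates: list[tuple[float, int]], k: int = 2000
) -> list[tuple[float, int]]:
    """Keep the k candidates with the highest objectness score.

    candidates: (score, candidate_id) pairs; a higher score means more object-like.
    Non-maximum suppression is deliberately omitted in this sketch.
    """
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical dense candidate set: 100,000 locations with random scores.
    dense = [(rng.random(), i) for i in range(100_000)]
    kept = top_k_proposals(dense, k=2000)
    print(len(kept), "of", len(dense), "candidates passed to the second stage")
```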
Second classification stage
sampling heuristics
fixed foreground-to-background ratio (1:3)
online hard example mining (OHEM)
are performed to maintain a manageable balance between foreground and background
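Both heuristics can be sketched over a list of labeled candidates. The functions below are hypothetical illustrations, not code from any specific detector: `sample_fixed_ratio` draws a minibatch with at most one foreground example per three background examples, and `ohem_select` keeps the examples with the highest current loss, which is the core idea behind online hard example mining (further details of the published method, such as handling overlapping regions, are omitted). The batch size of 512 is an assumed value.

```python
import random
from typing import Optional


def sample_fixed_ratio(
    labels: list[int],
    batch_size: int = 512,
    fg_fraction: float = 0.25,
    rng: Optional[random.Random] = None,
) -> list[int]:
    """Return indices of a minibatch with at most a 1:3 foreground:background ratio.

    labels: 1 for foreground, 0 for background, one entry per candidate.
    fg_fraction = 0.25 corresponds to the fixed 1:3 ratio; in this sketch, when
    foreground is scarce, the remaining slots are filled with background.
    """
    rng = rng or random.Random()
    fg = [i for i, y in enumerate(labels) if y == 1]
    bg = [i for i, y in enumerate(labels) if y == 0]
    num_fg = min(len(fg), int(batch_size * fg_fraction))
    num_bg = min(len(bg), batch_size - num_fg)
    return rng.sample(fg, num_fg) + rng.sample(bg, num_bg)


def ohem_select(losses: list[float], batch_size: int = 512) -> list[int]:
    """Return indices of the batch_size examples with the highest current loss."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical candidate set after the proposal stage: 2,000 boxes, 50 foreground.
    labels = [1] * 50 + [0] * 1950
    batch = sample_fixed_ratio(labels, rng=rng)
    num_fg = sum(labels[i] for i in batch)
    print(num_fg, "foreground and", len(batch) - num_fg, "background")
```

Both heuristics act on which examples enter the loss, whereas the focal loss keeps every example and instead rescales each one's contribution.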