Computer Vision
Image Processing
Image Filters
Linear
Feature Detection
Edge and Contour Detection
Canny Edge Detection
Local Features
Scale Invariant Feature Transform (SIFT)
Non-maximum suppression - thin wide ridges down to single pixel width
Hysteresis thresholding - define low & high thresholds, high to start edge curves, low to continue them
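The hysteresis step above can be sketched in plain Python. This is a minimal illustration, not a full Canny implementation: it assumes the gradient magnitudes are already computed and thinned, and the function name `hysteresis` and the 8-connected neighbourhood are choices made for this example.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Return a boolean edge mask from a 2D list of gradient magnitudes."""
    rows, cols = len(mag), len(mag[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels (>= high) start edge curves.
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Weak pixels (>= low) continue a curve only if 8-connected to it.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and mag[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges

mag = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
mask = hysteresis(mag, low=4, high=8)
```

In this toy grid the weak pixels next to the strong one are kept, while the weak pixel at the far right is dropped because nothing connects it to a strong pixel.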
- Filter
- Select
- Compute
Difference of Gaussian
Autocorrelation
Histogram of gradient orientations
Threshold
Measurements relative to dominant orientation
- Descriptor
4x4 array of 8 bin histograms
128 non-negative numbers for each descriptor (4 x 4 x 8)
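The descriptor layout above (4x4 cells, 8 orientation bins each) can be sketched as follows. This is a simplified illustration only: it assumes a 16x16 patch of precomputed gradients and leaves out the Gaussian weighting, interpolation between bins, rotation to the dominant orientation and value clipping that the full SIFT algorithm applies. The name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(dx, dy):
    """Build 4x4 cells of 8-bin orientation histograms from 16x16 gradients."""
    desc = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            gx, gy = dx[y][x], dy[y][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)      # in [0, 2*pi)
            bin_index = int(angle / (2 * math.pi) * 8) % 8  # 8 orientation bins
            cell = (y // 4) * 4 + (x // 4)                  # which 4x4-pixel cell
            desc[cell * 8 + bin_index] += magnitude         # magnitude-weighted vote
    # Normalise to unit length so the descriptor is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc

# Toy patch: every gradient points along +x.
dx = [[1.0] * 16 for _ in range(16)]
dy = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(dx, dy)
```

Every entry is a sum of magnitudes, which is why the 128 values are never negative.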
Gaussian filter
Feature Matching
Nearest Neighbour
Point Correspondences
Faces
Detection
Recognition
Disadvantages
Intra-class variability
Inter-class confusion
Algorithm
Rectangular Haar-like features provide a description of the window features of the image
Viola & Jones Detection Algorithm
Integral images allow fast computation of rectangle features
AdaBoost is used for feature selection and classification
Cascade of classifiers
The classifiers become progressively more complex and have lower FP rates
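The integral-image idea mentioned above can be sketched as below: after one pass over the image, the sum of any rectangle costs four lookups. The helper names (`integral_image`, `rect_sum`, `two_rect_feature`) are chosen for this example, and the two-rectangle feature is only one of the Haar-like feature shapes.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img over rows < r and columns < c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle using four lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (a two-rectangle Haar-like feature)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 2, 4)
feature = two_rect_feature(ii, 0, 0, 2, 4)
```

Here the dark left half sums to 4 and the bright right half to 20, so the feature responds strongly to the vertical boundary between them.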
Eigenfaces
(compression-based feature-extraction)
Prewitt and Sobel filter
Principal Component Analysis is used to learn a set of basis component faces based on degrees of variance
Linear Discriminant Analysis (LDA)
Normalization
A face can be treated as a combination of component faces
Eigenfaces can be differently weighted to represent any face
Recognition
- Project unknown normalized face image into face space
- Compute the Euclidean distance D between the projection and all the projected training faces
- Closest face in the feature space considered a match or use KNN to predict class membership based on k closest samples
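The recognition steps above can be sketched with a toy face space. The sketch assumes the mean face and an orthonormal set of eigenfaces have already been learned with PCA; the 4-pixel "faces", the labels and the function names are invented for illustration.

```python
import math

def project(face, mean_face, eigenfaces):
    """Project a normalized face into face space (one weight per eigenface)."""
    centred = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centred, ef)) for ef in eigenfaces]

def nearest_face(face, mean_face, eigenfaces, gallery):
    """gallery: list of (label, weights) pairs for the projected training faces."""
    weights = project(face, mean_face, eigenfaces)
    # Euclidean distance D between the projection and each training projection.
    label, _ = min(gallery, key=lambda item: math.dist(weights, item[1]))
    return label

# Hypothetical 4-pixel faces and two orthonormal eigenfaces.
mean_face = [0.5, 0.5, 0.5, 0.5]
eigenfaces = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
training = {"alice": [1, 1, 0, 0], "bob": [1, 0, 1, 0]}
gallery = [(name, project(f, mean_face, eigenfaces)) for name, f in training.items()]

match = nearest_face([0.9, 0.8, 0.1, 0.2], mean_face, eigenfaces, gallery)
```

Replacing the single nearest neighbour with a vote over the k closest gallery entries gives the KNN variant.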
LDA starts with criteria based on discrimination from the beginning
LDA is a linear supervised feature extraction technique
LDA seeks a new data representation, where points of the same class are as close as possible, while points of different classes are as far as possible
LDA outperforms PCA when training set is large. Often use PCA then LDA.
FisherFaces
LDA Dimensionality reduction
Image Segmentation
Most features match
Panorama Creation
Most features do not match
Object recognition
Bottom Up
Top Down
Clustering
Gestalt factors
Factors that determine whether elements should be grouped together
Familiar configuration
Subjective contours
Occlusion
Cluster similar pixels together
Construct a vector of color and spatial coordinates
Clustering Techniques
Agglomerative
K-Means
Mean-shift
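Clustering pixels on a combined colour and position vector, as described above, might look like the following K-Means sketch. The `spatial_weight` parameter, the fixed initial centres and the function names are assumptions made for this example; practical implementations usually pick initial centres randomly or with k-means++.

```python
import math

def pixel_features(image, spatial_weight=1.0):
    """Turn rows of (r, g, b) pixels into (r, g, b, x, y) feature vectors."""
    return [
        (r, g, b, spatial_weight * x, spatial_weight * y)
        for y, row in enumerate(image)
        for x, (r, g, b) in enumerate(row)
    ]

def kmeans(points, initial_centres, iterations=10):
    centres = list(initial_centres)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centres

# One-row toy image: two reddish pixels followed by two bluish pixels.
image = [[(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]]
points = pixel_features(image)
labels, centres = kmeans(points, [points[0], points[-1]])
```

Raising `spatial_weight` makes nearby pixels more likely to share a cluster even when their colours differ slightly.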
Filter image with Derivative Of Gaussian
Find magnitude and orientation of gradient
Snakes (active contour models)
Given an initial contour near the desired object, evolve the contour to fit the exact object boundary, like an elastic band around a solid
RANSAC
- Estimate the model for the random subset
- Count the number of inliers that are within x of their predicted location
- Repeat the random selection process and keep the sample with the largest number of inliers
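The RANSAC loop above can be sketched for the simplest possible model, a pure 2D translation between matched points. The translation model, the parameter names (`k`, `tolerance`, `iterations`, `seed`) and the synthetic correspondences are assumptions for illustration; panorama stitching would normally estimate a homography instead.

```python
import math
import random

def estimate_translation(pairs):
    """Mean displacement over a set of ((x, y), (x2, y2)) correspondences."""
    n = len(pairs)
    dx = sum(q[0] - p[0] for p, q in pairs) / n
    dy = sum(q[1] - p[1] for p, q in pairs) / n
    return dx, dy

def ransac_translation(pairs, k=2, tolerance=1.0, iterations=50, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(pairs, k)              # random subset of k correspondences
        dx, dy = estimate_translation(sample)      # model for that subset
        inliers = [                                # points near their predicted location
            (p, q) for p, q in pairs
            if math.hypot(p[0] + dx - q[0], p[1] + dy - q[1]) <= tolerance
        ]
        if len(inliers) > len(best_inliers):       # keep the best sample so far
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers

# Eight correspondences that agree on a shift of (5, -2), plus three bad matches.
good = [((3 * i, 2 * i), (3 * i + 5, 2 * i - 2)) for i in range(8)]
bad = [((0, 0), (50, 40)), ((1, 1), (-29, 21)), ((2, 2), (17, -58))]
model, inliers = ransac_translation(good + bad)
```

A common follow-up, not shown here, is to re-estimate the model from all inliers of the winning sample.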
Energy functions
Internal
External
Energy to encourage prior shape preferences.
Energy pushing "out" on elastic band
Energy to encourage contour to fit on places where structures exist, Gradient-Based
Energy "tightening" elastic band
Tension, Elasticity
Stiffness, Curvature
Balloon force
Shape Prior
Idea is to minimize total energy (Einternal + Eexternal)
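A discrete version of the total energy can be sketched as below, assuming a closed contour of integer control points and a precomputed gradient-magnitude grid. The weights `alpha`, `beta`, `gamma` and the function name are illustrative; the balloon force and shape-prior terms are left out.

```python
def snake_energy(points, gradient_magnitude, alpha=1.0, beta=1.0, gamma=1.0):
    """Total energy (internal + external) of a closed contour of (x, y) points."""
    n = len(points)
    internal = external = 0.0
    for i in range(n):
        (xp, yp), (x, y), (xn, yn) = points[i - 1], points[i], points[(i + 1) % n]
        # Tension / elasticity: squared first difference along the contour.
        tension = (xn - x) ** 2 + (yn - y) ** 2
        # Stiffness / curvature: squared second difference.
        stiffness = (xn - 2 * x + xp) ** 2 + (yn - 2 * y + yp) ** 2
        internal += alpha * tension + beta * stiffness
        # Gradient-based external term: strong edges lower the energy.
        external -= gamma * gradient_magnitude[y][x] ** 2
    return internal + external

contour = [(1, 1), (3, 1), (3, 3), (1, 3)]
flat = [[0] * 5 for _ in range(5)]            # no image structure anywhere
edged = [row[:] for row in flat]
for x, y in contour:
    edged[y][x] = 2                           # strong gradient under each point

energy_flat = snake_energy(contour, flat)
energy_edged = snake_energy(contour, edged)
```

The same contour has lower energy when it sits on strong gradients, which is what the minimization exploits.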
Disadvantages
Depends on number and spacing of control points
Snake may over-smooth the boundary
The curve may intersect itself
Cannot follow topological changes of objects (e.g. splitting, merging)
May get stuck in local minimum
Parameters of energy function must be set well based on prior information
Advantages
Good for non-rigid deformable objects: lips, hands, etc.
Useful to track and fit non-rigid shapes
Contour remains connected
Possible to fill in subjective contours
Flexible energy function definition
Level Sets (geodesic active contours)
Key Idea
Evolve the curve outwards with a speed depending on the curvature and the image
Elastic band is iteratively adjusted so as to:
- be near image positions with high gradients
- satisfy shape preferences of contour priors
Quickly expand when passing over places with small image gradient
Slow down when crossing large image gradient places
Algorithm
Initialize windows at feature points
Perform mean shift for each window until convergence
Merge windows that end up at the same peak or mode
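The three mean-shift steps above can be sketched in one dimension. This assumes a flat (uniform) kernel and starts a window at every data point; the `bandwidth` value plays the role of the window size, and the merge rule (modes closer than half a bandwidth are treated as one) is a choice made for this example.

```python
def mean_shift_modes(points, bandwidth, tol=1e-6, max_iter=100):
    modes = []
    for start in points:                       # initialise a window at each point
        centre = start
        for _ in range(max_iter):              # shift until convergence
            window = [p for p in points if abs(p - centre) <= bandwidth]
            new_centre = sum(window) / len(window)
            converged = abs(new_centre - centre) < tol
            centre = new_centre
            if converged:
                break
        # Merge windows that ended up at (nearly) the same mode.
        if not any(abs(m - centre) < bandwidth / 2 for m in modes):
            modes.append(centre)
    return modes

points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
modes = mean_shift_modes(points, bandwidth=1.0)
```

The number of modes is not specified in advance; it follows from the data and the window size.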
Find features
Advantages
Does not assume spherical clusters
Just a single parameter
Finds variable number of modes
Robust to outliers
Generic technique
Disadvantages
Output depends on window size
Computationally expensive
- Select a random subset of k correspondences
Allows a low FN rate to be obtained in real time
Comparison to Snakes
Curve may change topology and form sharp corners
Intrinsic geometric properties are easily determined
2D and 3D formulation are the same
Mumford-Shah
Function that captures how well a reconstructed function matches the original image
Chan-Vese
Segments the image using model of interior and exterior
Initial contour must be completely contained within object
Heaviside step function is approximately 1 inside the contour, and 0 outside
Level set function is used to represent the curve
Allows topologically invariant segmentation
Independent of the initialization
Assume interior and exterior both have approximately constant intensity
Video Segmentation
Shot Detection
Background subtraction
Video Shots
Camera Break, abrupt change between neighbouring frames
Gradual Transition over a set of consecutive frames
Shot Breaks
Hard Cut
Fade
Dissolve
Wipe
Detection Techniques
Find shots that differ and declare as shot-break
Cluster similar frames to identify as shots
Pixel comparison
Block-based approach
Histogram comparison
Edge change ratio
More accurate than pixel-wise approach
Compare number of occurrences for each color between subsequent frames
No color information needed
Compare number of entering edge and exiting edge pixels
Better than other methods at detecting fade, wipe, dissolve and hard-cut
Thresholding
Calculate a time series function of discontinuity feature values for each frame.
Pick cuts positions from the discontinuities function based on some thresholds
Global Threshold
Adaptive Threshold
A hard cut is declared every time the discontinuity value surpasses a global threshold
A soft cut is detected based on the difference of the current discontinuity values from its local neighbourhood
Hard cut declared when discontinuity value is maximum in neighbourhood
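The thresholding scheme above can be sketched with histogram differences as the discontinuity feature. Frames are assumed to be flat lists of 8-bit grey values; the bin count, the neighbourhood `radius` and the `ratio` test against the local mean are illustrative choices, not values taken from these notes.

```python
def histogram(frame, bins=4, max_value=256):
    h = [0] * bins
    for pixel in frame:
        h[pixel * bins // max_value] += 1
    return h

def discontinuity(frames):
    """Time series of histogram differences between subsequent frames."""
    hists = [histogram(f) for f in frames]
    return [sum(abs(a - b) for a, b in zip(h1, h2))
            for h1, h2 in zip(hists, hists[1:])]

def global_cuts(d, threshold):
    """Cut wherever the discontinuity value surpasses one global threshold."""
    return [i + 1 for i, v in enumerate(d) if v > threshold]

def adaptive_cuts(d, radius=2, ratio=3.0):
    """Cut where the value is the local maximum and well above the local mean."""
    cuts = []
    for i, v in enumerate(d):
        lo, hi = max(0, i - radius), min(len(d), i + radius + 1)
        neighbours = d[lo:i] + d[i + 1:hi]
        if (neighbours and v > max(neighbours)
                and v > ratio * sum(neighbours) / len(neighbours)):
            cuts.append(i + 1)
    return cuts

# Three dark frames followed by three bright frames: one hard cut at frame 3.
frames = [[10] * 8, [12] * 8, [11] * 8, [200] * 8, [205] * 8, [198] * 8]
d = discontinuity(frames)
cuts_global = global_cuts(d, threshold=8)
cuts_adaptive = adaptive_cuts(d)
```

Each returned index is the first frame of the new shot.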
Simple, easy to implement
Computationally Heavy
Sensitive to camera motion
Performs better than pixel approach
Cannot identify dissolve, fade or fast moving objects
Detects hard-cut, fade, wipe and dissolve
Fails if two successive shots have the same histogram
Cannot distinguish fast object or camera motion
Computationally heavy
Fails when there is large amount of motion
Method
- Moving average to estimate background image (using the Median)
- Subtract estimate from current frame
- Large absolute values are interesting pixels, defined by a threshold
- Use morphological operations to clean up pixels
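The method above can be sketched on tiny grey-level frames. The background is taken as the per-pixel median over a few recent frames, and `remove_isolated` is a simple stand-in for the morphological clean-up step (a real pipeline would more likely use erosion and dilation); all names and the threshold are assumptions for this example.

```python
from statistics import median

def estimate_background(frames):
    """Per-pixel median over a window of recent frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def foreground_mask(frame, background, threshold):
    """Pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def remove_isolated(mask):
    """Drop foreground pixels that have no 4-connected foreground neighbour."""
    rows, cols = len(mask), len(mask[0])
    def has_neighbour(r, c):
        return any(
            0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        )
    return [[mask[r][c] and has_neighbour(r, c) for c in range(cols)]
            for r in range(rows)]

history = [
    [[10] * 4 for _ in range(3)],
    [[11] * 4 for _ in range(3)],
    [[9, 9, 255, 9], [9] * 4, [9] * 4],       # one transient bright pixel
]
current = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],                        # a two-pixel moving object
    [10, 10, 10, 90],                          # a single noisy pixel
]
background = estimate_background(history)
mask = foreground_mask(current, background, threshold=30)
cleaned = remove_isolated(mask)
```

The median ignores the transient bright pixel in the history, and the clean-up step removes the lone noisy pixel while keeping the object.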
Issues
Static camera - Changed and unchanged regions
Moving camera - global and local motion regions
Aperture problem
Occlusion problem
Erroneous labels may be assigned to pixels in covered or uncovered regions
Pixels in a flat image region may appear stationary even if they are moving
Advantages
Simple & Fast
Disadvantages
Needs many images
Needs static background
Noisy segmentation
Texture Analysis
What is a texture?
When the uniformity of a region is perceived as a series of variations in the intensity
Groups
Repetitive / Structured
Stochastic
Mixed
Texture discrimination
Types
Natural
Weakly homogeneous
Artificial regular textures
Stochastic textures
Assumptions
Texture is a property of regions
Texture is a contextual property
Texture can be perceived at different scales or levels of resolution
A region presents a texture when the number of primitive patterns is large
Finding patterns
- Use filters that look like patterns
Filter Bank
- Consider magnitude of response
Describe their statistics within each local window
Mean, standard deviation, histogram
Collection of Gaussian and linear filters
Using mean absolute response
Object Recognition
Bag-Of-Words Model
Method
Quantize features
- Use a distance based classifier (KNN / SVM) to classify new images
Learn Vocabulary (Train)
- Train classifier by counting occurrence of features in labelled images
An orderless document representation: Frequencies of words from a dictionary
Feature extraction
- Generate an unlabelled vocabulary of frequent features
Dense SIFT: PHOW features
Pyramid Histogram Of Words
Dense SIFT - Faster version of SIFT algorithm
4 Scales, uniform spacing
Clustering the feature descriptors with K-Means
Each word in the vocab represents the centre of a cluster
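Once the vocabulary exists, quantizing an image's features and counting word frequencies can be sketched as below. The vocabulary is assumed to be the list of K-Means cluster centres; the 2D descriptors and function names are toy stand-ins for real 128-dimensional SIFT descriptors.

```python
import math

def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster centre) nearest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Orderless representation: normalized frequency of each visual word."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[quantize(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

vocabulary = [(0, 0), (10, 0), (0, 10)]             # three cluster centres
descriptors = [(1, 1), (9, 1), (0, 9), (11, -1)]    # features from one image
hist = bow_histogram(descriptors, vocabulary)
```

The resulting histogram is the fixed-length vector that a KNN or SVM classifier would be trained on.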
Vocabulary size
Too small - not representative
Too large - quantization artifacts, overfitting
Using a spatial histogram
Apply a SVM classifier
Effective in high dimensional spaces
Can use a custom kernel map
Computation time
No direct multi-class SVM
Linear / additive / additive RBF kernels
Noise
Salt & Pepper
Impulse
Gaussian
Box filter
- Extracting Haar-like features
- Training model
- Classify