Computer Vision

Image Processing

Image Filters

Linear

Feature Detection

Edges and Contour detection

Canny Edge Detection

Local Features

Scale Invariant Feature Transform (SIFT)

Non-maximum suppression - thin wide ridges down to single pixel width

Hysteresis thresholding - define low & high thresholds, high to start edge curves, low to continue them
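The hysteresis step above can be sketched in plain Python. This is a minimal illustration, assuming the gradient magnitudes are already available as a 2D list and that weak pixels are linked through 8-connectivity; the function name `hysteresis_threshold` is hypothetical.

```python
from collections import deque

def hysteresis_threshold(magnitude, low, high):
    """Return a boolean edge map from a 2D list of gradient magnitudes."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels (>= high) start edge curves.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Weak pixels (>= low) continue a curve only if connected to an edge pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if not edges[nr][nc] and magnitude[nr][nc] >= low:
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges
```

A weak pixel that is not connected to any strong pixel stays unmarked, which is what suppresses isolated noise responses.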

  1. Filter
  1. Select
  1. Compute

Difference of Gaussian

Autocorrelation

Histogram of gradient orientations

Threshold

Measurements relative to dominant orientation

  1. Descriptor

4x4 array of 8 bin histograms
128 non-negative numbers for each descriptor (4 x 4 x 8 = 128)

Gaussian filter

Feature Matching

Nearest Neighbour

Point Correspondences

Faces

Detection

Recognition

Disadvantages

Intra-class variability

Inter-class confusion

Algorithm

Rectangular Haar-like features provide a description of the window features of the image

Viola & Jones Detection Algorithm

Integral images allow fast computation of rectangle features
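A minimal sketch of why integral images make rectangle features cheap: once the summed-area table is built, any rectangle sum needs only four lookups. The function names and the choice of a horizontal two-rectangle feature are illustrative assumptions, not taken from a specific library.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_haar(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

The cost of `rect_sum` is independent of the rectangle size, so features can be evaluated at any scale in constant time.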

AdaBoost is used for feature selection and classification

Cascade of classifiers

The classifiers become progressively more complex and have lower FP rates
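The cascade idea can be sketched as a chain of stages with early rejection. This is a simplified illustration: each stage is assumed to be a list of hypothetical `(feature, threshold, weight)` weak classifiers plus a stage threshold, standing in for the boosted classifiers that AdaBoost would learn.

```python
def stage_passes(window, weak_classifiers, stage_threshold):
    """Weighted vote of weak classifiers, each testing one feature value."""
    score = 0.0
    for feature, threshold, weight in weak_classifiers:
        if feature(window) >= threshold:
            score += weight
    return score >= stage_threshold

def cascade_detect(window, stages):
    """Reject as soon as any stage fails; accept only if all stages pass."""
    for weak_classifiers, stage_threshold in stages:
        if not stage_passes(window, weak_classifiers, stage_threshold):
            return False
    return True
```

Because most windows are rejected by the cheap early stages, the more complex later stages run on only a small fraction of the image.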

Eigen Faces
(compression-based feature-extraction)

Prewitt and Sobel filter

Principal Component Analysis is used to learn a set of basis component faces based on degrees of variance

Linear Discriminant Analysis (LDA)

Normalization

A face can be treated as a combination of component faces

Eigenfaces can be differently weighted to represent any face

Recognition

  1. Project unknown normalized face image into face space
  1. Compute the Euclidean distance D between the projection and all the projected training faces
  1. The closest face in the feature space is considered a match; alternatively, use KNN to predict class membership based on the k closest samples
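The recognition steps above can be sketched as follows, assuming the mean face and an orthonormal set of eigenfaces have already been learned with PCA and that faces are flattened to lists of pixel values. The names `project` and `recognise` are hypothetical.

```python
import math

def project(face, mean_face, eigenfaces):
    """Project a normalized face into face space (one weight per eigenface)."""
    centred = [f - m for f, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centred, ef)) for ef in eigenfaces]

def recognise(face, mean_face, eigenfaces, train_weights, train_labels):
    """Return the label and distance of the nearest projected training face."""
    weights = project(face, mean_face, eigenfaces)
    distances = [math.dist(weights, t) for t in train_weights]
    best = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[best], distances[best]
```

Returning the distance as well lets a caller reject faces that are too far from every training sample; a KNN vote over the k smallest distances would be a straightforward extension.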

LDA starts with criteria based on discrimination from the beginning

LDA is a linear supervised feature extraction technique

LDA seeks a new data representation, where points of the same class are as close as possible, while points of different classes are as far as possible
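The "close within a class, far between classes" goal is usually written as the Fisher criterion. The symbols below ($w$ for the projection direction, $\mu_c$ and $N_c$ for the mean and size of class $c$, $\mu$ for the overall mean) are notation introduced here for illustration.

```latex
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w},
\qquad
S_B = \sum_{c} N_c (\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
S_W = \sum_{c} \sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\top}
```

Maximizing $J(w)$ makes the between-class scatter $S_B$ large relative to the within-class scatter $S_W$ along the projection.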

LDA outperforms PCA when the training set is large. PCA is often applied first, followed by LDA.

FisherFaces

LDA Dimensionality reduction

Image Segmentation

Most features match

Panorama Creation

Most features do not match

Object recognition

Bottom Up

Top Down

Clustering

Gestalt factors

Factors that determine whether elements should be grouped together

Familiar configuration

Subjective contours

Occlusion

Cluster similar pixels together

Construct a vector of color and spatial coordinates
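A minimal sketch of clustering pixels on a combined color and position vector, using a small K-Means loop. The `spatial_weight` parameter and the deterministic initialization are assumptions made to keep the example short and reproducible.

```python
import math

def pixel_features(image, spatial_weight=1.0):
    """One (r, g, b, x, y) vector per pixel; image is a 2D list of RGB tuples."""
    feats = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            feats.append((r, g, b, spatial_weight * x, spatial_weight * y))
    return feats

def kmeans(points, k, iterations=20):
    """Plain K-Means with evenly spaced initial centres (illustrative choice)."""
    centres = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centres
```

Raising `spatial_weight` favors compact regions; lowering it groups pixels mostly by color.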

Clustering Techniques

Agglomerative

K-Means

Mean-shift

Filter image with Derivative Of Gaussian

Find magnitude and orientation of gradient

Snakes (active contour models)

Given an initial contour near the desired object, evolve the contour to fit the exact object boundary, like an elastic band around a solid

RANSAC

  1. Estimate the model for the random subset
  1. Count the number of inliers that are within x of their predicted location
  1. Repeat the random selection process and keep the sample with the largest number of inliers
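The RANSAC loop can be sketched on a simpler model than point correspondences. The example below fits a 2D line, so the random subset has k = 2 points; the line model, the `tolerance` parameter (the "x" in the steps above), and the fixed seed are assumptions for illustration.

```python
import random

def fit_line(p, q):
    """Slope and intercept through two points, or None for a vertical pair."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, tolerance, iterations=100, seed=0):
    """Keep the sampled model that explains the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))  # random minimal subset
        if model is None:
            continue
        slope, intercept = model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

For feature matching the same loop applies with a homography or fundamental matrix in place of the line, and with reprojection error in place of the vertical distance.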

Energy functions

Internal

External

Energy to encourage prior shape preferences.
Energy pushing "out" on elastic band

Energy to encourage contour to fit on places where structures exist, Gradient-Based

Energy "tightening" elastic band

Tension, Elasticity

Stiffness, Curvature

Balloon force

Shape Prior

Idea is to minimize total energy (E_internal + E_external)
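A minimal sketch of the total energy for a closed, discrete contour. It assumes a common formulation: tension from squared first differences, stiffness from squared second differences, and a gradient-based external term that rewards strong edges. The weights `alpha` and `beta` and the `[y][x]` indexing of `gradient_magnitude` are assumptions.

```python
def snake_energy(contour, gradient_magnitude, alpha=1.0, beta=1.0):
    """Total energy of a closed contour given as a list of integer (x, y) points."""
    n = len(contour)
    internal = 0.0
    external = 0.0
    for i in range(n):
        xp, yp = contour[i - 1]          # previous point (wraps around)
        x, y = contour[i]
        xn, yn = contour[(i + 1) % n]    # next point (wraps around)
        tension = (xn - x) ** 2 + (yn - y) ** 2
        curvature = (xn - 2 * x + xp) ** 2 + (yn - 2 * y + yp) ** 2
        internal += alpha * tension + beta * curvature
        # Strong image gradients lower the energy, attracting the contour.
        external -= gradient_magnitude[y][x] ** 2
    return internal + external
```

An optimizer would move the control points to reduce this value, for example by trying small displacements of each point and keeping those that lower the total.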

Disadvantages

Depends on number and spacing of control points

Snake may over-smooth the boundary

The curve may intersect itself

Cannot follow topological changes of objects (e.g. splitting, merging)

May get stuck in local minimum

Parameters of energy function must be set well based on prior information

Advantages

Good for non-rigid deformable objects: lips, hands, etc.

Useful to track and fit non-rigid shapes

Contour remains connected

Possible to fill in subjective contours

Flexible energy function definition

Level Sets (geodesic active contours)

Key Idea

Evolve the curve outwards with a speed depending on the curvature and the image

Elastic band is iteratively adjusted so as to:

  • be near image positions with high gradients
  • satisfy shape preferences of contour priors

Quickly expand when passing over places with small image gradient

Slow down when crossing large image gradient places

Algorithm

Initialize windows at feature points

Perform mean shift for each window until convergence

Merge windows that end up at the same peak or mode
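The three mean-shift steps above can be sketched in one dimension with a flat (uniform) window. The 1D setting, the flat kernel, and the merge radius of half the bandwidth are simplifying assumptions.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Return the list of modes found by running mean shift from every point."""
    modes = []
    for start in points:
        centre = float(start)
        for _ in range(max_iter):
            # Shift the window to the mean of the points it currently covers.
            window = [p for p in points if abs(p - centre) <= bandwidth]
            new_centre = sum(window) / len(window)
            if abs(new_centre - centre) < tol:
                break
            centre = new_centre
        # Merge windows that converged to (nearly) the same peak.
        if not any(abs(m - centre) < bandwidth / 2 for m in modes):
            modes.append(centre)
    return modes
```

The number of modes is not fixed in advance; it follows from the data and the single `bandwidth` parameter.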

Find features

Advantages

Does not assume spherical clusters

Just a single parameter

Finds variable number of modes

Robust to outliers

Generic technique

Disadvantages

Output depends on window size

Computationally expensive

  1. Select a random subset of k correspondences

Allows a low FN rate to be obtained in real time

Comparison to Snakes

Curve may change topology and form sharp corners

Intrinsic geometric properties are easily determined

2D and 3D formulation are the same

Mumford-Shah

Functional that captures how well a reconstructed function matches the original image

Chan-Vese

Segments the image using model of interior and exterior

Initial contour must be completely contained within object

Heaviside step function is approximately 1 inside the contour, and 0 outside

Level set function is used to represent the curve

Allows topologically invariant segmentation

Independent of the initialization

Assume interior and exterior both have approximately constant intensity

Video Segmentation

Shot Detection

Background subtraction

Video Shots

Camera Break, abrupt change between neighbouring frames

Gradual Transition over a set of consecutive frames

Shot Breaks

Hard Cut

Fade

Dissolve

Wipe

Detection Techniques

Find shots that differ and declare as shot-break

Cluster similar frames to identify as shots

Pixel comparison

Block-based approach

Histogram comparison

Edge change ratio

More accurate than pixel-wise approach

Compare number of occurrences for each color between subsequent frames
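A minimal sketch of histogram comparison between two frames, assuming greyscale frames of equal size stored as 2D lists of values in 0..255. The bin count and the normalization to the range 0..1 are illustrative choices.

```python
def grey_histogram(frame, bins=8, max_value=256):
    """Count how many pixels fall into each intensity bin."""
    hist = [0] * bins
    for row in frame:
        for value in row:
            hist[value * bins // max_value] += 1
    return hist

def histogram_difference(frame_a, frame_b, bins=8):
    """Normalized bin-wise difference: 0 for identical histograms, 1 for disjoint."""
    hist_a = grey_histogram(frame_a, bins)
    hist_b = grey_histogram(frame_b, bins)
    total = sum(hist_a)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / (2 * total)
```

Two frames with the same pixels in different positions give a difference of 0, which illustrates why the method is robust to motion but misses a cut between shots with matching histograms.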

No color information needed

Compare number of entering edge and exiting edge pixels

Better than other methods at detecting fade, wipe, dissolve and hard-cut

Thresholding

Calculate a time series function of discontinuity feature values for each frame.


Pick cuts positions from the discontinuities function based on some thresholds

Global Threshold

Adaptive Threshold

A hard cut is declared every time the discontinuity value surpasses a global threshold

A soft cut is detected based on the difference of the current discontinuity values from its local neighbourhood

Hardcut declared when discontinuity value is maximum in neighbourhood
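The two thresholding strategies can be contrasted on a per-frame discontinuity series. This is a sketch under assumptions: the adaptive rule declares a cut when the value is the maximum in its neighbourhood and exceeds the neighbourhood mean by a hypothetical factor `ratio`.

```python
def global_cuts(discontinuity, threshold):
    """Frames whose discontinuity value exceeds one fixed threshold."""
    return [i for i, d in enumerate(discontinuity) if d > threshold]

def adaptive_cuts(discontinuity, radius=2, ratio=3.0):
    """Frames that are a local maximum and stand out from their neighbours."""
    cuts = []
    for i, d in enumerate(discontinuity):
        lo = max(0, i - radius)
        hi = min(len(discontinuity), i + radius + 1)
        neighbours = discontinuity[lo:i] + discontinuity[i + 1:hi]
        if not neighbours:
            continue
        local_mean = sum(neighbours) / len(neighbours)
        if d > max(neighbours) and d > ratio * local_mean:
            cuts.append(i)
    return cuts
```

In the usage below, the global threshold misses the weaker peak at frame 7, while the adaptive rule finds it because it is judged relative to its quiet surroundings.

```python
series = [0.1, 0.1, 0.9, 0.1, 0.1, 0.2, 0.2, 0.6, 0.2, 0.2]
print(global_cuts(series, threshold=0.8))
print(adaptive_cuts(series, radius=2, ratio=2.5))
```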

Simple, easy to implement

Computationally Heavy

Sensitive to camera motion

Performs better than pixel approach

Cannot identify dissolve, fade or fast moving objects

Detects hard-cut, fade, wipe and dissolve

Fails if two successive shots have the same histogram

Cannot distinguish fast object or camera motion

Computationally heavy

Fails when there is large amount of motion

Method

  1. Moving average to estimate background image (using the Median)
  1. Subtract estimate from current frame
  1. Large absolute values are interesting pixels, defined by a threshold
  1. Use morphological operations to clean up pixels
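A minimal sketch of the first three steps of the method above, assuming greyscale frames stored as 2D lists and a per-pixel median over a short stack of frames as the background estimate. The morphological clean-up step is left out to keep the example short.

```python
from statistics import median

def median_background(frames):
    """Per-pixel median over a list of equally sized frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def foreground_mask(frame, background, threshold):
    """True where the frame differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]
```

The median tolerates an object passing through a pixel in a minority of the frames, which is why it tends to be preferred over a plain mean for the background estimate.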

Issues

Static camera - Changed and unchanged regions

Moving camera - global and local motion regions

Aperture problem

Occlusion problem

Erroneous labels may be assigned to pixels in covered or uncovered regions

Pixels in a flat image region may appear stationary even if they are moving

Advantages

Simple & Fast

Disadvantages

Needs many images

Needs static background

Noisy segmentation

Texture Analysis

What is a texture?

When the uniformity of a region is perceived as a series of variations in the intensity

Groups

Repetitive / Structured

Stochastic

Mixed

Texture discrimination

Types

Natural

Weakly homogeneous

Artificial regular textures

Stochastic textures

Assumptions

Texture is a property of regions

Texture is a contextual property

Texture can be perceived at different scales or levels of resolution

A region presents a texture when the number of primitive patterns is large

Finding patterns

  1. Use filters that look like patterns

Filter Bank

  1. Consider magnitude of response

Describe their statistics within each local window

Mean, standard deviation, histogram

Collection of Gaussian and linear filters

Using mean absolute response

Object Recognition

Bag-Of-Words Model

Method

Quantize features

  1. Use a distance based classifier (KNN / SVM) to classify new images

Learn Vocabulary (Train)

  1. Train classifier by counting occurrence of features in labelled images

An orderless document representation: Frequencies of words from a dictionary

Feature extraction

  1. Generate an unlabelled vocabulary of frequent features

Dense SIFT: PHOW features

Pyramid Histogram Of Words

Dense SIFT - Faster version of SIFT algorithm

4 Scales, uniform spacing

Clustering the feature descriptors with K-Means

Each word in the vocab represents the centre of a cluster
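Once the vocabulary of cluster centres exists, quantizing an image's descriptors into a word histogram can be sketched as below. The function name `bow_histogram` and the normalization to frequencies are assumptions; descriptors and words are taken to be equal-length tuples.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and count frequencies."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda j: math.dist(d, vocabulary[j]))
        hist[nearest] += 1
    total = sum(hist) or 1  # avoid division by zero for an empty image
    return [h / total for h in hist]
```

The resulting fixed-length vector is orderless: it records how often each word occurs, not where, and is what a classifier such as KNN or an SVM would be trained on.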

Vocabulary size

Too small - not representative

Too large - quantization artifacts, overfitting

Using a spatial histogram

Apply a SVM classifier

Effective in high dimensional spaces

Can use a custom kernel map

Computation time

No direct multi-class SVM

Linear / additive / additive RBF kernels

Noise

Salt & Pepper

Impulse

Gaussian

s&p

Box filter

  1. Extract Haar-like features

  1. Train model

  1. Classify