COMP27112 Introduction to Visual Computing
History of Visual Computing
Raster graphics displays
Uses a 2D array of pixels or dots, so images must be sampled
The more samples, the better the fidelity but it is always an approximation
In computer graphics, everything is an approximation but some approximations are better than others
The OpenGL API
This fits between the application program and the display
The order is application model, application program, OpenGL API, display/input devices, user
It is a specification of an Application Programmer's Interface, i.e. a set of functions for doing 3D computer graphics.
OpenGL evolution
v1 and v2 had a fixed pipeline and fixed functionality
v3+ has a programmable pipeline - extensible functionality where programmers write micro-programs called shaders
Basic graphics system architecture
This includes a CPU (with application program loaded) and a GPU with graphics software, framebuffer memory and DAC
The graphics software holds the API with basic shapes, transformations, viewing, lighting, textures and rendering
The Graphics Pipeline
Fixed Functionality
The graphics system applies fixed algorithms in a fixed order. The application provides data and changes parameters via the API.
We go from 3D vertices, through transformations and viewing, lighting, primitive assembly: clipping, rasterisation and fragment operations, to pixels
Programmable functionality
This pipeline is a mix of fixed and programmable functions.
We go from 3D vertices, through the vertex shader program, primitive assembly: clipping, rasterisation, the fragment shader program, fragment operations: hidden-surface removal, to pixels
The vertex and fragment shader programs must be provided by the user, written in a shading language. The user has access to state in the system.
Vertex processor - coordinate transformations, colours
Clipper - how much of the scene is visible, produces lines
Rasteriser - convert primitives to pixels/shapes
Fragment processor - hidden surface removal, texturing etc.
Fixed vs Programmable
The fixed pipeline
Cons
New algorithms and techniques can't be added
It's deprecated
Pros
It's simple to use and fine for many purposes
For the beginner it's easy to get started quickly
The programmable pipeline
Pros
Provides maximum flexibility
It's state-of-the-art, cutting edge, new all the time
Con
For the beginner there is significant start-up cost
OpenGL and interaction
OpenGL only generates pixels, so for interaction we use the GLUT library
OpenGL graphics
The main features are
3D graphics
Lines have geometry (shape) and attribute (appearance) properties
Polygons i.e. triangles, quadrilaterals, convex polygons
coordinate transformations
A camera for viewing
hidden surface removal
lighting and shading
texturing
pixel operations
Support Libraries
GLU
GL Utility Library
Provides functions which wrap up lower-level OpenGL graphics
Viewing, Textures, Tessellation
GLUT
GL Utility Toolkit
Interaction (mouse and keyboard, menu system)
Primitives
Sphere, torus, cone, cube, tetra-/octa-/dodeca-/icosa-hedron, teapot
Transformations with Matrices and Vectors
Coordinates and Vectors
A coordinate represents a point in space, measured with respect to an origin and set of x, y, z axes. They are not necessarily fixed.
A vector represents a direction in space, with respect to a set of axes. It has a characteristic length.
Both coordinates and vectors can be represented by a triple of x, y, z values in a column (OpenGL) or a row (MATLAB).
The two representations are equivalent, but a transformation matrix used with column vectors is the transpose of the equivalent matrix used with row vectors.
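A minimal NumPy sketch of that transpose relationship (the rotation matrix and point are illustrative values, not from the notes):

```python
import numpy as np

# A 2D rotation by 90 degrees, written for column vectors.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
p_col = np.array([[1.0],    # column vector (OpenGL convention)
                  [0.0]])

# Column-vector form: p' = R p
p1 = R @ p_col

# Row-vector form uses the transposed matrix: p' = p R^T
p_row = np.array([[1.0, 0.0]])
p2 = p_row @ R.T

# Both conventions give the same transformed point (0, 1).
assert np.allclose(p1.ravel(), p2.ravel())
```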
Geometrical Transformations
Define geometry as sets of vertices
Apply transformations to vertices to change them e.g. translation, scaling, rotation
To transform a whole shape, we transform all its vertices
Translation
Applies a 3D shift (tx, ty, tz) to all coordinates
Scaling
Applies a 3D scale (sx, sy, sz) to all coordinates, with respect to the origin
Rotation (2D)
To rotate point P about the origin by angle ϴ (anticlockwise).
x = R cos φ, y = R sin φ, where R is the distance of P from the origin and φ is its angle from the x-axis
x' = R cos(φ + ϴ), y' = R sin(φ + ϴ), which expands to x' = x cos ϴ - y sin ϴ, y' = x sin ϴ + y cos ϴ
To rotate about a point, we translate to the origin, rotate and then translate back
Rotation (3D)
This is the same as 2D
x' = x cos ϴ - y sin ϴ, y' = x sin ϴ + y cos ϴ, z' = z (this is rotation about the z-axis)
3D rotations are relative to an axis
To rotate about a vector, we consider the 2D case of rotating about a point
Representing transformations using matrices
Scaling
Rotation
Translation
To incorporate translation, we have to add an extra row and column to the matrix and an extra term to our coordinates.
We will later use this extra row for doing projections
Homogeneous coordinates
(x, y, z, w) form is called homogeneous coordinates
This form allows us to use a consistent matrix representation for all kinds of linear transformations
3D transformations using matrices
Scale
Translation
Rotation (around the x axis)
Rotation (around the y axis)
Rotation (around the z axis)
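The matrices themselves were drawn as images in the original diagram; as a reconstruction of the standard homogeneous forms (column-vector convention, rotation shown for the z-axis only; the x- and y-axis rotations move the cos/sin block to the other two coordinates):

```latex
S(s_x, s_y, s_z) = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad
T(t_x, t_y, t_z) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad
R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
```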
Composing transformations
We can apply multiple transformations at once if we multiply the matrices together to obtain a composite transformation
Matrix multiplications are generally non-commutative, so order matters
The transformations compose like functions: if we apply M1, then M2, then M3, the composite is written M3.M2.M1
Two matrices are "inverses" if their product is the identity matrix. Not all matrices have an inverse e.g. a matrix that makes all y coordinates 0
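A short NumPy sketch of composition order (the translation and rotation values are made-up examples):

```python
import numpy as np

def translate(tx, ty, tz):
    """4x4 homogeneous translation matrix (column-vector convention)."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def rotate_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

p = np.array([1.0, 0.0, 0.0, 1.0])      # homogeneous point
M1 = rotate_z(np.pi / 2)                 # first: rotate 90 degrees
M2 = translate(5.0, 0.0, 0.0)            # then: translate in x

# "Rotate then translate" is written M2.M1 (the rightmost matrix is applied first).
print((M2 @ M1) @ p)   # -> [5, 1, 0, 1]
print((M1 @ M2) @ p)   # -> [0, 6, 0, 1]: a different result, so order matters
```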
Transformations in OpenGL
Modelview Matrix - used for transforming the geometry you draw and specifying the camera
Projection Matrix - used for controlling the way the camera image is projected onto the screen.
This is what is seen in the vertex shader
Vector Geometry
Vector addition - add two vectors of the same order by adding their components; moves a point through space in a known direction
Vector subtraction - represents a line between two points
Scalar multiplication - moves a point along a vector by a given amount
Vector magnitude - the distance between two points in 3D space; equals the square root of the sum of the squared x, y and z components
Vector normalisation - the process of taking an arbitrary, non-zero vector and converting it to a vector of length 1. Calculate the length of V and divide its x, y and z components by this value
Vector multiplication
Dot product - results in a scalar value (inner product): x1·x2 + y1·y2 + z1·z2
Cross product - results in a vector (outer product)
For normalised vectors, their dot product is the cosine of the angle between them, essential for rendering
For two vectors, their cross product is a third vector, perpendicular to them both
It is essential for defining and manipulating geometry and specifying and evaluating rendering
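A small NumPy illustration of both products (the vectors a and b are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])

# Normalise both; the dot product is then the cosine of the angle between them.
an = a / np.linalg.norm(a)
bn = b / np.linalg.norm(b)
angle = np.degrees(np.arccos(np.dot(an, bn)))   # 45 degrees for these vectors

# The cross product is a third vector perpendicular to both inputs.
n = np.cross(a, b)                              # (0, 0, 1)
print(angle, n)
```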
Shaders
Vertex shaders
All shapes are made of vertices and the shader receives each one, does some processing, sets a value for its position and passes it on
Fragment shaders
The shader sets a value for gl_FragColor, which is the final colour of the fragment
If shaders are omitted, they are automatically created when you create a material
In between the vertex and fragment shaders, we have the Rasteriser which interpolates across triangle and generates fragments (pixels)
Uniforms
Sent to vertex and fragment shaders and stay constant throughout the frame
Usually single values e.g. light positions, material colours, shininess
How do shaders get data?
Attributes - values applied to individual vertices. Only available to the vertex shader
Varyings - variables declared in the vertex shader that are shared with the fragment shader. Must be declared in both shaders
Polygons and Pixels
Polygons
The building block of 3D graphics - usually triangles - that are used to create meshes for 3D objects
A polygon is made up of an ordered set of vertices (V1...Vn), and a set of edges between each pair of vertices (E1...En). The polygon is then the space bounded by the vertices
OpenGL needs convex polygons with all interior angles < 180 degrees
GLU provides functions to tessellate polygons, making concave polygons into a number of convex polygons that can be rendered correctly
The surface normal
This is the vector perpendicular to the plane of the polygon. It is used to give the polygon a distinguishable 'front' and 'back' and describe its orientation in 3D space. Orientation is used in lighting calculations, collisions, culling etc
You can find the surface normal by
Choosing a pair of sequential edges and computing their vectors
Invert the direction of the first so that they now emanate from their shared vertex
Calculate their cross product to find the surface normal
Almost always normalise the result (make its length 1)
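A sketch of those steps for a triangle (the vertices are example values; for simplicity the two edges are taken as leaving the same vertex, which gives the same normal; the winding order decides which side is the front):

```python
import numpy as np

# Triangle vertices, anticlockwise when seen from the "front".
v0 = np.array([0.0, 0.0, 0.0])
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])

# Two edges emanating from the shared vertex v0.
e1 = v1 - v0
e2 = v2 - v0

# Their cross product is the surface normal; normalise it to length 1.
n = np.cross(e1, e2)
n = n / np.linalg.norm(n)
print(n)   # (0, 0, 1): this triangle faces the +z direction
```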
Representing scenes
We can do this by having a huge list of individual polygons, colour them individually and draw them in order
This is called polygon soup
This is a waste of storage space as most models contain surfaces so polygons share vertices.
There is also a loss of semantics as we do not know what a polygon belongs to
This makes interaction with the model more difficult
We can instead use polygon meshes
We can use linked groups of polygons, or meshes, to represent surfaces
This retains semantics of surfaces and reduces storage by sharing vertices and edges
This helps with structuring the models so we can manipulate them more easily
Meshes
Triangle strips
When we add one new vertex, we get one new triangle
This can help us create a collection of linked triangles
This is very widely used and efficient
N linked triangles can be defined using N+2 vertices compared with 3N vertices if each triangle were defined separately
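A sketch of how a strip's shared vertices expand into triangles (the vertex data here is hypothetical):

```python
def strip_to_triangles(vertices):
    """Expand a triangle-strip vertex list into individual triangles.

    N triangles need only N + 2 strip vertices: each new vertex after
    the first two adds one more triangle.
    """
    triangles = []
    for i in range(len(vertices) - 2):
        v0, v1, v2 = vertices[i], vertices[i + 1], vertices[i + 2]
        # Alternate the winding so every triangle faces the same way.
        if i % 2 == 0:
            triangles.append((v0, v1, v2))
        else:
            triangles.append((v1, v0, v2))
    return triangles

# 6 strip vertices -> 4 triangles (versus 12 vertices stored separately).
print(strip_to_triangles(['a', 'b', 'c', 'd', 'e', 'f']))
```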
Triangle fan
This is also a collection of linked triangles
N linked triangles can be defined using N+2 vertices
Quadrilateral strips
Collection of linked quadrilaterals (aka quads)
These are tessellated into triangles during rendering
This is used in terrain modelling, and for approximating curved surfaces.
N x M quads can be defined using (N + 1) * (M + 1) vertices, compared with 4MN separate vertices
Scan Conversion
To get a model to a display, we take a view using a "camera" to create a 2D screen image
Scan converting a line
We sample the true geometry of the line, and approximate it using the nearest pixels available
Bresenham's algorithm
y = mx + c
As we move horizontally, x changes by 1 pixel so yn+1 = yn + m and we round yn+1 to the nearest pixel
Need to swap x and y according to gradient of the line
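A minimal sketch of the incremental idea just described (this is the simple floating-point form for gentle slopes, not the integer-only Bresenham optimisation):

```python
def scan_convert_line(x0, y0, x1, y1):
    """Incremental line scan conversion for a gentle slope (|m| <= 1).

    Steps x one pixel at a time, adds the gradient m to y and rounds y
    to the nearest pixel; steeper lines would swap the roles of x and y.
    """
    m = (y1 - y0) / (x1 - x0)
    pixels = []
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, round(y)))
        y += m
    return pixels

print(scan_convert_line(0, 0, 5, 2))   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```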
Scan converting a polygon
The polygon has been transformed by the viewing pipeline, so we know its (x, y, z) vertex coordinates in screen space
The (x, y) coordinates correspond to a pixel position
The z coordinate is a measure of the vertex's distance from the eye (or "camera")
Scan converting a triangle
We scan convert each of the edges, and then process each row of pixels and fill in the remaining interior pixels
In practice, this naive approach is never used. There are far more efficient methods, which can be implemented in hardware
To do it more efficiently, we can use the "sweep-line" algorithm
Steps down a pair of edges, then goes down scanline by scanline, finding the start and end inside the triangle and filling in those pixels
Efficient because we only need to compute the slopes once, but it is a floating-point algorithm so we have to keep rounding to the pixel grid
Hidden surface removal
Viewing the world from a particular viewpoint, we cannot see some parts of the world because other objects block them
We can solve this in two ways
In 3D world space, we work it out geometrically in 3D and then draw the result (difficult)
In 2D display space, during scan conversion, whenever we generate a pixel, we determine whether some other object nearer to the eye also maps to the same pixel (the standard approach now)
The Z-buffer (aka depth buffer)
For every pixel in the display memory, there is a corresponding entry in the Z-buffer, which is the record of the z-value of the pixel
Z-buffer algorithm
Initialise each pixel to desired background colour
Initialise each Z-buffer entry to MAXDEPTH
For each pixel P generated during scan-conversion of an object
If z-coordinate of P < Z-buffer[P] then compute and store colour of P then update z-coordinate of P in Z-buffer[P] else do not change anything
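A NumPy sketch of that algorithm (image size, MAXDEPTH and the example fragments are illustrative assumptions):

```python
import numpy as np

WIDTH, HEIGHT, MAXDEPTH = 640, 480, float('inf')

framebuffer = np.zeros((HEIGHT, WIDTH, 3))      # initialised to the background colour
zbuffer = np.full((HEIGHT, WIDTH), MAXDEPTH)    # one depth entry per pixel

def plot_fragment(x, y, z, colour):
    """Keep the fragment only if it is nearer than what is already stored."""
    if z < zbuffer[y, x]:
        framebuffer[y, x] = colour
        zbuffer[y, x] = z
    # else: a nearer surface already occupies this pixel, so change nothing

# A far fragment is later hidden by a nearer one at the same pixel.
plot_fragment(10, 10, 0.9, (1.0, 0.0, 0.0))
plot_fragment(10, 10, 0.4, (0.0, 1.0, 0.0))
print(framebuffer[10, 10])   # green survives, red is hidden
```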
Z-fighting
Lack of precision in the Z-buffer leads to incorrect rendering of pixels with similar z-values; we use glPolygonOffset() to solve this
Structured models and polygons
We can represent complex things using a hierarchical structure
Object > Surface > Polygons > Edges > Vertices
General polygon mesh
Flexible way to define linked polygons
Mesh data structure
vertex list
edge list (indexing the vertex list)
face list (indexing into the edge list)
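A tiny sketch of that structure for a square built from two triangles (the data values are illustrative):

```python
# Shared-vertex mesh: faces index edges, edges index vertices.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

edges = [(0, 1), (1, 2), (2, 0),   # edges of the first triangle
         (2, 3), (3, 0)]           # the diagonal edge (index 2) is shared

faces = [(0, 1, 2),   # triangle using edges 0, 1, 2
         (2, 3, 4)]   # triangle reusing edge 2 plus edges 3, 4

# The diagonal (vertices 2-0) is stored once but used by both faces.
```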
File formats
Meshes are often big, so there are many different file formats, such as the Wavefront "obj" format
Viewing
Viewing
Viewing 3D in 2D
We usually only have 2D displays, so a 3D object has to be projected from 3D to 2D
The camera analogy
Arrange the models into the desired composition i.e. set the modelling transformation
Position the camera and point it at the scene i.e. set viewing transformation
E is the optical centre of the camera
U, F and S define the axis of the camera (image axes)
C is the centre of interest the camera points at; the viewing direction (from E towards C) is perpendicular to U and S and antiparallel to F
Choose a camera lens, or adjust the zoom i.e. set the projection transformation
Decide the size and shape of the final photograph i.e. set viewport transformation
The 3D Viewing Pipeline
3D vertex
Modelling transformation M
Viewing transformation V
Projection transformation P
Clip to view volume
Perspective division
Viewport transformation
2D pixel px, py
The default view
An (x, y, z) point drawn by the user is projected onto the z = 0 plane. This is an orthogonal projection, with projectors parallel to the z-axis
The z = 0 plane then gets mapped to the display screen, and whatever geometry is there gets scan converted (aka rasterised) and the z-buffer applied
The duality of modelling and viewing
We can obtain the same view from a camera at a certain location and orientation, by instead transforming the object
In CG we have no camera, only transformations, but we can imagine a camera while we do modelling transformations
Specifying the camera
The default camera is at (0,0,0), looking down the z-axis
We move the default camera to the desired point E, with desired orientation V, and pointing at the centre of interest C.
This transformation is Tc and we compute the inverse and apply this to our models
We use a coordinate system for the camera, using E, C and V to derive this and call it SUF
F = E - C, normalised to length 1
U is derived using the vector V, the view-up vector, assumed to be orthogonal to F
S is orthogonal to both V and F, so we take their cross product and normalise it
However, if the user has not made sure V is orthogonal to F, then this derivation will not work
We solve this by decoupling V and F and making no assumptions about their relationship
Calculate F as before, then use F and V to create S (cross product), then use S and F to create U (cross product)
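A NumPy sketch of that derivation (E, C and V are example values; V only needs to be roughly "up", not exactly orthogonal to F):

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v)

E = np.array([0.0, 2.0, 5.0])   # eye position
C = np.array([0.0, 0.0, 0.0])   # centre of interest
V = np.array([0.0, 1.0, 0.0])   # approximate view-up vector

F = normalise(E - C)            # camera "backward" axis
S = normalise(np.cross(V, F))   # side axis, perpendicular to V and F
U = np.cross(F, S)              # true up axis, perpendicular to S and F

# S, U, F form an orthonormal camera basis even when V is not exactly orthogonal to F.
print(S, U, F)
```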
Deriving the viewing transformation
This is Tc which maps the XYZ axes to SUF, as Tc translates XYZ to E and rotates, hence Tc-1 would be the reverse
Can be derived by
Translate the origin of the SUF camera system to (0, 0, 0)
Rotate the camera axes to be coincident with the world axes, with F aligned with -Z
Summary of Viewing
The duality of modelling and viewing says that we can get the same view by transforming the camera by T, or the object by T-1
The default camera is at (0, 0, 0) looking down the z-axis
If the transformation that moves the default camera to the desired viewpoint is Tc then we transform an object by Tc-1
Using Tc-1 in Three.js
camera.lookAt(x, y, z) computes the transformations for the camera to look at this point
Viewing in 2D
We need to specify what we want to see and where
We use the analogy of photographing the scene with a camera to specify the mapping from our scene to the display screen
We specify a window in world coordinates and a viewport in screen coordinates
We find the matrix Mview which transforms the window to the viewport (a viewing transformation)
Mview can be found by translating by (-x0, -y0) to place the window at the origin, scaling the window to be the same shape as the viewport and shifting to the viewport position
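A sketch of Mview as a composite of those three steps (the window and viewport values are examples):

```python
import numpy as np

def window_to_viewport(x0, y0, x1, y1, u0, v0, u1, v1):
    """Build Mview mapping window (x0,y0)-(x1,y1) to viewport (u0,v0)-(u1,v1).

    Translate the window corner to the origin, scale to the viewport's size,
    then translate to the viewport position (2D, 3x3 homogeneous matrices).
    """
    T1 = np.array([[1, 0, -x0], [0, 1, -y0], [0, 0, 1]], dtype=float)
    S  = np.array([[(u1 - u0) / (x1 - x0), 0, 0],
                   [0, (v1 - v0) / (y1 - y0), 0],
                   [0, 0, 1]], dtype=float)
    T2 = np.array([[1, 0, u0], [0, 1, v0], [0, 0, 1]], dtype=float)
    return T2 @ S @ T1

M = window_to_viewport(0, 0, 10, 10, 100, 100, 300, 300)
print(M @ np.array([5.0, 5.0, 1.0]))   # window centre -> viewport centre (200, 200)
```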
Clipping
Normally we want to CLIP against the viewport, to remove those parts of primitives whose coordinates are outside the window
We often use multiple windows and viewports to help arrange items in the screen
You would have to define a viewing transformation for each pair of window and viewport if you were to have different scalings
Projections
Planar geometric projections
We map from 3D world coordinates to 2D coordinates through a projection
We have parallel and perspective projection
Perspective projections are used for realism, while parallel projections are used in CAD and engineering drawings to allow precise measurements to be made.
Parallel projection
The projection is the set of points at which the projectors intersect the projection plane
Parallel edges remain parallel in the projection, but angles may be distorted
Orthographic
Projectors are perpendicular to the projection plane, and the projection plane is parallel to a plane of the world so that there is no distortion of lengths or angles
The matrix (reconstructed below) simply zeroes the z coordinate
and it has no inverse (it is singular)
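The matrix itself was an image in the original diagram; assuming projection onto the z = 0 plane as described above, the standard orthographic form is:

```latex
P_{\text{ortho}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
```

Its row of zeros discards z, so the determinant is zero and no inverse exists.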
Axonometric
Projectors are perpendicular to the projection plane, which has any orientation so that we can see the 3 axes at once
This can cause distortion of lengths, but measurements can still be made
Oblique
Projectors can make any angle with the projection plane and the projection plane can have any orientation relative to the object being viewed
Perspective projection
Perspective machines were used to help artists draw with correct perspective
Perspective projection models the way we see (lens and retina)
The projection is the set of points at which the projectors intersect the projection plane (they converge)
Objects further away from the center of projection become smaller. Edges that were parallel may converge and angles may be distorted
Which (XY, XZ, YZ) planes of the world are in parallel to the projection plane determines how many vanishing points are seen in the projected image
The matrix for this transformation produces a point we need to normalise so that the fourth component is 1; see the reconstruction below
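The matrix was an image in the original; assuming the textbook setup (centre of projection at the origin, projection plane at z = d, where d is not specified in these notes), the matrix and the normalisation step are:

```latex
P_{\text{persp}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{pmatrix},
\qquad
P_{\text{persp}}\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= \begin{pmatrix} x \\ y \\ z \\ z/d \end{pmatrix}
\;\rightarrow\;
\begin{pmatrix} x d / z \\ y d / z \\ d \\ 1 \end{pmatrix}
```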
Viewing volumes
We now describe the field of view of the camera by defining a 3D view volume, which objects are clipped against
Defining the view volume
We define a 3D view volume which is attached to the camera
For parallel projection, to view a 3D shape we need six planes (cuboid). The cuboid is defined by a near plane (projection plane) and a far plane, top/bottom, left/right planes. The near and far planes are orthogonal to the camera's F axis
For perspective projection, the view volume is a frustum (truncated pyramid). It is defined by a near plane (projection plane) and a far plane, which are orthogonal to the camera's F axis
Perspective problem
We lose z depth information when dividing through by w, so we've made hidden-surface removal more difficult for ourselves
We solve this with a perspective transformation which preserves depth information. We can derive a transformation that distorts the frustum into a cube and then we can take an orthographic projection
This is called projection normalisation (PN) and OpenGL creates it for us automatically
The clipping operation
Clipping takes place in the cube produced by projection normalisation, getting rid of parts of the model that are not seen in the viewport
Perspective division
The clipping operation returns a set of (x, y, z, w) vertices defining polygons which are inside the view volume
OpenGL performs the perspective division by w to convert these values to (x, y, z) 3D points
Summary
The modelling transformation arranges objects in our 3D world
The viewing transformation transforms the world to give the same view as if it were being photographed by a camera
The projection transformation performs a parallel/perspective projection within limits (the clip planes)
Those parts of the 3D world outside the clip planes are discarded
If it's a perspective view, the perspective division "flattens" the image
The viewport transformation maps the final image to a position in part of the display screen window
Rendering
Local and global illumination
We can model light-matter interaction in two ways
locally: we treat each object in a scene separately from any other object
globally: we treat all objects together, and model the interactions between objects
Approximation
The interaction of light and matter is modelled
The standard local model is a simple approximation
Adding per-material algorithms (shaders) gives better results and the global model is better than the local
Local illumination: elements
We start with light intensity only, then ambient illumination, diffuse reflection, positional light source, specular reflection and coloured lights and surfaces
Diffuse and specular reflection
diffuse reflection is absorption and uniform re-radiation
specular reflection is reflection at the air/surface interface
Reflectivity
Diffuse reflection - incident rays are reflected in all directions from the surface. A perfect diffuse surface reflects an incoming ray equally across all angles, making the surface look dull
Perfect specular reflection - reflects an incoming ray like a perfect mirror
Imperfect specular reflection - reflects an incoming ray across a small range of angles. The surface looks shiny with highlights.
Developing a local model
Starting with diffuse reflection then
ambient illumination
In an environment containing a light source, multiple reflections will give a general level of illumination in the scene.
If the monochrome intensity of ambient light is Ia, the amount of ambient light diffusely reflected from a surface is I = kaIa, where ka is the ambient reflection coefficient
The object is uniformly illuminated, so we lose all 3D information, so we need to model the effects of different angles of incidence and different distances from the light source
Effective intensity Ie received is Ie = Ipcosϴ, where ϴ is the angle between the surface normal and the direction to the light
Diffuse reflectivity is described by assigning it a value kd, the diffuse reflection coefficient, so the amount of diffusely reflected light is Ipkdcosϴ or Ipkd(N.L)
I = ambient + diffuse = kaIa + Ipkd(N.L) (see the sketch after this list)
point illumination with the source at infinity
point illumination with the source in the scene
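A sketch of the model so far for one surface point (the vectors, intensities and coefficients are made-up example values):

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v)

# Example inputs: surface normal N, direction to the light L, intensities.
N = normalise(np.array([0.0, 1.0, 0.0]))
L = normalise(np.array([1.0, 1.0, 0.0]))
Ia, Ip = 0.2, 1.0          # ambient and point-light intensities
ka, kd = 0.3, 0.7          # ambient and diffuse reflection coefficients

# I = ambient + diffuse = ka*Ia + Ip*kd*(N.L), clamped so surfaces facing
# away from the light (negative N.L) receive no diffuse contribution.
I = ka * Ia + Ip * kd * max(np.dot(N, L), 0.0)
print(I)
```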
Introduction to image processing
Image processing
the manipulation or modification of a digitized image, especially to enhance its quality
e.g. greyscale, invert, brightness, contrast, threshold, edges, blur, red/green/blue channel, histogram
Image representation
Image resolution - 12 megapixel (4272x2848)
Colour depth - 24 bit (16,777,216 colours)
RGB format, one byte per colour, 3 bytes per pixel
Greyscale format - intensity, uses one byte per pixel
Image origin is the top left corner (0,0) and they are often represented by a matrix, rows x columns
Point processing
For every pixel (x, y) in the image we apply a function - I'(x, y) = F(I(x, y))
We create the image negative with F = 255 - I(x, y)
We increase the brightness with F = I(x, y) + 50
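A minimal NumPy sketch of those two point operations (the random image stands in for a real one; in practice it might come from cv2.imread):

```python
import numpy as np

# A stand-in greyscale image; in practice this would be loaded with
# cv2.imread("input.png", cv2.IMREAD_GRAYSCALE), for example.
image = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

# Negative: F = 255 - I(x, y), applied to every pixel at once.
negative = 255 - image

# Brightness: F = I(x, y) + 50, clipped so values stay within 0..255.
brighter = np.clip(image.astype(np.int16) + 50, 0, 255).astype(np.uint8)
```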
OpenCV
Originally developed by Intel, now an open source image processing/computer vision library with bindings for C/C++, Python, Android and iOS
Image representation and Colour
The Human Eye
Light sensitive cells in the eye
rods for low-light vision
cones for colour
S sensitive to short-wavelength light (blue, roughly 2% of cones)
M to medium (green, roughly 33%)
L to long (red, roughly 65%)
Lights and pigments
The standard for colours is the CIE (International Commission on Illumination)
The standard CIE primary wavelengths are blue 435.8 nm, green 546.1 nm, red 700.0 nm, but these are old
There are also other colour models
YCrCb, used in broadcasting, which separates intensity from colour-difference components
Perceptual spaces - HSV, IHS, HSB aim for equal distances to correspond to equal changes in perceived colour
There is also the RGB cube
History, Applications and Resolution
History
1920 - Bartlane picture transmission service using submarine telegraph cables
1964 - Computer enhancement of images from the NASA Ranger 7 moon probe
1979 - Computer Assisted Tomography
Applications
Anything that can generate a spatially coherent measurement of some property can be imaged
Can use many energy sources - electromagnetic rays, sound and magnetic fields, UV, Lidar, Ultrasound and Sonar
Application Areas
Medicine
Oil Exploration
Astronomy
Weather
Agriculture
Policing etc.
Spatial Resolution
Resolution = field of view/number of pixels for angular resolution
Ground resolution = distance on ground/number of pixels
Nyquist's Theorem
A periodic signal can be reconstructed if the sampling interval is at most half the period (i.e. we take at least two samples per period)
An object can be detected if two samples span its smallest dimension
More samples are needed if we want to recognise the object
Histograms, Point-processing and Geometrical Transformations
Image Histogram
We create a histogram to represent the values of pixels in an image. You can also remove some levels without negatively affecting an image
Contrast adjustment
We apply F = I(x, y) × k to increase the contrast of an image
We use clipping to ensure that values do not overflow, capping them at 255; otherwise wrap-around produces dark spots in bright parts of the image
Input-Output mapping
We can solarise an image, getting rid of bright spots
Min-Max linear stretch is used to make dark spots darker and light spots lighter
Thresholding is used to separate objects from their backgrounds
The T threshold can be chosen after observation of the histogram
You can also do automatic thresholding using a percentile calculation, as we know that an object should occupy a certain percentage of the pixels in an image (see the sketch after this list)
Calculate the number of pixels that should be object
Create image histogram
Accumulate frequencies until the total exceeds the number of expected object pixels
Return current grey level as T
We can misclassify the background as the object if there is overlap
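A sketch of that percentile method (the object fraction and random image are example inputs, and the object is assumed to be darker than the background):

```python
import numpy as np

def percentile_threshold(image, object_fraction):
    """Choose T so roughly `object_fraction` of the pixels fall below it.

    Accumulate histogram frequencies until the expected object pixel
    count is reached, then return the current grey level as T.
    """
    target = object_fraction * image.size
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    total = 0
    for grey_level in range(256):
        total += hist[grey_level]
        if total >= target:
            return grey_level
    return 255

image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
T = percentile_threshold(image, 0.3)
binary = image <= T          # True where pixels are classified as object
```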
Geometrical Transformations
These include:
Scaling
Translation
Reflection/Flipping
Shear
Rotation
Camera Lens Distortion
Radial distortions - r is the radius from the image centre, assuming here that the origin is the image centre
Tangential distortion
Distortions can be corrected with the OpenCV function undistort()
Cameras can also be calibrated using the chessboard pattern
Interpolation
When scaling an image up, we fill in the missing pixels using nearest neighbour interpolation, which has a blocky result
Other interpolation methods include: 1D nearest-neighbour, linear, cubic, 2D nearest-neighbour, bilinear, bicubic
Forward/Reverse Mapping
For a rotation, source pixels might not land exactly on a destination pixel since the transformed coordinates are generally not integers, and if we round the coordinates we may create holes in the destination image; hence we cycle through destination pixels and calculate their values from source pixels
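A sketch of reverse mapping for a rotation about the image centre, using nearest-neighbour sampling (the angle and random image are example values):

```python
import numpy as np

def rotate_reverse_map(src, angle_rad):
    """Rotate `src` by cycling over destination pixels and sampling the source.

    For each destination pixel we apply the *inverse* rotation to find its
    source coordinate, then round to the nearest source pixel, so the
    destination image has no holes.
    """
    h, w = src.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    dst = np.zeros_like(src)
    for y in range(h):
        for x in range(w):
            # Inverse rotation of the destination coordinate.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            sxi, syi = int(round(sx)), int(round(sy))
            if 0 <= sxi < w and 0 <= syi < h:
                dst[y, x] = src[syi, sxi]
    return dst

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
rotated = rotate_reverse_map(image, np.radians(30))
```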
Convolution, Noise and Filters
Region Processing
These are functions involving more than one source pixel
They can show edge gradients by finding the difference between pixel values
We can combine vertical and horizontal edges to find them all by using the OR operator, and increasing the threshold values of edge detection will remove weaker edges
Convolution
Convolution uses a filter kernel, which is applied to the image at every pixel. Each weight is multiplied by its underlying pixel value, the results added up and placed into the image pixel under the centre of the kernel
Correlation is similar, but does not flip (rotate by 180°) the kernel the way convolution does. Most IP software will do correlation but call it convolution
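A direct (unoptimised) sketch of convolution with a small averaging kernel, written out by hand to make the flip-and-weighted-sum explicit (the image and kernel are examples):

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution: flip the kernel, then slide it over the image.

    Each output pixel is the weighted sum of the pixels under the (flipped)
    kernel centred on it; border pixels are simply left at zero here.
    """
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]          # correlation would skip this flip
    pad_y, pad_x = kh // 2, kw // 2
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for y in range(pad_y, h - pad_y):
        for x in range(pad_x, w - pad_x):
            window = image[y - pad_y:y + pad_y + 1, x - pad_x:x + pad_x + 1]
            out[y, x] = np.sum(window * flipped)
    return out

# 3x3 mean (blur) kernel, normalised so the output stays in range.
kernel = np.ones((3, 3)) / 9.0
image = np.random.randint(0, 256, (32, 32)).astype(float)
blurred = convolve2d(image, kernel)
```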
Normalisation
When a mask creates a result that is too big to be stored, we can either work with an integer image and normalise the output or normalise the mask, dividing each weight by the size of the kernel
Composite Filters
Convolution is associative: instead of convolving twice we can convolve once with a composite kernel
Separable Kernels
Gradients, Edge-detection and Template Matching
Blobs
File formats, Exposure and Compression
Hough Lines
Binary Morphology