COMP27112 Introduction to Visual Computing
History of Visual Computing
Raster graphics displays
Uses a 2D array of pixels or dots, so images must be sampled
The more samples, the better the fidelity but it is always an approximation
In computer graphics, everything is an approximation but some approximations are better than others
The OpenGL API
This fits between the application program and the display
The order is application model, application program, OpenGL API, display/input devices, user
It is a specification of an Application Programmer's Interface, i.e. a set of functions for doing 3D computer graphics.
OpenGL evolution
v1 and v2 had a fixed pipeline and fixed functionality
v3+ has a programmable pipeline - extensible functionality where programmers write micro-programs called shaders
Basic graphics system architecture
This includes a CPU (with application program loaded) and a GPU with graphics software, framebuffer memory and DAC
The graphics software holds the API with basic shapes, transformations, viewing, lighting, textures and rendering
The Graphics Pipeline
Fixed Functionality
The graphics system applies fixed algorithms in a fixed order. The application provides data and changes parameters via the API.
We go from 3D vertices, through transformations and viewing, lighting, primitive assembly: clipping, rasterisation and fragment operations, to pixels
Programmable functionality
This pipeline is a mix of fixed and programmable functions.
We go from 3D vertices, through the vertex shader program, primitive assembly: clipping, rasterisation, the fragment shader program, fragment operations: hidden-surface removal, to pixels
The vertex and fragment shader programs must be provided by the user, written in a shading language. The user has access to state in the system.
Vertex processor - coordinate transformations, colours
Clipper - how much of the scene is visible, produces lines
Rasteriser - convert primitives to pixels/shapes
Fragment processor - hidden surface removal, texturing etc.
Fixed vs Programmable
The fixed pipeline
Cons
New algorithms and techniques can't be added
It's deprecated
Pros
It's simple to use and fine for many purposes
For the beginner it's easy to get started quickly
The programmable pipeline
Pros
Provides maximum flexibility
It's state-of-the-art, cutting edge, new all the time
Con
For the beginner there is significant start-up cost
OpenGL and interaction
OpenGL only generates pixels, so for interaction we use the GLUT library
OpenGL graphics
The main features are
3D graphics
Lines have geometry (shape) and attribute (appearance) properties
Polygons i.e. triangles, quadrilaterals, convex polygons
coordinate transformations
A camera for viewing
hidden surface removal
lighting and shading
texturing
pixel operations
Support Libraries
GLU
GL Utility Library
Provides functions which wrap up lower-level OpenGL graphics
Viewing, Textures, Tessellation
GLUT
GL Utility Toolkit
Interaction (mouse and keyboard, menu system)
Primitives
Sphere, torus, cone, cube, tetra-/octa-/dodeca-/icosa-hedron, teapot
Transformations with Matrices and Vectors
Coordinates and Vectors
A coordinate represents a point in space, measured with respect to an origin and set of x, y, z axes. They are not necessarily fixed.
A vector represents a direction in space, with respect to a set of axes. It has a characteristic length.
Both coordinates and vectors can be represented by a triple of x, y, z values in a column (OpenGL) or a row (MATLAB).
The two representations are equivalent, but a transformation matrix used with column vectors is the transpose of the equivalent matrix used with row vectors.
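A minimal NumPy sketch of that transpose relationship (the rotation matrix and point are illustrative values, not from the notes):

```python
import numpy as np

# A 2D rotation by 90 degrees, written for column vectors.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
p_col = np.array([[1.0],    # column vector (OpenGL convention)
                  [0.0]])

# Column-vector form: p' = R p
p1 = R @ p_col

# Row-vector form uses the transposed matrix: p' = p R^T
p_row = np.array([[1.0, 0.0]])
p2 = p_row @ R.T

# Both conventions give the same transformed point (0, 1).
assert np.allclose(p1.ravel(), p2.ravel())
```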
Geometrical Transformations
Define geometry as sets of vertices
Apply transformations to vertices to change them e.g. translation, scaling, rotation
To transform a whole shape, we transform all its vertices
Translation
Applies a 3D shift (tx, ty, tz) to all coordinates
Scaling
Applies a 3D scale (sx, sy, sz) to all coordinates, with respect to the origin
Rotation (2D)
To rotate point P about the origin by angle ϴ (anticlockwise).
x = R cos φ, y = R sin φ, where R is the distance of P from the origin and φ is its angle from the x-axis
x' = R cos(φ + ϴ), y' = R sin(φ + ϴ), which expands to x' = x cos ϴ - y sin ϴ, y' = x sin ϴ + y cos ϴ
To rotate about a point, we translate to the origin, rotate and then translate back
Rotation (3D)
This is the same as 2D
x' = x cos ϴ - y sin ϴ, y' = x sin ϴ + y cos ϴ, z' = z (this is rotation about the z-axis)
3D rotations are relative to an axis
To rotate about a vector, we consider the 2D case of rotating about a point
Representing transformations using matrices
Scaling
Rotation
Translation
To incorporate translation, we have to add an extra row and column to the matrix and an extra term to our coordinates.
We will later use this extra row for doing projections
Homogeneous coordinates
(x, y, z, w) form is called homogeneous coordinates
This form allows us to use a consistent matrix representation for all kinds of linear transformations
3D transformations using matrices
Scale
Translation
Rotation (around the x axis)
Rotation (around the y axis)
Rotation (around the z axis)
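The matrices themselves were drawn as images in the original diagram; as a reconstruction of the standard homogeneous forms (column-vector convention, rotation shown for the z-axis only; the x- and y-axis rotations move the cos/sin block to the other two coordinates):

```latex
S(s_x, s_y, s_z) = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad
T(t_x, t_y, t_z) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad
R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
```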
Composing transformations
We can apply multiple transformations at once if we multiply the matrices together to obtain a composite transformation
Matrix multiplications are generally non-commutative, so order matters
The transformations compose like functions: if we apply M1, then M2, then M3, the composite is written M3.M2.M1
Two matrices are "inverses" if their product is the identity matrix. Not all matrices have an inverse e.g. a matrix that makes all y coordinates 0
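A short NumPy sketch of composition order (the translation and rotation values are made-up examples):

```python
import numpy as np

def translate(tx, ty, tz):
    """4x4 homogeneous translation matrix (column-vector convention)."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def rotate_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

p = np.array([1.0, 0.0, 0.0, 1.0])      # homogeneous point
M1 = rotate_z(np.pi / 2)                 # first: rotate 90 degrees
M2 = translate(5.0, 0.0, 0.0)            # then: translate in x

# "Rotate then translate" is written M2.M1 (the rightmost matrix is applied first).
print((M2 @ M1) @ p)   # -> [5, 1, 0, 1]
print((M1 @ M2) @ p)   # -> [0, 6, 0, 1]: a different result, so order matters
```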
Transformations in OpenGL
Modelview Matrix - used for transforming the geometry you draw and specifying the camera
Projection Matrix - used for controlling the way the camera image is projected onto the screen.
This is what is seen in the vertex shader
Vector Geometry
Vector addition - add two vectors of the same order by adding their components; moves a point through space in a known direction
Vector subtraction - represents a line between two points
Scalar multiplication - moves a point along a vector by a given amount
Vector magnitude - the distance between two points in 3D space; equals the square root of the sum of the squared x, y and z components
Vector normalisation - the process of taking an arbitrary, non-zero vector and converting it to a vector of length 1. Calculate the length of V and divide its x, y and z components by this value
Vector multiplication
Dot product - results in a scalar value (inner product): x1·x2 + y1·y2 + z1·z2
Cross product - results in a vector (outer product)
For normalised vectors, their dot product is the cosine of the angle between them, essential for rendering
For two vectors, their cross product is a third vector, perpendicular to them both
It is essential for defining and manipulating geometry and specifying and evaluating rendering
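A small NumPy illustration of both products (the vectors a and b are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])

# Normalise both; the dot product is then the cosine of the angle between them.
an = a / np.linalg.norm(a)
bn = b / np.linalg.norm(b)
angle = np.degrees(np.arccos(np.dot(an, bn)))   # 45 degrees for these vectors

# The cross product is a third vector perpendicular to both inputs.
n = np.cross(a, b)                              # (0, 0, 1)
print(angle, n)
```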
Shaders
Vertex shaders
All shapes are made of vertices and the shader receives each one, does some processing, sets a value for its position and passes it on
Fragment shaders
The shader sets a value for gl_FragColor, which is the final colour of the fragment
If shaders are omitted, they are automatically created when you create a material
In between the vertex and fragment shaders, we have the Rasteriser which interpolates across triangle and generates fragments (pixels)
Uniforms
Sent to vertex and fragment shaders and stay constant throughout the frame
Usually single values e.g. light positions, material colours, shininess
How do shaders get data?
Attributes - values applied to individual vertices. Only available to the vertex shader
Varyings - variables declared in the vertex shader that are shared with the fragment shader. Must be declared in both shaders
Polygons and Pixels
Polygons
The building block of 3D graphics - usually triangles - that are used to create meshes for 3D objects
A polygon is made up of an ordered set of vertices (V1...Vn), and a set of edges between each pair of vertices (E1...En). The polygon is then the space bounded by the vertices
OpenGL needs convex polygons with all interior angles < 180 degrees
GLU provides functions to tessellate polygons, making concave polygons into a number of convex polygons that can be rendered correctly
The surface normal
This is the vector perpendicular to the plane of the polygon. It is used to give the polygon a distinguishable 'front' and 'back' and describe its orientation in 3D space. Orientation is used in lighting calculations, collisions, culling etc
You can find the surface normal by
Choosing a pair of sequential edges and computing their vectors
Invert the direction of the first so that they now emanate from their shared vertex
Calculate their cross product to find the surface normal
Almost always normalise the result (make its length 1)
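A sketch of those steps for a triangle (the vertices are example values; for simplicity the two edges are taken as leaving the same vertex, which gives the same normal; the winding order decides which side is the front):

```python
import numpy as np

# Triangle vertices, anticlockwise when seen from the "front".
v0 = np.array([0.0, 0.0, 0.0])
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])

# Two edges emanating from the shared vertex v0.
e1 = v1 - v0
e2 = v2 - v0

# Their cross product is the surface normal; normalise it to length 1.
n = np.cross(e1, e2)
n = n / np.linalg.norm(n)
print(n)   # (0, 0, 1): this triangle faces the +z direction
```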
Representing scenes
We can do this by having a huge list of individual polygons, colour them individually and draw them in order
This is called polygon soup
This is a waste of storage space as most models contain surfaces so polygons share vertices.
There is also a loss of semantics as we do not know what a polygon belongs to
This makes interaction with the model more difficult
We can instead use polygon meshes
We can use linked groups of polygons, or meshes, to represent surfaces
This retains semantics of surfaces and reduces storage by sharing vertices and edges
This helps with structuring the models so we can manipulate them more easily
Meshes
Triangle strips
When we add one new vertex, we get one new triangle
This can help us create a collection of linked triangles
This is very widely used and efficient
N linked triangles can be defined using N+2 vertices compared with 3N vertices if each triangle were defined separately
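A sketch of how a strip's shared vertices expand into triangles (the vertex data here is hypothetical):

```python
def strip_to_triangles(vertices):
    """Expand a triangle-strip vertex list into individual triangles.

    N triangles need only N + 2 strip vertices: each new vertex after
    the first two adds one more triangle.
    """
    triangles = []
    for i in range(len(vertices) - 2):
        v0, v1, v2 = vertices[i], vertices[i + 1], vertices[i + 2]
        # Alternate the winding so every triangle faces the same way.
        if i % 2 == 0:
            triangles.append((v0, v1, v2))
        else:
            triangles.append((v1, v0, v2))
    return triangles

# 6 strip vertices -> 4 triangles (versus 12 vertices stored separately).
print(strip_to_triangles(['a', 'b', 'c', 'd', 'e', 'f']))
```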
Triangle fan
This is also a collection of linked triangles
N linked triangles can be defined using N+2 vertices
Quadrilateral strips
Collection of linked quadrilaterals (aka quads)
These are tessellated into triangles during rendering
This is used in terrain modelling, and for approximating curved surfaces.
N x M quads can be defined using (N + 1) * (M + 1) vertices, compared with 4MN separate vertices
Scan Conversion
To get a model to a display, we take a view using a "camera" to create a 2D screen image
Scan converting a line
We sample the true geometry of the line, and approximate it using the nearest pixels available
Bresenham's algorithm
y = mx + c
As we move horizontally, x changes by 1 pixel so yn+1 = yn + m and we round yn+1 to the nearest pixel
Need to swap x and y according to gradient of the line
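A minimal sketch of the incremental idea just described (this is the simple floating-point form for gentle slopes, not the integer-only Bresenham optimisation):

```python
def scan_convert_line(x0, y0, x1, y1):
    """Incremental line scan conversion for a gentle slope (|m| <= 1).

    Steps x one pixel at a time, adds the gradient m to y and rounds y
    to the nearest pixel; steeper lines would swap the roles of x and y.
    """
    m = (y1 - y0) / (x1 - x0)
    pixels = []
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, round(y)))
        y += m
    return pixels

print(scan_convert_line(0, 0, 5, 2))   # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```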
Scan converting a polygon
The polygon has been transformed by the viewing pipeline, so we know its (x, y, z) vertex coordinates in screen space
The (x, y) coordinates correspond to a pixel position
The z coordinate is a measure of the vertex's distance from the eye (or "camera")
Scan converting a triangle
We scan convert each of the edges, and then process each row of pixels and fill in the remaining interior pixels
In practice, this naive approach is never used. There are far more efficient methods, which can be implemented in hardware
To do it more efficiently, we can use the "sweep-line" algorithm
Steps down a pair of edges, then goes down scanline by scanline, finding the start and end inside the triangle and filling in those pixels
Efficient because we only need to compute the slopes once, but it is a floating-point algorithm so we have to keep rounding to the pixel grid
Hidden surface removal
Viewing the world from a particular viewpoint, we cannot see some parts of the world because other objects block them
We can solve this in two ways
In 3D world space, we work it out geometrically in 3D and then draw the result (difficult)
In 2D display space, during scan conversion, whenever we generate a pixel, we determine whether some other object nearer to the eye also maps to the same pixel (the standard approach now)
The Z-buffer (aka depth buffer)
For every pixel in the display memory, there is a corresponding entry in the Z-buffer, which is the record of the z-value of the pixel
Z-buffer algorithm
Initialise each pixel to desired background colour
Initialise each Z-buffer entry to MAXDEPTH
For each pixel P generated during scan-conversion of an object
If z-coordinate of P < Z-buffer[P] then compute and store colour of P then update z-coordinate of P in Z-buffer[P] else do not change anything
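A NumPy sketch of that algorithm (image size, MAXDEPTH and the example fragments are illustrative assumptions):

```python
import numpy as np

WIDTH, HEIGHT, MAXDEPTH = 640, 480, float('inf')

framebuffer = np.zeros((HEIGHT, WIDTH, 3))      # initialised to the background colour
zbuffer = np.full((HEIGHT, WIDTH), MAXDEPTH)    # one depth entry per pixel

def plot_fragment(x, y, z, colour):
    """Keep the fragment only if it is nearer than what is already stored."""
    if z < zbuffer[y, x]:
        framebuffer[y, x] = colour
        zbuffer[y, x] = z
    # else: a nearer surface already occupies this pixel, so change nothing

# A far fragment is later hidden by a nearer one at the same pixel.
plot_fragment(10, 10, 0.9, (1.0, 0.0, 0.0))
plot_fragment(10, 10, 0.4, (0.0, 1.0, 0.0))
print(framebuffer[10, 10])   # green survives, red is hidden
```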
Z-fighting
Lack of precision in the Z-buffer leads to incorrect rendering of pixels with similar z-values; we use glPolygonOffset() to solve this
Structured models and polygons
We can represent complex things using a hierarchical structure
Object > Surface > Polygons > Edges > Vertices
General polygon mesh
Flexible way to define linked polygons
Mesh data structure
vertex list
edge list (indexing the vertex list)
face list (indexing into the edge list)
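A tiny sketch of that structure for a square built from two triangles (the data values are illustrative):

```python
# Shared-vertex mesh: faces index edges, edges index vertices.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

edges = [(0, 1), (1, 2), (2, 0),   # edges of the first triangle
         (2, 3), (3, 0)]           # the diagonal edge (index 2) is shared

faces = [(0, 1, 2),   # triangle using edges 0, 1, 2
         (2, 3, 4)]   # triangle reusing edge 2 plus edges 3, 4

# The diagonal (vertices 2-0) is stored once but used by both faces.
```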
File formats
Meshes are often big, so there are many different file formats, such as the Wavefront "obj" format
Viewing
Viewing
Viewing 3D in 2D
We usually only have 2D displays, so a 3D object has to be projected from 3D to 2D
The camera analogy
Arrange the models into the desired composition i.e. set the modelling transformation
Position the camera and point it at the scene i.e. set viewing transformation
E is the optical centre of the camera
U, F and S define the axis of the camera (image axes)
C is the centre of interest the camera points at; the viewing direction (from E towards C) is perpendicular to U and S and antiparallel to F
Choose a camera lens, or adjust the zoom i.e. set the projection transformation
Decide the size and shape of the final photograph i.e. set viewport transformation
The 3D Viewing Pipeline
3D vertex
Modelling transformation M
Viewing transformation V
Projection transformation P
Clip to view volume
Perspective division
Viewport transformation
2D pixel px, py
The default view
An (x, y, z) point drawn by the user is projected onto the z = 0 plane. This is an orthogonal projection, with projectors parallel to the z-axis
The z = 0 plane then gets mapped to the display screen, and whatever geometry is there gets scan converted (aka rasterised) and the z-buffer applied
The duality of modelling and viewing
We can obtain the same view from a camera at a certain location and orientation, by instead transforming the object
In CG we have no camera, only transformations, but we can imagine a camera while we do modelling transformations
Specifying the camera
The default camera is at (0,0,0), looking down the z-axis
We move the default camera to the desired point E, with desired orientation V, and pointing at the centre of interest C.
This transformation is Tc and we compute the inverse and apply this to our models
We use a coordinate system for the camera, using E, C and V to derive this and call it SUF
F = E - C, normalised to length 1
U is derived using the vector V, the view-up vector, assumed to be orthogonal to F
S is orthogonal to both V and F, so we take their cross product and normalise it
However, if the user has not made sure V is orthogonal to F, then this derivation will not work
We solve this by decoupling V and F and making no assumptions about their relationship
Calculate F as before, then use F and V to create S (cross product), then use S and F to create U (cross product)
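A NumPy sketch of that derivation (E, C and V are example values; V only needs to be roughly "up", not exactly orthogonal to F):

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v)

E = np.array([0.0, 2.0, 5.0])   # eye position
C = np.array([0.0, 0.0, 0.0])   # centre of interest
V = np.array([0.0, 1.0, 0.0])   # approximate view-up vector

F = normalise(E - C)            # camera "backward" axis
S = normalise(np.cross(V, F))   # side axis, perpendicular to V and F
U = np.cross(F, S)              # true up axis, perpendicular to S and F

# S, U, F form an orthonormal camera basis even when V is not exactly orthogonal to F.
print(S, U, F)
```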
Deriving the viewing transformation
This is Tc which maps the XYZ axes to SUF, as Tc translates XYZ to E and rotates, hence Tc-1 would be the reverse
Can be derived by
Translate the origin of the SUF camera system to (0, 0, 0)
Rotate the camera axes to be coincident with the world axes, with F aligned with -Z
Summary of Viewing
The duality of modelling and viewing says that we can get the same view by transforming the camera by T, or the object by T-1
The default camera is at (0, 0, 0) looking down the z-axis
If the transformation that moves the default camera to the desired viewpoint is Tc then we transform an object by Tc-1
Using Tc-1 in Three.js
camera.lookAt(x, y, z) computes the transformations for the camera to look at this point
Viewing in 2D
We need to specify what we want to see and where
We use the analogy of photographing the scene with a camera to specify the mapping from our scene to the display screen
We specify a window in world coordinates and a viewport in screen coordinates
We find the matrix Mview which transforms the window to the viewport (a viewing transformation)
Mview can be found by translating by (-x0, -y0) to place the window at the origin, scaling the window to be the same shape as the viewport and shifting to the viewport position
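A sketch of Mview as a composite of those three steps (the window and viewport values are examples):

```python
import numpy as np

def window_to_viewport(x0, y0, x1, y1, u0, v0, u1, v1):
    """Build Mview mapping window (x0,y0)-(x1,y1) to viewport (u0,v0)-(u1,v1).

    Translate the window corner to the origin, scale to the viewport's size,
    then translate to the viewport position (2D, 3x3 homogeneous matrices).
    """
    T1 = np.array([[1, 0, -x0], [0, 1, -y0], [0, 0, 1]], dtype=float)
    S  = np.array([[(u1 - u0) / (x1 - x0), 0, 0],
                   [0, (v1 - v0) / (y1 - y0), 0],
                   [0, 0, 1]], dtype=float)
    T2 = np.array([[1, 0, u0], [0, 1, v0], [0, 0, 1]], dtype=float)
    return T2 @ S @ T1

M = window_to_viewport(0, 0, 10, 10, 100, 100, 300, 300)
print(M @ np.array([5.0, 5.0, 1.0]))   # window centre -> viewport centre (200, 200)
```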
Clipping
Normally we want to CLIP against the viewport, to remove those parts of primitives whose coordinates are outside the window
We often use multiple windows and viewports to help arrange items in the screen
You would have to define a viewing transformation for each pair of window and viewport if you were to have different scalings
Projections
Planar geometric projections
We map from 3D world coordinates to 2D coordinates through a projection
We have parallel and perspective projection
Perspective projections are used for realism, while parallel projections are used in CAD and engineering drawings to allow precise measurements to be made.
Parallel projection
The projection is the set of points at which the projectors intersect the projection plane
Parallel edges remain parallel in the projection, but angles may be distorted
Orthographic
Projectors are perpendicular to the projection plane, and the projection plane is parallel to a plane of the world so that there is no distortion of lengths or angles
The matrix (reconstructed below) simply zeroes the z coordinate
and it has no inverse (it is singular)
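The matrix itself was an image in the original diagram; assuming projection onto the z = 0 plane as described above, the standard orthographic form is:

```latex
P_{\text{ortho}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
```

Its row of zeros discards z, so the determinant is zero and no inverse exists.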
Axonometric
Projectors are perpendicular to the projection plane, which has any orientation so that we can see the 3 axes at once
This can cause distortion of lengths, but measurements can still be made
Oblique
Projectors can make any angle with the projection plane and the projection plane can have any orientation relative to the object being viewed
Perspective projection
Perspective machines were used to help artists draw with correct perspective
Perspective projection models the way we see (lens and retina)
The projection is the set of points at which the projectors intersect the projection plane (they converge)
Objects further away from the center of projection become smaller. Edges that were parallel may converge and angles may be distorted
Which (XY, XZ, YZ) planes of the world are in parallel to the projection plane determines how many vanishing points are seen in the projected image
The matrix for this transformation produces a point we need to normalise so that the fourth component is 1; see the reconstruction below
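The matrix was an image in the original; assuming the textbook setup (centre of projection at the origin, projection plane at z = d, where d is not specified in these notes), the matrix and the normalisation step are:

```latex
P_{\text{persp}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{pmatrix},
\qquad
P_{\text{persp}}\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= \begin{pmatrix} x \\ y \\ z \\ z/d \end{pmatrix}
\;\rightarrow\;
\begin{pmatrix} x d / z \\ y d / z \\ d \\ 1 \end{pmatrix}
```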
Viewing volumes
We now describe the field of view of the camera by defining a 3D view volume, which objects are clipped against
Defining the view volume
We define a 3D view volume which is attached to the camera
For parallel projection, to view a 3D shape we need six planes (cuboid). The cuboid is defined by a near plane (projection plane) and a far plane, top/bottom, left/right planes. The near and far planes are orthogonal to the camera's F axis
For perspective projection, the view volume is a frustum (truncated pyramid). It is defined by a near plane (projection plane) and a far plane, which are orthogonal to the camera's F axis
Perspective problem
We lose z depth information when dividing through by w, so we've made hidden-surface removal more difficult for ourselves
We solve this with a perspective transformation which preserves depth information. We can derive a transformation that distorts the frustum into a cube and then we can take an orthographic projection
This is called projection normalisation (PN) and OpenGL creates it for us automatically
The clipping operation
Clipping takes place in the cube produced by projection normalisation, getting rid of parts of the model that are not seen in the viewport
Perspective division
The clipping operation returns a set of (x, y, z, w) vertices defining polygons which are inside the view volume
OpenGL performs the perspective division by w to convert these values to (x, y, z) 3D points
Summary
The modelling transformation arranges objects in our 3D world
The viewing transformation transforms the world to give the same view as if it were being photographed by a camera
The projection transformation performs a parallel/perspective projection within limits (the clip planes)
Those parts of the 3D world outside the clip planes are discarded
If it's a perspective view, the perspective division "flattens" the image
The viewport transformation maps the final image to a position in part of the display screen window
Rendering
Local and global illumination
We can model light-matter interaction in two ways
locally: we treat each object in a scene separately from any other object
globally: we treat all objects together, and model the interactions between objects
Approximation
The interaction of light and matter is modelled
The standard local model is a simple approximation
Adding per-material algorithms (shaders) gives better results and the global model is better than the local
Local illumination: elements
We start with light intensity only, then ambient illumination, diffuse reflection, positional light source, specular reflection and coloured lights and surfaces
Diffuse and specular reflection
diffuse reflection is absorption and uniform re-radiation
specular reflection is reflection at the air/surface interface
Reflectivity
Diffuse reflection - incident rays are reflected in all directions from the surface. A perfect diffuse surface reflects an incoming ray equally across all angles, making the surface look dull
Perfect specular reflection - reflects an incoming ray like a perfect mirror
Imperfect specular reflection - reflects an incoming ray across a small range of angles. The surface looks shiny with highlights.
Developing a local model
Starting with diffuse reflection then
ambient illumination
In an environment containing a light source, multiple reflections will give a general level of illumination in the scene.
If the monochrome intensity of ambient light is Ia, the amount of ambient light diffusely reflected from a surface is I = kaIa, where ka is the ambient reflection coefficient
The object is uniformly illuminated, so we lose all 3D information, so we need to model the effects of different angles of incidence and different distances from the light source
Effective intensity Ie received is Ie = Ipcosϴ, where ϴ is the angle between the surface normal and the direction to the light
Diffuse reflectivity is described by assigning it a value kd, the diffuse reflection coefficient, so the amount of diffusely reflected light is Ipkdcosϴ or Ipkd(N.L)
I = ambient + diffuse = kaIa + Ipkd(N.L) (see the sketch after this list)
point illumination with the source at infinity
point illumination with the source in the scene
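A sketch of the model so far for one surface point (the vectors, intensities and coefficients are made-up example values):

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v)

# Example inputs: surface normal N, direction to the light L, intensities.
N = normalise(np.array([0.0, 1.0, 0.0]))
L = normalise(np.array([1.0, 1.0, 0.0]))
Ia, Ip = 0.2, 1.0          # ambient and point-light intensities
ka, kd = 0.3, 0.7          # ambient and diffuse reflection coefficients

# I = ambient + diffuse = ka*Ia + Ip*kd*(N.L), clamped so surfaces facing
# away from the light (negative N.L) receive no diffuse contribution.
I = ka * Ia + Ip * kd * max(np.dot(N, L), 0.0)
print(I)
```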
Introduction to image processing
Image processing
the manipulation or modification of a digitized image, especially to enhance its quality
e.g. greyscale, invert, brightness, contrast, threshold, edges, blur, red/green/blue channel, histogram
Image representation
Image resolution - 12 megapixel (4272x2848)
Colour depth - 24 bit (16,777,216 colours)
RGB format, one byte per colour, 3 bytes per pixel
Greyscale format - intensity, uses one byte per pixel
Image origin is the top left corner (0,0) and they are often represented by a matrix, rows x columns
Point processing
For every pixel (x, y) in the image we apply a function - I'(x, y) = F(I(x, y))
We create the image negative with F = 255 - I(x, y)
We increase the brightness with F = I(x, y) + 50
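A minimal NumPy sketch of those two point operations (the random image stands in for a real one; in practice it might come from cv2.imread):

```python
import numpy as np

# A stand-in greyscale image; in practice this would be loaded with
# cv2.imread("input.png", cv2.IMREAD_GRAYSCALE), for example.
image = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

# Negative: F = 255 - I(x, y), applied to every pixel at once.
negative = 255 - image

# Brightness: F = I(x, y) + 50, clipped so values stay within 0..255.
brighter = np.clip(image.astype(np.int16) + 50, 0, 255).astype(np.uint8)
```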
OpenCV
Originally developed by Intel, now an open source image processing/computer vision library with bindings for C/C++, Python, Android and iOS
Image representation and Colour
The Human Eye
Light sensitive cells in the eye
rods for low-light vision
cones for colour
S sensitive to short-wavelength light (blue, roughly 2% of cones)
M to medium (green, roughly 33%)
L to long (red, roughly 65%)
Lights and pigments
The standard for colours is the CIE (International Commission on Illumination)
The standard CIE primary wavelengths are blue 435.8 nm, green 546.1 nm, red 700.0 nm, but these are old
There are also other colour models
YCrCb, used in broadcasting, which separates intensity from colour-difference components
Perceptual spaces - HSV, IHS, HSB aim for equal distances to correspond to equal changes in perceived colour
There is also the RGB cube
History, Applications and Resolution
History
1920 - Bartlane picture transmission service using submarine telegraph cables
1964 - Computer enhancement of images from the NASA Ranger 7 moon probe
1979 - Computer Assisted Tomography
Applications
Anything that can generate a spatially coherent measurement of some property can be imaged
Can use many energy sources - electromagnetic rays, sound and magnetic fields, UV, Lidar, Ultrasound and Sonar
Application Areas
Medicine
Oil Exploration
Astronomy
Weather
Agriculture
Policing etc.
Spatial Resolution
Resolution = field of view/number of pixels for angular resolution
Ground resolution = distance on ground/number of pixels
Nyquist's Theorem
A periodic signal can be reconstructed if the sampling interval is at most half the period (i.e. we take at least two samples per period)
An object can be detected if two samples span its smallest dimension
More samples are needed if we want to recognise the object
Histograms, Point-processing and Geometrical Transformations
Image Histogram
We create a histogram to represent the values of pixels in an image. You can also remove some levels without negatively affecting an image
Contrast adjustment
We apply F = I(x, y) × k to increase the contrast of an image
We use clipping to ensure that values do not overflow, capping them at 255; otherwise wrap-around produces dark spots in bright parts of the image
Input-Output mapping
We can solarise an image, getting rid of bright spots
Min-Max linear stretch is used to make dark spots darker and light spots lighter
Thresholding is used to separate objects from their backgrounds
The T threshold can be chosen after observation of the histogram
You can also do automatic thresholding using a percentile calculation, as we know that an object should occupy a certain percentage of the pixels in an image (see the sketch after this list)
Calculate the number of pixels that should be object
Create image histogram
Accumulate frequencies until the total exceeds the number of expected object pixels
Return current grey level as T
We can misclassify the background as the object if there is overlap
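A sketch of that percentile method (the object fraction and random image are example inputs, and the object is assumed to be darker than the background):

```python
import numpy as np

def percentile_threshold(image, object_fraction):
    """Choose T so roughly `object_fraction` of the pixels fall below it.

    Accumulate histogram frequencies until the expected object pixel
    count is reached, then return the current grey level as T.
    """
    target = object_fraction * image.size
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    total = 0
    for grey_level in range(256):
        total += hist[grey_level]
        if total >= target:
            return grey_level
    return 255

image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
T = percentile_threshold(image, 0.3)
binary = image <= T          # True where pixels are classified as object
```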
Geometrical Transformations
These include:
Scaling
Translation
Reflection/Flipping
Shear
Rotation
Camera Lens Distortion
Radial distortions - r is the radius from the image centre, assuming here that the origin is the image centre
Tangential distortion
Distortions can be corrected with the OpenCV function undistort()
Cameras can also be calibrated using the chessboard pattern
Interpolation
When scaling an image up, we fill in the missing pixels using nearest neighbour interpolation, which has a blocky result
Other interpolation methods include: 1D nearest-neighbour, linear, cubic, 2D nearest-neighbour, bilinear, bicubic
Forward/Reverse Mapping
For a rotation, source pixels might not land exactly on a destination pixel since the transformed coordinates are generally not integers, and if we round the coordinates we may create holes in the destination image; hence we cycle through destination pixels and calculate their values from source pixels
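A sketch of reverse mapping for a rotation about the image centre, using nearest-neighbour sampling (the angle and random image are example values):

```python
import numpy as np

def rotate_reverse_map(src, angle_rad):
    """Rotate `src` by cycling over destination pixels and sampling the source.

    For each destination pixel we apply the *inverse* rotation to find its
    source coordinate, then round to the nearest source pixel, so the
    destination image has no holes.
    """
    h, w = src.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    dst = np.zeros_like(src)
    for y in range(h):
        for x in range(w):
            # Inverse rotation of the destination coordinate.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            sxi, syi = int(round(sx)), int(round(sy))
            if 0 <= sxi < w and 0 <= syi < h:
                dst[y, x] = src[syi, sxi]
    return dst

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
rotated = rotate_reverse_map(image, np.radians(30))
```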
Convolution, Noise and Filters
Region Processing
These are functions involving more than one source pixel
They can show edge gradients by finding the difference between pixel values
We can combine vertical and horizontal edges to find them all by using the OR operator, and increasing the threshold values of edge detection will remove weaker edges
Convolution
Convolution uses a filter kernel, which is applied to the image at every pixel. Each weight is multiplied by its underlying pixel value, the results added up and placed into the image pixel under the centre of the kernel
Correlation is similar, but does not flip (rotate by 180°) the kernel the way convolution does. Most IP software will do correlation but call it convolution
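A direct (unoptimised) sketch of convolution with a small averaging kernel, written out by hand to make the flip-and-weighted-sum explicit (the image and kernel are examples):

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution: flip the kernel, then slide it over the image.

    Each output pixel is the weighted sum of the pixels under the (flipped)
    kernel centred on it; border pixels are simply left at zero here.
    """
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]          # correlation would skip this flip
    pad_y, pad_x = kh // 2, kw // 2
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for y in range(pad_y, h - pad_y):
        for x in range(pad_x, w - pad_x):
            window = image[y - pad_y:y + pad_y + 1, x - pad_x:x + pad_x + 1]
            out[y, x] = np.sum(window * flipped)
    return out

# 3x3 mean (blur) kernel, normalised so the output stays in range.
kernel = np.ones((3, 3)) / 9.0
image = np.random.randint(0, 256, (32, 32)).astype(float)
blurred = convolve2d(image, kernel)
```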
Normalisation
When a mask creates a result that is too big to be stored, we can either work with an integer image and normalise the output or normalise the mask, dividing each weight by the size of the kernel
Composite Filters
Convolution is associative: instead of convolving twice we can convolve once with a composite kernel
Separable Kernels
Gradients, Edge-detection and Template Matching
Blobs
File formats, Exposure and Compression
Hough Lines
Binary Morphology