GPU Programming - Coggle Diagram

- - - - thinner, lower-level abstraction over the GPU hardware
        
        gives more direct control
        
        reduces software overhead
      - no global state machine (unlike OpenGL)
      - allows multi-threading and parallel operation to utilize multi-core CPUs
      - explicit resource and memory management
      - draw call batching and synchronization are handled manually (enabling optimizations)
      - pipeline caches allow reuse of compiled shaders across runs (avoiding compile costs)
      - fine-grained synchronization using fences allows only the necessary sync points
      - command buffers allow better pipelining and concurrency
      - reduced driver involvement (Vulkan gives apps more direct GPU access) avoids driver overhead
- - - - jit decorator tells Numba to compile a function
      - nopython=True compiles the function to run without the Python interpreter, approaching C/C++ speeds
      - njit is shorthand for jit(nopython=True)
      - A few more ways to use...
        
        vectorize allows regular Python functions to act like fast NumPy ufuncs on arrays
        
        jitclass optimizes classes by compiling methods and allocating data on the heap for direct access
        
        cfunc compiles a Python function into a C callback with a given signature
        
        stencil makes it easy to specify stencil operations that update array elements based on neighbors
        
        integrates well with NumPy
      - passing target="cuda" to vectorize or jit runs the function on the GPU with no need to manage the thread hierarchy
  - - - cuda.grid() returns the calling thread's absolute position in the grid
      - cuda.jit decorator compiles the kernel
      - cuda.gridsize() returns the total number of threads in the grid along each dimension
      - Libraries: cuDNN and cuBLAS provide GPU-accelerated routines for many common operations
  - - - CUDA
        
        provides everything you need to develop GPU-accelerated applications
- - - - put through Claude
    - - colorize
    - - Memcheck
    - - python not in executable format
    - - Visual Studio
      - Eclipse
      - CUDA tools?
      - Mac visualization tools?
      - Raspberry Pi?
    - - use the decorator njit(debug=True) and add print statements
      - use pdb to set breakpoints, or use %pdb on (IPython) to drop into the debugger on an execution error