Chapter 6
Loosely Coupled Clusters
Applications with independent tasks
Web servers
Databases
High availability
Scalable
Problems
Administration cost
Low interconnect bandwidth
Grid computing
Vector Processors
Highly pipelined function units
Stream data from/to vector registers to units
Simplify data-parallel programming
Explicit statement of absence of loop-carried dependences
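A small sketch (not from the source) of what "absence of loop-carried dependences" means in practice: in the first loop each element is computed independently, so a vector unit could process many elements at once; in the second, each iteration needs the previous result.

```python
def saxpy(a, x, y):
    # No loop-carried dependence: each output depends only on x[i] and y[i],
    # so iterations can run elementwise in any order (vectorizable).
    return [a * xi + yi for xi, yi in zip(x, y)]

def prefix_sum(x):
    # Loop-carried dependence: each element needs the previous running total,
    # so this naive loop cannot be vectorized directly.
    out, total = [], 0
    for xi in x:
        total += xi
        out.append(total)
    return out
```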
SIMD
Operate elementwise on vectors of data
Simplifies synchronization
Multithreading
Fine-grain multithreading
Switch threads after each cycle
Coarse-grain multithreading
Only switch on long stall
L2-cache miss
Simultaneous Multithreading
Future
Power considerations
Tolerating cache-miss latency
Interconnection Networks
Network topologies
Arrangements of processors
Switches
Links
Performance
Latency per message
Throughput
Routability in silicon
Power
Parallel Programming
Amdahl's law
Sequential part can limit speedup
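Amdahl's law can be written down directly; a minimal sketch, with `parallel_fraction` the fraction of execution time that can be parallelized:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    # Speedup = 1 / ((1 - f) + f / p): the sequential part (1 - f)
    # bounds the speedup no matter how many processors are added.
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / n_processors)
```

For example, with 90% of the work parallelizable, even an unbounded number of processors cannot exceed a speedup of 10.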
Scaling
Strong scaling: problem size fixed
Weak scaling: problem size proportional to number of processors
SPMD
Single Program Multiple Data
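A hedged sketch of the SPMD idea: one program text is run once per "rank", and behavior differs only through the rank and the slice of data it owns (the function name and 4-rank setup are illustrative, not from the source).

```python
def spmd_body(rank, nranks, data):
    # Every rank runs this same program; each works on its own
    # partition of the input, selected by its rank.
    chunk = data[rank::nranks]
    return sum(chunk)

data = list(range(100))
# Simulate 4 ranks running the same program body.
partials = [spmd_body(r, 4, data) for r in range(4)]
total = sum(partials)  # final reduction combining per-rank results
```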
GPUs (Graphics Processing Units)
Architectures
Processing is highly data-parallel
Trend toward general purpose GPUs
CPU for sequential code, GPU for parallel code
Programming languages/APIs
Compute Unified Device Architecture (CUDA)
Classifying
Don’t fit SIMD/MIMD model
Message passing
Each processor has a private physical address space
Hardware sends/receives messages between processors
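A minimal sketch of the message-passing style, modeled here with two threads that share nothing except explicit send/receive over queues (real message-passing machines do this in hardware or via libraries such as MPI; the names are illustrative):

```python
import threading, queue

def worker(inbox, outbox):
    # "Private address space": the worker only sees what arrives as a message.
    x = inbox.get()       # receive
    outbox.put(x * x)     # send the reply

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
inbox.put(7)              # explicit send ...
result = outbox.get()     # ... explicit receive; no shared variables
t.join()
```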
Modeling Performance
Roofline Diagram
Arithmetic Intensity
Attainable GFLOPs/sec
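The roofline bound is a one-line formula; a sketch (parameter names are mine):

```python
def attainable_gflops(peak_gflops, peak_gb_per_s, arithmetic_intensity):
    # Attainable performance is capped either by the compute peak (the flat
    # roof) or by memory bandwidth x arithmetic intensity (the slanted roof).
    # Arithmetic intensity = FLOPs performed per byte of DRAM traffic.
    return min(peak_gflops, peak_gb_per_s * arithmetic_intensity)
```

At low arithmetic intensity the kernel is memory-bound (bandwidth limits it); past the ridge point it is compute-bound.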
Optimizing
FP performance
Balance adds & multiplies
SIMD
memory usage
Software prefetch
Memory affinity
Shared Memory
SMP
shared memory multiprocessor
Synchronize shared variables using locks
Memory access time
UMA (uniform)
NUMA (nonuniform)
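A sketch of "synchronize shared variables using locks": threads in an SMP share one address space, so concurrent updates to a shared counter must be protected or updates can be lost (the counter example is illustrative, not from the source).

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:        # acquire/release around each shared update
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter is exactly 40_000; without it, increments could race.
```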