Parallel Processors from Client to Cloud
multiprocessor
high performance
task-level parallelism
process-level parallelism
parallel processing program
cluster
multicore microprocessor
shared memory multiprocessor (SMP)
difficulty
speed-up challenge
strong scaling, weak scaling
balancing load
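
The speed-up challenge above is commonly quantified with Amdahl's law. The following is a minimal sketch, not taken from these notes: the function names `speedup` and `required_parallel_fraction` are made up for illustration, and the model assumes perfect load balance and no communication overhead.

```python
def speedup(parallel_fraction, processors):
    """Amdahl's law: overall speed-up when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def required_parallel_fraction(target_speedup, processors):
    """Invert Amdahl's law: fraction that must be parallel to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Strong scaling: the problem size stays fixed while processors are added.
    for p in (2, 10, 100):
        print(p, "processors:", round(speedup(0.95, p), 2))
    # Fraction of the original run time that must be parallel for 90x on 100 processors.
    print(round(required_parallel_fraction(90, 100), 4))
```

Under this model, even a small sequential portion caps the achievable speed-up, which is why weak scaling (growing the problem size with the processor count) is often easier to achieve than strong scaling.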
SISD, MIMD, SIMD, SPMD, and Vector
data-level parallelism
vector vs conventional code
vector vs scalar
vector vs multimedia extensions
vector lane
hardware multithreading
thread
process
fine-grained multithreading
coarse-grained multithreading
simultaneous multithreading (SMT)
multicore and other shared memory multiprocessors
shared memory multiprocessor (SMP)
uniform memory access (UMA) / nonuniform memory access (NUMA)
synchronization
lock
graphics processing units
NVIDIA GPU Architecture
thread block scheduler
clusters, warehouse-scale computers, and other message-passing multiprocessors
message passing
send message routine
receive message routine
warehouse-scale computers
software as a service (SaaS)
multiprocessor network topologies
network bandwidth
bisection bandwidth
fully connected network
multistage network
crossbar network
multiprocessor benchmarks and performance models
Linpack, SPECrate, SPLASH, SPLASH2, NAS, PARSEC (Pthreads)
performance models
arithmetic intensity
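
Arithmetic intensity (floating-point operations per byte of memory traffic) is the x-axis of the roofline performance model. The sketch below is an illustration under stated assumptions: the function name `attainable_gflops` and the machine numbers are hypothetical, not taken from these notes.

```python
def attainable_gflops(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity):
    """Roofline model: performance is capped by compute or by memory bandwidth.

    peak_gflops           peak floating-point rate (GFLOP/s)
    peak_bandwidth_gbs    peak memory bandwidth (GB/s)
    arithmetic_intensity  FLOPs performed per byte moved to or from memory
    """
    memory_bound = peak_bandwidth_gbs * arithmetic_intensity
    return min(peak_gflops, memory_bound)


if __name__ == "__main__":
    # Hypothetical machine: 16 GFLOP/s peak compute, 10 GB/s peak memory bandwidth.
    for intensity in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(intensity, "FLOPs/byte:", attainable_gflops(16.0, 10.0, intensity), "GFLOP/s")
```

Kernels with low arithmetic intensity sit on the sloped, bandwidth-limited part of the roofline; kernels with high intensity sit on the flat, compute-limited part.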