Parallel Processors from Client to Cloud

multiprocessor

high performance

task-level parallelism

process-level parallelism

parallel processing program

cluster

multicore microprocessor

shared memory multiprocessor (SMP)

difficulty

speed-up challenge

strong scaling, weak scaling

balancing load
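
A minimal sketch of the speed-up challenge, assuming the notes refer to Amdahl's law for a fixed problem size (the strong-scaling case). The function names `amdahl_speedup` and `required_parallel_fraction` are illustrative and do not come from these notes.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speed-up predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; the parallel part is divided among processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def required_parallel_fraction(target_speedup: float, processors: int) -> float:
    """Fraction of the original time that must be parallel to reach target_speedup."""
    if processors < 2:
        raise ValueError("processors must be >= 2")
    if not 1.0 <= target_speedup <= processors:
        raise ValueError("target_speedup must be in [1, processors]")
    # Solve 1 / ((1 - f) + f / p) = S for f.
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    print(amdahl_speedup(0.9, 10))
    print(required_parallel_fraction(90, 100))
```

Under this model, reaching a speed-up of 90 on 100 processors requires about 99.89% of the original execution time to be parallelizable, which is why even a small sequential portion limits strong scaling.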

SISD, MIMD, SIMD, SPMD, and Vector

data-level parallelism

vector vs conventional code

vector vs scalar

vector vs multimedia extensions

vector lane

hardware multithreading

thread

process

fine-grained multithreading

coarse-grained multithreading

simultaneous multithreading (SMT)

multicore and other shared memory multiprocessors

shared memory multiprocessor (SMP)

uniform memory access (UMA) / nonuniform memory access (NUMA)

synchronization

lock

graphics processing units

NVIDIA GPU Architecture

thread block scheduler

clusters, warehouse scale computers, and other message-passing multiprocessors

message passing

send message routine

receive message routine

warehouse-scale computers

software as a service (SaaS)

multiprocessor network topologies

network bandwidth

bisection bandwidth

fully connected network

multistage network

crossbar network

multiprocessor benchmarks and performance models

Linpack, SPECrate, SPLASH, SPLASH2, NAS, PARSEC (Pthreads)

performance models

arithmetic intensity
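
A small sketch of how arithmetic intensity feeds a performance model, assuming the roofline model is the one these notes have in mind. Arithmetic intensity is the ratio of floating-point operations to bytes of memory traffic. The function names and the hardware figures in the usage lines are hypothetical values chosen for illustration.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(peak_gflops: float, peak_bandwidth_gbs: float,
                      intensity: float) -> float:
    """Roofline bound: limited by either compute or memory bandwidth."""
    return min(peak_gflops, peak_bandwidth_gbs * intensity)


if __name__ == "__main__":
    # DAXPY, y[i] = a * x[i] + y[i] on doubles: 2 FLOPs per element,
    # 24 bytes of traffic (load x, load y, store y at 8 bytes each).
    daxpy = arithmetic_intensity(2, 24)
    # Hypothetical machine: 16 GFLOP/s peak, 16 GB/s memory bandwidth.
    print(attainable_gflops(16.0, 16.0, daxpy))  # memory-bound
    print(attainable_gflops(16.0, 16.0, 4.0))    # compute-bound
```

In this model, a kernel with low arithmetic intensity sits under the sloped (bandwidth) part of the roofline, and one with high intensity sits under the flat (peak compute) part.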