Parallel Processing - Coggle Diagram
Parallel Processing
Multi-threading
Implicit and Explicit Multithreading
Implicit multithreading
Implicit threads defined statically by the compiler or dynamically by hardware
Concurrent execution of multiple threads extracted from a single sequential program
All commercial processors and most experimental ones use explicit multithreading
Approaches to Explicit Multithreading
Scalar variants
Interleaved
A.k.a. Fine-grained
—Processor deals with two or more thread contexts at a time
—Switching thread at each clock cycle
—If thread is blocked it is skipped
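The interleaved (fine-grained) policy above can be sketched as a tiny cycle-by-cycle simulator. This is a minimal model under stated assumptions: `Instr`, `interleaved`, and the latency field are hypothetical names, each instruction carries the number of cycles before its thread may issue again (values above 1 stand in for a stall such as a cache miss), and the scheduler rotates to the next thread every cycle, skipping any thread that is blocked or finished.

```python
from collections import namedtuple

# Hypothetical instruction model: an opcode plus the number of cycles before the
# issuing thread may issue again (1 = no stall, larger values stand in for a cache miss).
Instr = namedtuple("Instr", ["op", "latency"])


def interleaved(threads, max_cycles=50):
    """Fine-grained sketch: every cycle, rotate to the next thread after the one that
    issued last and issue one instruction from it; blocked or finished threads are skipped."""
    order = list(threads)
    pcs = {name: 0 for name in order}        # next instruction index per thread context
    ready_at = {name: 0 for name in order}   # first cycle each thread may issue again
    last = -1                                # index of the thread that issued most recently
    trace = []
    for cycle in range(max_cycles):
        if all(pcs[n] >= len(threads[n]) for n in order):
            break
        issued = None
        for step in range(1, len(order) + 1):
            idx = (last + step) % len(order)
            name = order[idx]
            if pcs[name] < len(threads[name]) and ready_at[name] <= cycle:
                instr = threads[name][pcs[name]]
                pcs[name] += 1
                ready_at[name] = cycle + instr.latency
                last = idx
                issued = (name, instr.op)
                break
        trace.append(issued)  # None marks a cycle in which no thread could issue
    return trace


threads = {
    "A": [Instr("a1", 1), Instr("a2", 1)],
    "B": [Instr("b1", 3), Instr("b2", 1)],  # b1 blocks thread B for 3 cycles
}
print(interleaved(threads))
# [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), None, ('B', 'b2')]
```

The `None` entry is the one cycle where A has finished and B is still blocked, so no thread can fill the issue slot.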
Blocked
A.k.a. Coarse-grained
—Thread executed until event causes delay, e.g. cache miss
—Effective on in-order processor
—Avoids pipeline stall
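For contrast, a blocked (coarse-grained) policy keeps issuing from one thread until a delaying event occurs. The sketch below reuses the same hypothetical instruction model (`Instr`, with a latency greater than 1 standing in for a cache miss) and a hypothetical `blocked` function; it switches threads only after such an event, and it ignores the thread-switch penalty a real pipeline would pay.

```python
from collections import namedtuple

# Hypothetical instruction model: latency > 1 stands in for an event such as a cache miss.
Instr = namedtuple("Instr", ["op", "latency"])


def blocked(threads, max_cycles=50):
    """Coarse-grained sketch: stay on the current thread until it issues a long-latency
    instruction, then move on to the next thread that is ready."""
    order = list(threads)
    pcs = {n: 0 for n in order}
    ready_at = {n: 0 for n in order}
    cur = 0  # index of the thread currently being executed
    trace = []
    for cycle in range(max_cycles):
        if all(pcs[n] >= len(threads[n]) for n in order):
            break
        issued = None
        # Try the current thread first; fall through to others only if it cannot issue.
        for step in range(len(order)):
            idx = (cur + step) % len(order)
            name = order[idx]
            if pcs[name] < len(threads[name]) and ready_at[name] <= cycle:
                instr = threads[name][pcs[name]]
                pcs[name] += 1
                ready_at[name] = cycle + instr.latency
                cur = idx
                issued = (name, instr.op)
                if instr.latency > 1:
                    cur = (idx + 1) % len(order)  # switch away on the delaying event
                break
        trace.append(issued)  # None marks an idle cycle
    return trace


threads = {
    "A": [Instr("a1", 1), Instr("a2", 4), Instr("a3", 1)],  # a2 "misses" for 4 cycles
    "B": [Instr("b1", 1), Instr("b2", 1)],
}
print(blocked(threads))
# [('A', 'a1'), ('A', 'a2'), ('B', 'b1'), ('B', 'b2'), None, ('A', 'a3')]
```

Thread A runs back to back until its miss, and thread B fills the stall instead of the pipeline sitting idle.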
Superscalar variants
Simultaneous (SMT)
Instructions simultaneously issued from multiple threads to the execution units of a superscalar processor
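Simultaneous multithreading can be illustrated by filling a superscalar processor's issue slots each cycle from more than one thread. In this sketch, `smt_issue` is a hypothetical function, instructions are assumed independent, and `ilp[name]` is an assumed cap on how many instructions a thread can offer per cycle. Slots are filled in a fixed thread order; real processors use more elaborate fetch and issue policies.

```python
def smt_issue(threads, width, ilp, max_cycles=50):
    """SMT sketch: each cycle, fill up to `width` issue slots with instructions drawn
    from any thread, taking at most ilp[name] instructions from each thread."""
    pcs = {n: 0 for n in threads}
    schedule = []
    for _cycle in range(max_cycles):
        if all(pcs[n] >= len(threads[n]) for n in threads):
            break
        slots = []
        for name in threads:
            # Limited by the thread's available parallelism, its remaining work,
            # and the slots still free this cycle.
            can_issue = min(ilp[name], len(threads[name]) - pcs[name], width - len(slots))
            for _ in range(can_issue):
                slots.append(threads[name][pcs[name]])
                pcs[name] += 1
        schedule.append(slots)
    return schedule


work = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2"]}
print(smt_issue(work, width=4, ilp={"A": 2, "B": 2}))
# [['a1', 'a2', 'b1', 'b2'], ['a3', 'a4']]
```

Run one at a time on the same 4-wide machine, A would need 2 cycles and B 1 more (3 in total, each leaving half the slots empty); with both threads sharing the slots, everything issues in 2 cycles.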
Chip multiprocessing
—Multiple processors replicated on a single chip (multicore)
—Each processor handles separate threads
Scalar Processor Approaches
Interleaved multithreaded scalar
Blocked multithreaded scalar
Single-threaded scalar
Multiple Instruction Issue Processors
Blocked multithreaded superscalar
Very long instruction word (VLIW)
Interleaved multithreaded superscalar
Interleaved multithreaded VLIW
Superscalar
Blocked multithreaded VLIW
Taxonomy of parallel processor architectures
Multiple Processor Organization
Single instruction, multiple data stream - SIMD
Single machine instruction controls simultaneous execution of a number of processing elements on a lockstep basis
Each processing element has associated data memory
Each instruction executed on a different set of data by different processors
Vector and array processors
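The SIMD organization can be modeled as a control unit broadcasting each instruction to every processing element, with each element operating on its own local data. `ProcessingElement`, `execute`, and `broadcast` are hypothetical names, the two-opcode instruction set is an assumption for illustration, and lockstep is modeled by having every element finish an instruction before the next one is broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    # Each PE owns a private data memory (here, a small dict of named locations).
    memory: dict = field(default_factory=dict)

    def execute(self, op, dst, a, b):
        x, y = self.memory[a], self.memory[b]
        if op == "ADD":
            self.memory[dst] = x + y
        elif op == "MUL":
            self.memory[dst] = x * y
        else:
            raise ValueError(f"unknown op {op}")


def broadcast(pes, program):
    """Control-unit sketch: send each instruction to every PE; all PEs complete it
    before the next instruction is issued (lockstep)."""
    for instr in program:
        for pe in pes:
            pe.execute(*instr)


pes = [ProcessingElement({"x": i, "y": 10}) for i in range(4)]
broadcast(pes, [("ADD", "z", "x", "y"), ("MUL", "z", "z", "x")])
print([pe.memory["z"] for pe in pes])  # [0, 11, 24, 39]
```

One instruction stream produces four different results because each element applies it to a different local value of `x`.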
Multiple instruction, single data stream - MISD
Sequence of data transmitted to a set of processors
Each processor executes different instruction sequence
Never implemented commercially
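MISD is not a practical commercial organization, so this is purely a conceptual model: every item in a single data stream is delivered to every processor, and each processor runs its own, different instruction sequence on it. The names `misd` and `run_sequence`, and modeling an "instruction sequence" as a list of functions, are assumptions made for illustration.

```python
def run_sequence(sequence, x):
    # Apply one processor's instruction sequence, step by step, to a data item.
    for step in sequence:
        x = step(x)
    return x


def misd(data_stream, processors):
    """MISD sketch: each data item is transmitted to every processor, and each
    processor executes a different instruction sequence on that same item."""
    return [[run_sequence(seq, item) for seq in processors] for item in data_stream]


processors = [
    [lambda x: x + 1, lambda x: x * 2],  # processor 0's instruction sequence
    [lambda x: x * x],                   # processor 1's instruction sequence
    [abs, lambda x: -x],                 # processor 2's instruction sequence
]
print(misd([2, -3], processors))  # [[6, 4, -2], [-4, 9, -3]]
```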
Single instruction, single data stream - SISD
Single instruction stream
Data stored in single memory
Single processor
Uni-processor
Multiple instruction, multiple data stream - MIMD
Set of general-purpose processors simultaneously executing different instruction sequences
Each processor can execute all the instructions needed for its task
Each processor works on a different set of data
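MIMD can be loosely illustrated with a pool of workers, each running a different function (instruction sequence) on a different data set. The functions `word_count`, `total`, `longest`, and `mimd` are hypothetical examples built on the standard `concurrent.futures.ThreadPoolExecutor`. In CPython's default build, the global interpreter lock keeps these threads from executing Python bytecode at the same instant, so the sketch shows the organization, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    return len(text.split())


def total(numbers):
    return sum(numbers)


def longest(words):
    return max(words, key=len)


def mimd(tasks):
    """tasks: list of (function, data) pairs. Each worker executes a different
    instruction sequence on a different set of data."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        return [f.result() for f in futures]


print(mimd([(word_count, "a b c"), (total, [1, 2, 3]), (longest, ["x", "yyy", "zz"])]))
# [3, 6, 'yyy']
```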
Further classified by method of processor communication
Tightly Coupled - SMP
Processors communicate via the shared memory
Symmetric Multiprocessor (SMP)
Share single memory or pool
Shared bus to access memory
Memory access time to given area of memory is approximately the same for each processor
All processors share access to I/O
All processors can perform the same functions
System controlled by integrated operating system
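Communication through a single shared memory can be sketched with threads standing in for processors: all of them update one shared location, and a lock serializes those updates so none are lost. `smp_counter` and its parameters are hypothetical; the lock plays the role of the synchronization an SMP operating system or program must provide, not of any hardware mechanism.

```python
import threading


def smp_counter(num_workers=4, increments=10_000):
    shared = {"count": 0}          # stands in for a location in the single shared memory
    lock = threading.Lock()        # serializes updates to that location

    def worker():
        for _ in range(increments):
            with lock:
                shared["count"] += 1

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return shared["count"]


print(smp_counter())  # 40000
```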
Time-shared bus
Advantages
Flexibility
Easy to upgrade by adding more processors
Reliability
The bus is a passive medium, so failure of any attached device should not cause failure of the whole system
Simplicity
Structure similar to a single-processor system
Disadvantages
Performance limited by bus cycle time
Each processor should have a local cache to reduce the number of bus accesses
Local caches lead to cache coherence problems
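The coherence problem with local caches can be reproduced in a few lines: one processor caches a value, another processor writes the same address, and the first keeps reading its stale copy. The sketch below (hypothetical `Bus` and `Cache` classes) adds a simple snooping write-invalidate step as one possible fix. To stay short it assumes write-through caches; real protocols such as MESI are write-back and considerably more involved.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory   # shared main memory: addr -> value
        self.caches = []       # every cache snoops this bus

    def invalidate(self, addr, source):
        # Write-invalidate sketch: every other cache drops its copy of addr.
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(addr, None)


class Cache:
    def __init__(self, bus, coherent=True):
        self.bus = bus
        self.lines = {}        # private copies: addr -> value
        self.coherent = coherent
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch over the shared bus
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value           # write-through keeps memory current
        if self.coherent:
            self.bus.invalidate(addr, self)


def demo(coherent):
    bus = Bus({0x10: 1})
    p0, p1 = Cache(bus, coherent), Cache(bus, coherent)
    p1.read(0x10)         # P1 caches the old value
    p0.write(0x10, 2)     # P0 updates the same location
    return p1.read(0x10)  # what P1 sees afterwards


print(demo(coherent=False), demo(coherent=True))  # 1 2
```

Without invalidation, P1 returns the stale value 1 even though memory already holds 2.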
Processors share memory
Tightly Coupled - NUMA
Nonuniform memory access
Access times to different regions of memory may differ
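Because NUMA access time depends on where the data lives, the average memory access time is a weighted mix of local and remote latencies: t_avg = f_local * t_local + (1 - f_local) * t_remote. The function name and the latency figures below are hypothetical placeholders chosen to show the arithmetic, not measurements of any machine.

```python
def numa_avg_access(local_fraction, t_local_ns, t_remote_ns):
    """Average access time when a fraction of references hit the local node's memory
    and the rest go to a remote node (latencies are assumed inputs)."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1 - local_fraction) * t_remote_ns


# Hypothetical latencies: 100 ns local, 300 ns remote, 75% of accesses local.
print(numa_avg_access(0.75, 100, 300))  # 150.0
```

As the local fraction falls, the slower remote accesses dominate, which is why data placement matters on NUMA systems.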
Loosely Coupled - Clusters
Interconnected to form a cluster
Communication via fixed path or network connections
Collection of independent uniprocessors or SMPs
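Clusters do not share memory, so nodes cooperate by exchanging messages over the interconnect. The sketch below mimics that discipline: each hypothetical `node` works only on its own private copy of a data partition and reports a partial result solely by sending a message through a queue that stands in for the network. Threads in one Python process could share memory directly, so this only illustrates the pattern; real clusters typically use network messaging libraries such as MPI.

```python
import queue
import threading


def node(node_id, data, link):
    # Each node computes only on its private data...
    partial = sum(data)
    # ...and shares its result solely by sending a message over the "network".
    link.put((node_id, partial))


def cluster_sum(partitions):
    link = queue.Queue()  # stands in for the interconnect to a coordinating node
    workers = [
        threading.Thread(target=node, args=(i, list(part), link))  # list() copies the partition
        for i, part in enumerate(partitions)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    messages = [link.get() for _ in partitions]
    return sum(partial for _, partial in messages)


print(cluster_sum([[1, 2], [3, 4], [5]]))  # 15
```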