Instruction-Level Parallelism and Superscalar Processors
Overview
Superscalar
Term first coined in 1987
Refers to a machine that is designed to improve the performance of the execution of scalar instructions
In most applications, the bulk of the operations are on scalar quantities
Represents the next step in the evolution of high-performance general-purpose processors
Essence of the approach is the ability to execute instructions independently and concurrently in different pipelines
Concept can be further exploited by allowing instructions to be executed in an order different from the program order
Comparison of Superscalar and Superpipelined Approaches: see slides 5 and 6 of lecture 16
Constraints
Instruction level parallelism
Refers to the degree to which the instructions of a program can be executed in parallel
A combination of compiler-based optimization and hardware techniques can be used to maximize instruction-level parallelism
Limitations
True data dependency
The input of the next instruction is the output of the previous one (read-after-write, RAW)
Procedural dependency
The previous instruction is a branch: the code at the branch target can affect the inputs of the next instruction, so it cannot execute until the branch is resolved
Resource conflicts
Two instructions compete for the same resource at the same time (bus, registers, …)
Output dependency
Two instructions write to the same destination (write-after-write, WAW)
Anti-dependency
A later instruction writes a location that an earlier instruction still has to read (write-after-read, WAR)
Situations in which instructions cannot be executed in parallel
Effect of Dependencies: see slide 9 of lecture 16
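The three register-based limitations above (RAW, WAW, WAR) can be checked mechanically for a pair of instructions. The following is a minimal sketch, not taken from the lecture: the `Instr` record and the `hazards` function are hypothetical names, and each instruction is assumed to write exactly one register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def hazards(first: Instr, second: Instr) -> set:
    """Return the register hazards that `second` has on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # true data dependency
    if first.dest == second.dest:
        found.add("WAW")   # output dependency
    if second.dest in first.srcs:
        found.add("WAR")   # antidependency
    return found


i1 = Instr(dest="r3", srcs=("r3", "r5"))   # r3 = r3 op r5
i2 = Instr(dest="r4", srcs=("r3",))        # r4 = r3 + 1
i3 = Instr(dest="r3", srcs=("r5",))        # r3 = r5 + 1

for a, b in [(i1, i2), (i2, i3), (i1, i3)]:
    print(sorted(hazards(a, b)))
```

Procedural dependencies and resource conflicts are not modeled here, since they depend on control flow and on the hardware rather than on register names alone.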
Design Issues
Instruction level parallelism
Exists when instructions in a sequence are independent
Execution can be overlapped
Governed by data and procedural dependency
Machine Parallelism
Ability to take advantage of instruction level parallelism
Governed by number of parallel pipelines
Three hardware techniques that can be used in a superscalar processor to enhance performance
Renaming registers
Registers allocated dynamically
Compiler techniques attempt to maximize the use of registers, which also maximizes the number of storage conflicts when parallel execution is applied. Register renaming is a form of resource duplication (more registers are added). Registers are allocated dynamically by the processor hardware and are associated with the values needed by instructions at various points in time. Thus, the same original register reference in several different instructions may refer to different actual registers.
Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program
These dependencies may result in a pipeline stall
Duplication of resources
Out-of-order issue
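The register renaming idea described above can be sketched in a few lines. This is an illustrative model under stated assumptions, not a description of any real processor: the `rename` function, the `p0`, `p1`, ... physical register names, and the unlimited supply of physical registers are all hypothetical.

```python
from itertools import count


def rename(program):
    """Map architectural registers to fresh physical registers.

    `program` is a list of (dest, srcs) tuples using architectural names.
    Every write gets a new physical register; reads use the latest mapping.
    """
    fresh = (f"p{n}" for n in count())   # assumed unlimited physical registers
    mapping = {}                         # architectural name -> physical name
    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = []
        for s in srcs:
            if s not in mapping:
                mapping[s] = next(fresh)
            new_srcs.append(mapping[s])
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], tuple(new_srcs)))
    return renamed


program = [
    ("r3", ("r3", "r5")),   # I1: r3 = r3 op r5
    ("r4", ("r3",)),        # I2: r4 = r3 + 1
    ("r3", ("r5",)),        # I3: r3 = r5 + 1
    ("r7", ("r3", "r4")),   # I4: r7 = r3 op r4
]

for dest, srcs in rename(program):
    print(dest, "<-", srcs)
```

After renaming, I1 and I3 write different physical registers, so the WAW conflict between them and the WAR conflict between I2 and I3 disappear, while the true (RAW) dependencies from I1 to I2 and from I3 to I4 are preserved.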
Instruction Issue Policy
Refers to the protocol used to issue instructions
Instruction issue occurs when an instruction moves from the decode stage of the pipeline to the first execute stage of the pipeline
Three types of orderings are important
The order in which instructions are fetched
The order in which instructions are executed
The order in which instructions update the contents of register and memory locations
Superscalar instruction issue policies can be grouped into the following categories
In-order issue with in-order completion
In-order issue with out-of-order completion
Out-of-order issue with out-of-order completion
Instruction issue
Refers to the process of initiating instruction execution in the processor's functional units
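The difference between in-order and out-of-order issue can be illustrated with a toy cycle counter. This is a simplified sketch under stated assumptions: the `run` function and the `(latency, deps)` encoding are hypothetical, only true data dependencies are modeled, functional units are assumed to be unlimited, and results may complete out of order in both modes.

```python
def run(program, width=2, in_order=True):
    """Return the cycle in which the last instruction finishes.

    `program` is a list of (latency, deps); deps are indices of earlier
    instructions whose results are needed (true data dependencies only).
    """
    done_at = {}                          # index -> cycle its result is ready
    pending = list(range(len(program)))   # not yet issued, in program order
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for i in pending:
            if len(issued) == width:
                break                     # issue width exhausted this cycle
            latency, deps = program[i]
            ready = all(d in done_at and done_at[d] < cycle for d in deps)
            if ready:
                issued.append(i)
            elif in_order:
                break                     # a stalled instruction blocks younger ones
        for i in issued:
            done_at[i] = cycle + program[i][0] - 1
            pending.remove(i)
    return max(done_at.values())


program = [
    (2, ()),      # I0: two-cycle operation
    (1, (0,)),    # I1: needs the result of I0
    (1, ()),      # I2: independent
    (1, ()),      # I3: independent
    (1, (2,)),    # I4: needs the result of I2
]

print("in-order issue:    ", run(program, in_order=True))
print("out-of-order issue:", run(program, in_order=False))
```

With in-order issue, I1 waits for I0 and holds back the independent I2 and I3; with out-of-order issue, the processor looks past the stalled instruction and fills the otherwise idle issue slots.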
Branch Prediction
Any high-performance pipelined machine must address the issue of dealing with branches
Intel 80486 addressed the problem by fetching both the next sequential instruction after a branch and speculatively fetching the branch target instruction
RISC machines
Delayed branch strategy was explored
Processor always executes the single instruction that immediately follows the branch
Keeps the pipeline full while the processor fetches a new instruction stream
Superscalar machines
Delayed branch strategy has less appeal (not a requirement)
Have returned to pre-RISC techniques of branch prediction
Reason: multiple instructions need to execute in the delay slot, which raises several problems related to instruction dependencies
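The notes do not say which prediction scheme is used. As one common illustration of dynamic branch prediction, the sketch below models a two-bit saturating counter; the `TwoBitPredictor` class, the `accuracy` helper, and the sample branch history are assumptions made for this example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branches whose outcome was predicted correctly."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)


# Hypothetical loop branch: taken nine times, then not taken once, run twice.
history = ([True] * 9 + [False]) * 2
print(accuracy(history, TwoBitPredictor()))
```

Because the counter needs two consecutive mispredictions to change its prediction, a single loop exit does not disturb the prediction for the next run of the loop.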
Superscalar Execution: see slide 19 of lecture 16
Superscalar Implementation
Instruction fetch strategies that simultaneously fetch multiple instructions
Logic for determining true dependencies involving register values, and mechanisms for communicating these values to where they are needed during execution
Mechanisms for initiating, or issuing, multiple instructions in parallel
Resources for parallel execution of multiple instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references
Mechanisms for committing the process state in correct order
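The last item, committing state in the correct order, can be illustrated with a buffer that retires results only from its head. This is a rough sketch and not the mechanism of any particular processor: the `commit_order` function is hypothetical, the buffer is assumed to be large enough for the whole program, and any number of instructions may commit per cycle.

```python
from collections import deque


def commit_order(finish_cycle):
    """Return (cycle, instruction index) pairs in the order results are committed.

    `finish_cycle[i]` is the cycle (1 or later) in which instruction i finishes
    executing. Results wait in the buffer and commit strictly in program order.
    """
    buffer = deque(range(len(finish_cycle)))   # program order, oldest first
    committed = []
    cycle = 0
    while buffer:
        cycle += 1
        # Retire from the head only; a finished younger instruction must wait.
        while buffer and finish_cycle[buffer[0]] <= cycle:
            committed.append((cycle, buffer.popleft()))
    return committed


# Instructions 1 and 2 finish early but cannot commit before instruction 0.
print(commit_order([3, 1, 2, 5, 4]))
```

Execution may finish out of order, but register and memory updates become visible in program order, which keeps the process state consistent if a branch was mispredicted or an exception occurs.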