Computer Systems and Architecture 2
CPU Control
Single-Cycle - The clock cycle is as long as the slowest instruction, so every instruction completes in exactly one clock cycle. Not pipelined, and low performance, since fast instructions still take the full cycle
Multi-Cycle - Instructions are broken into steps that each take one cycle, so different instructions take different numbers of cycles and the control unit must keep state. The next instruction is held back until the previous one completes. The steps are driven by a sequence of micro-instructions
ALU Control - The main control sends a 2-bit ALUOp: 00 for lw/sw, 01 for beq, 10 for R-type instructions. With 10, the ALU control also reads the instruction's 6-bit funct field. It then outputs a 4-bit signal that sets up the ALU, e.g. 0000 = AND, 0001 = OR, 0010 = add, 0110 = subtract
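This two-level decode can be sketched as a small lookup. The 4-bit codes and funct values follow the common textbook MIPS encoding; the real hardware is combinational logic, not a table:

```python
# R-type funct field -> 4-bit ALU operation code (textbook encoding)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # slt
}

def alu_control(alu_op, funct):
    if alu_op == 0b00:              # lw/sw: compute the address with an add
        return 0b0010
    if alu_op == 0b01:              # beq: subtract and test the Zero flag
        return 0b0110
    if alu_op == 0b10:              # R-type: consult the funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("unused ALUOp encoding")
```

Note that for lw/sw and beq the funct argument is simply ignored, mirroring how the hardware doesn't look at those instruction bits.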
I/O and Peripherals
Buses - Communication channel between the processor and I/O
Memory Mapped I/O - Peripherals have a block of the main memory address space dedicated to them, so they can be accessed with ordinary load/store instructions. In MIPS this region runs from 0xFFFF0000 to 0xFFFFFFFF
Port Mapped I/O - Each peripheral has its own address space separate from main memory, as used in x86. To access these addresses, the CPU issues special in/out instructions on the I/O bus to fetch or send data
Polling - The CPU repeatedly checks the status register of each I/O device to see if it needs attention. For example, after a keystroke the value is held until the CPU polls the device's status register; if a keystroke is waiting, the key is read and the status bit is reset, otherwise the CPU keeps cycling
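A keyboard polling loop might look like this sketch. The addresses sit in the MIPS memory-mapped range above, but the register layout and ready bit are illustrative, with a dict standing in for the bus:

```python
# Hypothetical memory-mapped keyboard registers (dict stands in for the bus)
RECEIVER_CTRL = 0xFFFF0000   # bit 0 = "keystroke ready"
RECEIVER_DATA = 0xFFFF0004   # holds the key code

io = {RECEIVER_CTRL: 0, RECEIVER_DATA: 0}

def poll_keyboard():
    """Busy-wait until the ready bit is set, then read and acknowledge the key."""
    while not (io[RECEIVER_CTRL] & 1):
        pass                      # CPU burns cycles here until a key arrives
    key = io[RECEIVER_DATA]
    io[RECEIVER_CTRL] &= ~1      # reading the key resets the status bit
    return key
```

The busy-wait loop is exactly the cost polling pays and interrupts avoid: the CPU can do nothing else while it spins.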
Interrupts - Each device has an Interrupt Request line (IRQ) that signals the processor when it needs CPU time. The processor interrupts the running process, saves its state to the stack, services the request, and resumes the process once the I/O is finished. An interrupt handler, usually in device driver software, manages the requests; higher-priority requests can interrupt lower-priority ones
Direct Memory Access - A small dedicated controller that transfers data between memory and devices without the CPU. While a transfer runs, the CPU can return to the interrupted program and continue. The DMA controller uses the system bus when the CPU isn't using it, occasionally taking the bus from the CPU
Caches
Temporal Locality - If an item has been referenced recently, it is likely to be referenced again soon. Caches store such data so it is quicker to access: the CPU checks the cache first, and on a miss fetches the data from main memory into the cache and reads it there
Spatial Locality - If an item has been referenced, nearby items are likely to be referenced soon. True for both instructions and data
Store copies of data/instructions for quick access, based on the two principles
Cache design - Each cache entry has an index (its position in the cache), a tag (the high-order bits of its main-memory address) and the data itself. When a request comes in, the address's tag is compared against the tag(s) stored at the matching index; if they match it is a hit and the data is returned, else it is a miss and main memory is checked
Associativity
Direct-Mapped - Each address maps to exactly one cache line, typically via the low-order index bits of the address. Downside: addresses that differ by a multiple of the cache size map to the same line, so alternating between them continually evicts each other's data (thrashing)
n-Way Set Associative - Each address can map to any of n places (a set) in the cache. Reduces thrashing but adds comparison hardware and complexity
Fully Associative - Addresses can be stored anywhere in the cache
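A direct-mapped lookup just slices the address into offset, index and tag bits. A sketch with illustrative parameters (1024 lines of 16-byte blocks) shows why addresses one cache apart thrash:

```python
# Illustrative direct-mapped cache geometry
LINES = 1024        # number of cache lines
BLOCK_SIZE = 16     # bytes per block

def split_address(addr):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % BLOCK_SIZE                 # byte within the block
    index = (addr // BLOCK_SIZE) % LINES       # which cache line it must use
    tag = addr // (BLOCK_SIZE * LINES)         # identifies which block is resident
    return tag, index, offset

# Two addresses exactly one cache size (LINES * BLOCK_SIZE bytes) apart
# get the same index but different tags, so they evict each other.
```

Alternating accesses to two such addresses miss every time, which is the thrashing described above.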
Refill Strategy
Least Recently Used (LRU) - Track how recently each entry in a set was used and replace the least recently used one. Can still cause thrashing with loops slightly larger than the set
Not needed for Direct-Mapping
Random Replacement - Picks a victim at random. Cheap to implement and close to LRU in effectiveness in practice
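LRU for a single set can be sketched with a list kept in recency order (a 4-way set here is an arbitrary choice):

```python
WAYS = 4   # illustrative set size

def access(lru_set, tag):
    """Access one tag in a set; return True on hit.

    The list is ordered least- to most-recently used, so index 0 is
    always the eviction victim when the set is full.
    """
    if tag in lru_set:
        lru_set.remove(tag)       # hit: move tag to most-recently-used end
        lru_set.append(tag)
        return True
    if len(lru_set) == WAYS:
        lru_set.pop(0)            # miss in a full set: evict the LRU tag
    lru_set.append(tag)
    return False
```

Cycling through 5 distinct tags on this 4-way set misses every time, which is the loop-thrashing weakness noted above.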
Write Policy
Write Through - Every write updates both the cache and main memory. Simple, but costs more memory accesses
Write Back - A write updates only the cache; main memory is updated only if/when the modified line is evicted. Efficient for values that are updated frequently
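The usual write-back mechanism is a dirty bit per line, sketched here with dicts standing in for the cache and memory arrays:

```python
cache = {}    # tag -> (value, dirty_bit)
memory = {}   # tag -> value; only updated on eviction of a dirty line

def write(tag, value):
    cache[tag] = (value, True)    # write hits the cache only; mark line dirty

def evict(tag):
    value, dirty = cache.pop(tag)
    if dirty:
        memory[tag] = value       # the single deferred write to main memory
```

Three writes to the same line cost one memory access instead of three, which is exactly the win over write-through for frequently updated values.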
Performance stuff
Virtual Memory - Presents one large, uniform address space that is mapped onto the underlying physical memory and disk, so it appears as one big whole unit. The space is divided into fixed-size pages of 16-64KB. On a memory access, main memory is checked first; if the page is not there, it is found on disk and moved into main memory
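The mapping works by splitting a virtual address into a page number and an offset. A sketch with 16 KB pages (a size from the range above) and a dict standing in for the page table:

```python
PAGE_SIZE = 16 * 1024   # illustrative page size from the 16-64KB range

# virtual page number -> physical frame number; None means the page is on disk
page_table = {0: 5, 1: None}

def translate(vaddr):
    """Translate a virtual address, or fail with a page fault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # page number and offset within it
    frame = page_table.get(vpn)
    if frame is None:
        raise LookupError("page fault: page must be fetched from disk first")
    return frame * PAGE_SIZE + offset        # offset is unchanged by translation
```

Only the page number is translated; the offset within the page passes through untouched.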
Pipelining
5 Stage
Instruction Fetch - Fetch the next instruction from memory (at the address in the PC) into the IR
Instruction Decode - Decode the instruction in Control Unit
Execute - Execute instruction/calculations
Memory - Access memory if need be
Write Back - Write back into registers
Pipeline Registers - Added between stages to hold an instruction's intermediate values so the next stage can use them
Each stage completes in a single clock cycle, and stages overlap so that at any time each stage is working on a different instruction
If a value produced by one instruction is needed by the next, the pipeline must be stalled until the first value is computed. This is called adding a bubble
The alternative is to make the ALU result immediately available to the next instruction via a dedicated physical short-cut path. This is called forwarding or bypassing
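The forwarding decision itself is a register comparison. A sketch, with instructions reduced to hypothetical (dest, src1, src2) tuples:

```python
def ex_forward(prev, curr):
    """Decide, per source operand, whether to read the register file
    or take the previous instruction's result off the ALU bypass path.

    prev/curr are illustrative (dest_reg, src1, src2) tuples.
    """
    dest, _, _ = prev
    _, src1, src2 = curr
    def pick(src):
        return "ALU bypass" if dest is not None and src == dest else "register"
    return pick(src1), pick(src2)

# add $t0, $t1, $t2 followed by sub $t3, $t0, $t1:
# the first source of sub must come off the bypass path.
```

A real forwarding unit also checks that the previous instruction actually writes a register and handles the MEM-stage case, which this sketch omits.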
Branch Prediction - If the CPU can guess which way a branch will go, it can keep fetching the next instructions as normal. If it guesses wrong, the pipeline must be flushed of all wrong-path values before the correct instruction can be executed. Guessing "not taken" is simple; guessing "taken" requires the target address to be calculated early, usually by a separate adder.
Static BP - Fixed rules set at compile time, e.g. backward branches (loops) are predicted taken, forward branches (if/else) predicted not taken.
Dynamic BP - Uses more hardware but can be more accurate. The CPU keeps a history of each branch's recent outcomes, e.g. a flag bit (or small counter) per branch recording whether it was taken last time, used to predict whether it will be taken now
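One common form of this per-branch history is a two-bit saturating counter, which needs two consecutive wrong outcomes to flip its prediction. A sketch with a dict standing in for the predictor table:

```python
# branch address -> counter state 0-3; 0-1 predict "not taken", 2-3 "taken"
counters = {}

def predict(pc):
    return counters.get(pc, 1) >= 2   # unseen branches start weakly not-taken

def update(pc, taken):
    """Move the counter one step toward the actual outcome, saturating at 0/3."""
    c = counters.get(pc, 1)
    counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)
```

The two-bit hysteresis is what makes this better than a single flag bit for loops: the one not-taken outcome at loop exit doesn't flip the prediction for the next run of the loop.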
Delayed Branching - The instruction after a branch (the delay slot) is always executed, so the branch is effectively resolved early and the slot's cycle isn't wasted. Only works if the compiler can fill the slot with an instruction that doesn't depend on the branch outcome
Superscalar Processors
Speculative Execution - The CPU continues to execute predicted code while a hazard is being resolved, discarding the work if the guess was wrong
Out-of-Order Execution - Allows instructions to be re-ordered dynamically, so later independent instructions can proceed while the pipeline stalls
Multiple pipelines, so multiple instructions can be issued and executed in parallel each cycle.