Course Plan: CS3006 – Parallel & Distributed Computing (Spring 2026)
Course DNA
Big idea: Make programs faster, correctly, on modern hardware
Core skills
Design parallel/distributed solutions by understanding and decomposing problems (data/task decomposition)
Choose the right model (shared vs message passing vs GPU)
Reason about performance (compute vs memory vs communication)
Debug parallel correctness (races/deadlocks)
Primary tools
C/C++ (core)
OpenMP (shared memory)
MPI (distributed memory)
CUDA/OpenCL (GPU)
Optional: Hadoop/MapReduce (fault-tolerant data-parallel)
Module 1: Foundations + Multicore Reality (Weeks 1–2)
Why parallelism is required
Power wall / thermal wall
Memory Hierarchy and Memory Wall
ILP + superscalar overview + ILP Wall
Multicore era & Architecture essentials
Flynn's Taxonomy
Coherence basics (why threads fight)
UMA vs NUMA (placement matters)
Performance toolkit
CPU time, cycles, CPI intuition
Profiling & bottlenecks
Validation: speedup claims (avoid “bench lies”)
Amdahl & Gustafson (speedup boundaries)
Short OpenBLAS DEMO (Performance Analysis)
Data-parallel CPU speedups
SIMD intrinsics (what it accelerates / what it doesn’t)
SIMD and OpenBLAS Comparison
Parallel Thinking + Shared Memory Programming (Weeks 3–5)
Parallelization strategy
Problem understanding → decomposition → mapping
Domain vs functional decomposition
Load balancing and scheduling
Locality + communication
Why data layout changes runtime
Cache behavior, bandwidth ceilings
Correctness fundamentals
Data races vs deadlocks vs livelocks
Synchronization tools (locks/atomics/barriers)
Debugging mindset + symptoms
Shared Memory Programming using OpenMP
parallel, for, reduction clauses
Scheduling (static/dynamic/guided + chunk size)
Synchronization patterns (critical/atomic/barrier)
Common pitfalls (false sharing, oversubscription)
Midterm-I (Week 6)
Covers
Foundations + architecture + performance basics
Parallelization strategy + locality + sync
OpenMP fundamentals + scheduling
Module 2: Distributed Memory + MPI (Weeks 7–8)
Distributed systems shift
Why shared memory stops scaling
Cluster basics (nodes, network, runtime)
MPI fundamentals
SPMD, ranks, send/recv
Correctness hazards (deadlocks, mismatched collectives)
Advanced MPI ideas
Collectives (broadcast/reduce/allreduce)
Nonblocking MPI + overlap comm/comp
Tradeoffs
Fault tolerance vs performance (MPI vs fault-tolerant systems)
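The SPMD ideas above can be sketched in a few lines of MPI. This is an illustrative example, not a course-provided program: every rank sums its own slice of 1..100 (domain decomposition), then a collective combines the partials. Build and run with, e.g., mpicc sum.c && mpirun -np 4 ./a.out.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Cyclic decomposition: rank r takes elements r+1, r+1+size, ... */
    long local = 0, global = 0;
    for (long i = rank + 1; i <= 100; i += size)
        local += i;

    /* Every rank must call the same collective in the same order --
     * mismatched collectives are a classic deadlock source. */
    MPI_Allreduce(&local, &global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld\n", global);  /* 5050 on any process count */

    MPI_Finalize();
    return 0;
}
```

The nonblocking variants (MPI_Isend/MPI_Irecv plus MPI_Wait) follow the same pattern and are what enable the comm/comp overlap listed above.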
Module 3: GPUs + Heterogeneous Acceleration (Weeks 9–11)
GPU mental model
SIMT execution
Memory hierarchy + bandwidth
Host↔device transfer costs (the hidden tax)
Programming models
OpenCL or CUDA
Kernels, grids/work-items, memory spaces
Performance levers
Coalescing, tiling (shared/local memory)
Occupancy intuition
Minimize transfers, pipeline work
Applications
HPC kernels (stencil/matmul-ish)
ML/DL acceleration overview
Hybrid CPU+GPU workflow
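A minimal CUDA sketch of the model above, illustrative rather than tuned: one thread per element, and consecutive threads touching consecutive addresses so global-memory accesses coalesce. Unified (managed) memory is used only to keep the host code short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

/* SIMT kernel: thread i handles element i, so warps issue coalesced
 * loads and stores. */
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread id */
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int block = 256;                     /* threads per block */
    int grid = (n + block - 1) / block;  /* enough blocks to cover n */
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f\n", y[0]);       /* 2*1 + 2 = 4.0 */
    cudaFree(x); cudaFree(y);
    return 0;
}
```

The same kernel written in OpenCL maps one work-item to one element; tiling via shared/local memory only pays off once a kernel reuses data, which saxpy does not.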
Midterm-II (Week 12)
Covers
MPI fundamentals + scaling + tradeoffs
GPU architecture + programming basics
Hybrid performance reasoning
Module 4: Dependency Analysis + Higher-Level Views (Weeks 13–15)
Dependency analysis
Data dependencies
Task graphs / dependency graphs
Loop-carried dependencies (why some loops resist parallelism)
Advanced Topics (Optional)
SYCL/OmpSs (Portability and Productivity)
Hadoop/MapReduce (Built-in Fault Tolerance)
Assessments
Assignments (min 3)
[3 Marks Each]
A1: Pthreads revision + basics
A2: OpenMP + SIMD (correctness + scheduling + speedup)
A3: MPI (decomposition + collectives/nonblocking)
A4: GPU (OpenCL/CUDA case study; may be integrated with the project)
Project (case-study or research-based)
[8 Marks]
Proposal (Week 10/11)
Milestone check-in (Week 13/14)
Final demo + report (Week 15)
Exams
Midterm-I (Week 6)
[12 marks]
Midterm-II (Week 12)
[15 marks]
Final Exam
Quizzes (minimum 5)
[16 marks]
End-of-course Outcomes
Students can
Explain multicore + coherence implications
Parallelize + schedule effectively
Write correct OpenMP + MPI + basic GPU programs
Measure, validate, and explain performance results
Deliver a reproducible project with credible evaluation