LLM Systems (CS7670): Week02a, Training LLMs (homepage, lottery)
0. Recap
Top-5 technical terms
- loss function (90.9%)
- back-propagation (90.9%)
- dropout (81.8%)
- KV Cache (81.8%)
- regularization (77.3%)
Bottom-5 technical terms
- speculative decoding (27.3%)
- RoPE (18.2%)
- SwiGLU (18.2%)
- NCCL (18.2%)
- PD-separation (18.2%)
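To make two of the recap terms concrete, here is a minimal toy sketch (not from the lecture) tying a loss function to the gradient that back-propagation would compute: a one-parameter linear model y = w * x trained with MSE loss and a hand-derived gradient.

```python
# Toy sketch (hypothetical example, not course code): one-parameter linear
# model y = w * x, MSE loss, and the gradient backprop would derive via the
# chain rule, used for plain gradient descent.

def mse_loss(w, xs, ys):
    """Mean squared error of y_pred = w * x over the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """dLoss/dw -- what back-propagation computes automatically."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # underlying relation: y = 2x
w = 0.0
for _ in range(100):
    w -= 0.1 * mse_grad(w, xs, ys)  # gradient-descent update

print(round(w, 3))  # converges toward the true slope 2.0
```

In a real training stack the gradient function is never written by hand; autograd traces the forward pass and applies the chain rule for you.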
1. Core research problem
train an LLM...
...efficiently (minimize GPU count, wall-clock time, and dollar cost) and...
pre-training and post-training (SFT, RLHF, RLVR, alignment)
2. Research topics
- Training parallelization (more on this today)
- Training correctness and performance anomalies (touched on briefly today)
- Alignment and safety fine-tuning
- Adaptive fine-tuning (e.g., LoRA)
- Compression and distillation
- Real-world training experiences
- Communication optimization
- Resource management (scheduling, allocation, fairness, etc.)
- Non-traditional training paradigms
- Training under constraints (old GPUs, low connectivity, etc.)
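For the "training parallelization" and "communication optimization" topics above, a minimal sketch (assumed setup, not the course's code) of data parallelism: the batch is sharded across workers, each computes a local gradient, and an all-reduce averages them so every replica applies the same update. The NCCL all-reduce is simulated here with a plain Python average.

```python
# Data-parallelism sketch (hypothetical example): shard a batch across two
# simulated "workers", compute local gradients, then average them -- the
# step a real system performs with a single NCCL all-reduce call.

def grad(w, batch):
    """Local gradient of MSE loss for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0

# Shard the global batch into equal-sized per-worker shards.
shards = [data[:2], data[2:]]
local_grads = [grad(w, s) for s in shards]

# "All-reduce": average the local gradients across workers.
global_grad = sum(local_grads) / len(local_grads)

# With equal shard sizes, this exactly matches the full-batch gradient.
assert abs(global_grad - grad(w, data)) < 1e-9
```

The communication cost of that averaging step at scale (gradient tensors with billions of elements, every step) is exactly what the "communication optimization" topic is about.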
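For the "adaptive fine-tuning (e.g., LoRA)" topic, a sketch of where LoRA's savings come from (illustrative, assumed numbers): instead of updating a full d x d weight matrix, LoRA learns two low-rank factors B (d x r) and A (r x d), and the effective weight is W + (alpha / r) * B @ A.

```python
# LoRA parameter-count sketch (hypothetical sizes, not from the lecture):
# full fine-tuning updates all d*d entries of W; LoRA trains only the two
# low-rank factors B (d x r) and A (r x d).

d, r = 1024, 8  # hidden size and LoRA rank (assumed values)

full_finetune_params = d * d      # updating W directly
lora_params = d * r + r * d       # B and A only

print(full_finetune_params, lora_params)  # 1048576 16384
```

At these sizes LoRA trains 64x fewer parameters per matrix, which is why it fits on much smaller hardware than full fine-tuning.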