LLM Systems (CS7670): Week 01 (homepage)
1. Intro to class
What is a seminar?
Quick intro: name, program (PhD/Master), focus on ML/systems
4. Transformer-based LLM (paper)
Zikai Wang
The self-attention mechanism scales quadratically with input length, yet the paper doesn't explore longer-than-standard sequence lengths. How would performance and training efficiency hold up at longer sequences?
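A minimal sketch (my own illustration, not from the paper) of why the cost grows quadratically: the score matrix QK^T is n x n, so quadrupling the sequence length multiplies the attention FLOPs and score-matrix memory by roughly sixteen. The d_model=512 width matches the base configuration; the cost formulas are rough estimates.

```python
import numpy as np

def attention_cost(n, d_model=512):
    """Rough FLOP and memory estimates for one single-head attention pass.
    n: sequence length, d_model: model width (512 as in the base model)."""
    score_flops = 2 * n * n * d_model      # Q @ K^T -> n x n score matrix
    value_flops = 2 * n * n * d_model      # softmax(scores) @ V
    score_memory = n * n * 4               # fp32 bytes for the n x n score matrix
    return score_flops + value_flops, score_memory

for n in (512, 2048, 8192):               # 4x longer input -> ~16x cost
    flops, mem = attention_cost(n)
    print(f"n={n:5d}  ~{flops/1e9:8.1f} GFLOPs  score matrix ~{mem/1e6:7.1f} MB")
```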
Muhammad Salman
Given that the model relies heavily on parameter-intensive linear transformation matrices, why is the claim framed as “attention is all you need” rather than acknowledging the contribution of these components?
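A rough parameter tally (an illustration under the assumed base-model sizes d_model=512, d_ff=2048) showing where the learned weights sit: the attention operation softmax(QK^T / sqrt(d_k)) V itself has no parameters, while the Q/K/V/output projections and the feed-forward block hold them all.

```python
# Per-layer parameter counts under assumed base-model dimensions.
d_model, d_ff = 512, 2048

qkv_out_proj = 4 * d_model * d_model        # W_Q, W_K, W_V, W_O projections
feed_forward = 2 * d_model * d_ff           # two linear layers in the FFN

print(f"attention projections per layer: {qkv_out_proj:,} params")
print(f"feed-forward per layer:          {feed_forward:,} params")
# The FFN alone holds roughly twice as many parameters as the attention
# projections, which is the tension the question points at.
```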
Arunit Baidya
In multi-head attention, how can we determine which heads meaningfully contribute to model accuracy? Which heads improve performance, and which are redundant or merely artifacts of how the model was built?
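One way to probe this, sketched below with PyTorch's nn.MultiheadAttention (an assumed stand-in, not the paper's setup): ablate one head at a time by zeroing its slice of the output projection, then compare outputs or re-run the evaluation metric per head.

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 16, 512)                      # (batch, seq, d_model) dummy input

def ablate_head(mha, head_idx):
    """Zero the slice of the output projection that reads from one head."""
    d_head = mha.embed_dim // mha.num_heads
    with torch.no_grad():
        mha.out_proj.weight[:, head_idx * d_head:(head_idx + 1) * d_head] = 0.0

baseline, _ = mha(x, x, x)
ablate_head(mha, head_idx=3)
ablated, _ = mha(x, x, x)
print("output change after ablating head 3:", (baseline - ablated).norm().item())
# In practice one would re-run the evaluation set after ablating each head and
# compare accuracy or loss, not just the raw output difference shown here.
```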