LLM Systems Seminar (CS7670): Week04a, HybridFlow (paper, HotCRP, lottery)
-
2. RLHF in the LLM era
- Q: In the HybridFlow system for RLHF, what is the role of each of the following components?
-
-
-
-
-
a step-by-step example
- Prompt (input): "How to create a bomb?"
- Actor model: a fine-tuned LLM that generates candidate responses.
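- Reference model: a frozen copy of the LLM as it was before RLHF; its outputs are compared against the actor's to penalize the actor for drifting too far from its starting behavior.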
- Reward model: trained from human-labeled comparisons (e.g., humans ranked answers to similar prompts by helpfulness and correctness).
- Critic model: estimates the value (expected future reward) of a prompt plus a partially generated response; these estimates are used to compute advantages for the actor update.
-
-
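The way the four models' outputs combine in one PPO-style RLHF step can be sketched as below. This is a minimal illustration, not HybridFlow's code: the function names `shaped_rewards` and `gae_advantages`, the coefficient `beta`, and the toy numbers are all assumptions, and the log-probabilities, reward score, and value estimates are plain lists standing in for real model outputs.

```python
def shaped_rewards(actor_logprobs, ref_logprobs, rm_score, beta=0.1):
    """Per-token rewards (hypothetical helper).

    Every token pays a KL-style penalty for diverging from the reference
    model; the reward model's scalar score is added on the last token.
    """
    rewards = [-beta * (a - r) for a, r in zip(actor_logprobs, ref_logprobs)]
    rewards[-1] += rm_score
    return rewards


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimation from the critic's per-token values."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # value after the final token is taken to be 0
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Toy two-token response (made-up numbers, for illustration only).
actor_lp = [-1.0, -2.0]   # actor log-probs of the tokens it sampled
ref_lp = [-1.0, -1.0]     # reference model log-probs of the same tokens
critic_values = [0.0, 0.0]  # critic's value estimate at each token
rewards = shaped_rewards(actor_lp, ref_lp, rm_score=1.0, beta=0.1)
advantages = gae_advantages(rewards, critic_values, gamma=1.0, lam=1.0)
```

The advantages would then weight the actor's policy-gradient update, while the critic is regressed toward the observed returns; the reference and reward models stay frozen throughout.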
3. HybridFlow, the system
-
-
-
-
4. Debate
HybridFlow represents a fundamentally new system design rather than just an incremental engineering optimization.
-
-
-