LLM Fine-Tuning Workflow - Coggle Diagram
Reinforcement Learning (RL)
Introduction
Scores are obtained from human feedback
The model plays the role of the agent
Outputs become better aligned with human preferences
Proximal Policy Optimization (PPO)
Direct Preference Optimization (DPO)
Reward Model (RM)
GPT-Score
Single Aspect
Multi Aspect
TRL Framework
SFTTrainer
RewardTrainer
PPOTrainer
DPOTrainer
Supervised Fine-Tuning (SFT)
Pretraining
Instruction Fine-Tuning