RLHF
Feedback Collection
What is the format of the feedback?
The choice of format has implications for the expressivity of the feedback, the ease of its collection, and how we can use it to improve systems.
Numerical scores: Although easy to leverage, assigning numerical scores is generally a hard and ill-defined task for humans, leading to a costly collection process and problems of subjectivity and variance. Extensively used for evaluation.
Ranking-based: Easier to collect, and typically gathered to improve model behavior rather than just for evaluation.
Qualitative natural language explanations: Typically provide more detailed information, either highlighting the shortcomings of the current output or suggesting specific actions for improvement.
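As a rough illustration of how these three formats might be represented in a collection pipeline (all class and field names below are hypothetical, not from the source):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NumericalFeedback:
    """A scalar quality score, e.g. on a 1-5 Likert scale."""
    prompt: str
    output: str
    score: float

@dataclass
class RankingFeedback:
    """Candidate outputs ordered from most to least preferred."""
    prompt: str
    ranked_outputs: List[str]

@dataclass
class NaturalLanguageFeedback:
    """A free-form critique of one output, possibly with a suggested fix."""
    prompt: str
    output: str
    critique: str
```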
What is its objective?
The purpose of collecting feedback is to align the model's behavior with some (often ill-defined) goal behavior.
Helpfulness: A necessary (but not sufficient) condition for a helpful system is that it performs the task well, so feedback related to task performance generally falls under this umbrella. Examples:
- Machine translation: quality of the translation;
- Summarization: relevance, consistency, and accuracy;
- Ability to follow instructions.
Harmlessness: We want our models not to produce certain types of output or violate certain norms. A common approach is to define a set of rules and ask humans to provide feedback on whether the system's outputs violate them.
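A purely illustrative sketch of this rule-based collection setup (the rule set and the `annotator` interface are hypothetical; in practice this would be an annotation UI, not a function call):

```python
# Hypothetical rule set; real deployments define their own norms.
RULES = [
    "Does the output reveal personal data?",
    "Does the output encourage dangerous behaviour?",
    "Does the output use abusive language?",
]

def collect_harmlessness_feedback(output: str, annotator) -> dict:
    """Ask a human annotator, rule by rule, whether `output` violates it.

    `annotator` is any callable mapping a yes/no question to a bool.
    """
    return {rule: annotator(f"{rule}\n\nOutput: {output}") for rule in RULES}

# Example (console stand-in for a real annotation interface):
# verdicts = collect_harmlessness_feedback(
#     text, lambda q: input(q + " (y/n): ").strip() == "y")
```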
How is it modeled?
Surrogate models: models trained to predict or approximate human preferences, so they can stand in for direct human judgments.
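One standard instantiation is a reward model trained on ranking feedback with a pairwise Bradley-Terry objective. The sketch below assumes each (prompt, output) pair has already been encoded into a fixed-size embedding; the names are illustrative, not from the source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps an embedded (prompt, output) pair to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

def pairwise_loss(model: RewardModel,
                  preferred: torch.Tensor,
                  rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the reward of the human-preferred
    output above that of the rejected one in each comparison pair."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()
```

In practice the scalar head usually sits on top of a pretrained language model rather than a standalone encoder, and the learned rewards then substitute for human judgments when improving the policy.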