RLHF

Framework

- Step 0: Pretraining the LLM base model

- Step 1: Collecting human feedback

- Step 2: Fitting the reward model

- Step 3: Optimizing the policy with RL (an end-to-end sketch follows below)
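
A minimal end-to-end sketch of how these four steps fit together; every function below is a hypothetical placeholder standing in for a full training procedure, not a real library API:

```python
# High-level sketch of the four RLHF steps above (all placeholders).

def pretrain(corpus):
    # Step 0: pretrain the base LLM on a large unlabeled corpus (placeholder).
    return "base_model"

def collect_human_feedback(model, prompts):
    # Step 1: sample outputs and ask humans to compare or score them (placeholder).
    return [{"prompt": p, "chosen": "output A", "rejected": "output B"} for p in prompts]

def fit_reward_model(feedback):
    # Step 2: fit a model that scores generations the way humans did (placeholder).
    return lambda prompt, output: 0.0

def optimize_policy_with_rl(model, reward_model, prompts):
    # Step 3: maximize the reward model's score with RL, usually with a KL
    # penalty that keeps the policy close to the base model (placeholder).
    return model

base = pretrain(corpus=["..."])
feedback = collect_human_feedback(base, prompts=["..."])
reward_model = fit_reward_model(feedback)
policy = optimize_policy_with_rl(base, reward_model, prompts=["..."])
```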

Feedback Collection

LLMs trained with RLHF

OpenAI GPT-4

Anthropic Claude

Google Bard

Meta Llama

Policy Optimization

Reward Modeling

Evaluation

Challenges

Misaligned evaluators

Difficulty of oversight

Data quality

Feedback type limitations

What is the format of the feedback?
The choice of format has implications for the expressivity of the feedback, the ease of its collection, and how it can be used to improve systems.

Numerical scores: Although easy to leverage, numerical feedback is generally a hard and ill-defined task for humans to provide, leading to a costly collection process and problems of subjectivity and variance. Extensively used for evaluation.

What is its objective?
The purpose of collecting feedback is to align the model's behavior with some (often ill-defined) goal behavior.

Ranking-based: Easier to collect. Tends to be collected to improve model behavior rather than just for evaluation.
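
As a sketch of how ranking feedback is typically turned into a reward model, the snippet below fits a scorer with a Bradley-Terry style pairwise loss; the random tensors and the tiny two-layer network are stand-ins for encoded (prompt, generation) pairs and a real LLM-based reward model:

```python
import torch
import torch.nn as nn

# Pairwise (Bradley-Terry style) reward-model training sketch: push the score
# of the human-preferred generation above the score of the rejected one.
chosen = torch.randn(32, 128)    # stand-in embeddings of preferred outputs
rejected = torch.randn(32, 128)  # stand-in embeddings of rejected outputs

reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for _ in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # -log sigmoid(r_chosen - r_rejected): minimized when chosen outscores rejected.
    loss = -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```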

Qualitative natural language explanations: Typically provide more detailed information, either highlighting the shortcomings of the current output or suggesting specific actions for improvement.

Helpfulness: A necessary (but not sufficient) condition for a helpful system is that it performs the task well, so feedback related to task performance generally falls under this umbrella.

- Machine translation: quality of the translation;

- Summarization: relevance, consistency, and accuracy;

- Ability to follow instructions.

Harmlessness: We want models not to produce certain types of output or violate certain norms. This feedback is typically collected by defining a set of rules and asking humans whether the system's outputs violate them.

When is it used?

Used during training to optimize the model parameters directly

Used at inference time to guide the decoding process

Feedback-based imitation learning: Supervised learning on a dataset composed of positively-labeled generations, i.e., maximizing the likelihood of the model's answers that humans labeled as correct.
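
A minimal sketch of this filtered fine-tuning, assuming a generic Hugging Face causal LM and a toy, made-up feedback dataset (in practice the prompt tokens are usually masked out of the loss):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Feedback-based imitation learning sketch: keep only generations labeled
# correct by humans, then maximize their likelihood with the standard LM loss.
model_name = "gpt2"  # assumption: any small causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

data = [  # hypothetical human-labeled generations
    {"prompt": "Translate to French: Hello", "output": " Bonjour", "label": "correct"},
    {"prompt": "Translate to French: Goodbye", "output": " Bonjour", "label": "incorrect"},
]
positive = [d for d in data if d["label"] == "correct"]  # discard negatives

model.train()
for example in positive:
    ids = tokenizer(example["prompt"] + example["output"], return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # negative log-likelihood of the sequence
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```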

Joint-feedback modeling: Leverages all the collected information by using human feedback directly to optimize the model. Some works simply train the model to predict the feedback given to each generation; others train it to predict both the generations and the corresponding human feedback.
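
A sketch of the second variant, training the model to predict both the generation and its attached feedback; the data format and the [FEEDBACK] separator are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Joint-feedback modeling sketch: keep every generation and train the model to
# predict the generation together with the human feedback it received.
model_name = "gpt2"  # assumption: any small causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

data = [  # hypothetical (prompt, output, feedback) triples
    {"prompt": "Summarize: ...", "output": " A short summary.", "feedback": "accurate but misses the main point"},
    {"prompt": "Summarize: ...", "output": " Another summary.", "feedback": "faithful and concise"},
]

model.train()
for example in data:
    text = example["prompt"] + example["output"] + " [FEEDBACK] " + example["feedback"]
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # jointly model generation and feedback
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```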

Reinforcement learning: A more versatile approach, allowing for direct optimization of a model's parameters based on human feedback.
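
A bare-bones sketch of the RL step, assuming a REINFORCE-style update with a KL-style penalty toward a frozen reference model; dummy_reward, the checkpoint name, and the hyperparameters are illustrative stand-ins (real systems typically use PPO and a learned reward model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# REINFORCE-style sketch of KL-regularized policy optimization.
model_name = "gpt2"  # assumption: any small causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name)
reference = AutoModelForCausalLM.from_pretrained(model_name)  # frozen copy
reference.eval()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)
beta = 0.1  # strength of the KL-style penalty (made-up value)

def dummy_reward(text: str) -> float:
    return float(len(text.split()))  # stand-in for a reward model's scalar score

def completion_logprob(model, full_ids, n_prompt):
    # Sum of log-probabilities of the completion tokens under `model`.
    logits = model(full_ids).logits[:, :-1, :]
    logprobs = torch.log_softmax(logits, dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, n_prompt - 1:].sum()

prompt_ids = tokenizer("Write a short greeting:", return_tensors="pt").input_ids
generated = policy.generate(prompt_ids, do_sample=True, max_new_tokens=20,
                            pad_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(generated[0, prompt_ids.shape[1]:])

policy_lp = completion_logprob(policy, generated, prompt_ids.shape[1])
with torch.no_grad():
    reference_lp = completion_logprob(reference, generated, prompt_ids.shape[1])

# Penalize drifting away from the reference model (per-sample KL estimate).
shaped_reward = dummy_reward(completion) - beta * (policy_lp.detach() - reference_lp)

loss = -shaped_reward * policy_lp  # REINFORCE: raise log-prob of high-reward samples
loss.backward()
optimizer.step()
optimizer.zero_grad()
```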

Feedback memory: Maintaining a repository of feedback from prior sessions. When processing new inputs, the system uses relevant feedback from similar inputs in its memory to guide the model toward generating more desirable outputs based on past experiences.
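
A toy sketch of such a feedback memory, assuming simple word-overlap retrieval and a made-up prompt format:

```python
# Feedback-memory sketch: store past (input, feedback) pairs and, for a new
# input, prepend the feedback attached to the most similar past inputs.

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))  # Jaccard word overlap

class FeedbackMemory:
    def __init__(self):
        self.entries = []  # list of (input_text, feedback_text)

    def add(self, input_text: str, feedback_text: str):
        self.entries.append((input_text, feedback_text))

    def retrieve(self, input_text: str, k: int = 2):
        ranked = sorted(self.entries, key=lambda e: similarity(e[0], input_text), reverse=True)
        return [fb for _, fb in ranked[:k]]

memory = FeedbackMemory()
memory.add("Summarize the meeting notes", "Too long; keep it under three sentences.")
memory.add("Translate this email to French", "Use a formal register.")

new_input = "Summarize the quarterly report"
hints = memory.retrieve(new_input)
prompt = "Past feedback on similar requests:\n- " + "\n- ".join(hints) + "\n\nTask: " + new_input
print(prompt)
```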

Iterative output refinement: Users can provide feedback on intermediate responses, enabling the model to adjust its outputs until they meet the user's satisfaction.

Feedback models: Sampling a large number of candidate generations and reranking them according to the feedback model.
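
A minimal best-of-n reranking sketch; generate_candidate and feedback_model_score are dummy stand-ins for the generator and the learned feedback model:

```python
import random

# Best-of-n reranking sketch: sample several candidates, keep the one the
# feedback model scores highest. Both functions below are dummy stand-ins.

def generate_candidate(prompt: str) -> str:
    return f"{prompt} -> candidate #{random.randint(0, 999)}"

def feedback_model_score(prompt: str, candidate: str) -> float:
    return random.random()  # stand-in for a learned feedback/reward model

def best_of_n(prompt: str, n: int = 16) -> str:
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: feedback_model_score(prompt, c))

print(best_of_n("Write a polite reply to the customer"))
```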

How is it modeled?

Direct feedback from humans

Surrogate models that approximate human preferences: Learned models that predict or approximate human preferences, so they can stand in for direct human feedback.

Challenges

RL difficulties

Policy misgeneralization

Distributional challenges

Joint RM/Policy training challenges

Problem misspecification

Misgeneralization/Hacking

Evaluation difficulty
