LLM hallucinations through identifiability - Coggle Diagram

- - - - Comment: The notation used in our framework is inspired by this work.
    - - Comment: A more theoretical look at the transformer architecture and how it relates to chain-of-thought (CoT) reasoning.
    - - Comment: Important paper. Formalises the concept of hallucination, although it does not address CoT reasoning specifically. Likely very relevant to the related-work section.
  - - - Comment: In this post, we show how to incorporate human feedback on incorrect reasoning chains to improve performance on multi-hop reasoning tasks. Instead of asking humans to write reasoning chains from scratch, we learn from rich human feedback on model-generated reasoning chains, using the prompting abilities of LLMs. We collect two such datasets of human feedback, in the form of (correction, explanation, error type), for the StrategyQA and Sports Understanding datasets, and evaluate several common algorithms for learning from such feedback. Our proposed methods perform competitively with chain-of-thought prompting on the base Flan-T5 model, and our model is better at judging the correctness of its own answers.
  - - - Comment: Included as a baseline
    - - Comment: Monte Carlo search method related to Math-Shepherd.
- - - - Comment: One of the first works to compare outcome supervision with process supervision. Downside of process supervision: it requires annotated labels for the intermediate reasoning steps.
      - Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu
        
        Comment: Can weak model supervision elicit the full capabilities of a much stronger model? We test this using a range of pretrained language models in the GPT-4 family on natural language processing (NLP), chess, and reward modeling tasks. We find that when we naively finetune strong pretrained models on labels generated by a weak model, they consistently perform better than their weak supervisors, a phenomenon we call weak-to-strong generalization. However, we are still far from recovering the full capabilities of strong models with naive finetuning alone, suggesting that techniques like RLHF may scale poorly to superhuman models without further work.
  - - - They use external APIs to detect, online, when the model generates unsafe output.
        
        Could cite "Online Safety Analysis for Deep Learning Models" in the related-work section.
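The outcome-versus-process-supervision note and the Monte Carlo (Math-Shepherd-related) note above can be illustrated with a minimal Python sketch. It is not the method of any specific paper in this diagram. `toy_complete` is a hypothetical deterministic stand-in for an LLM that continues a prefix of reasoning steps and returns a final answer, and `outcome_label`, `mc_step_labels` and `process_score` are illustrative names, not APIs from the cited works. Step labels are estimated as the fraction of rollouts from each prefix that reach the gold answer, which to our understanding matches the "soft" estimate used in Math-Shepherd-style automatic labeling. Aggregating step scores by a product is one common choice, assumed here.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for an LLM: given a prefix of reasoning steps,
# sample a continuation and return only its final answer.
Completer = Callable[[List[str]], str]


def outcome_label(final_answer: str, gold: str) -> int:
    """Outcome supervision: one label for the whole chain, from the final answer only."""
    return int(final_answer == gold)


def mc_step_labels(steps: List[str], gold: str, complete: Completer, n_rollouts: int = 8) -> List[float]:
    """Process labels without human step annotation (Monte Carlo sketch).

    For each prefix steps[:i+1], sample n_rollouts completions and record the
    fraction that reach the gold answer (a "soft" estimate; a "hard" estimate
    would instead be 1.0 if any rollout succeeds)."""
    labels = []
    for i in range(len(steps)):
        prefix = steps[: i + 1]
        hits = sum(complete(prefix) == gold for _ in range(n_rollouts))
        labels.append(hits / n_rollouts)
    return labels


def process_score(step_probs: List[float]) -> float:
    """Aggregate per-step correctness into one chain score via a product,
    i.e. the probability that every step is correct if steps are independent."""
    return math.prod(step_probs)


def toy_complete(prefix: List[str]) -> str:
    # Hypothetical deterministic "model": it reaches the right answer unless
    # the prefix already contains an erroneous step (marked "ERR").
    return "5" if any("ERR" in step for step in prefix) else "4"


if __name__ == "__main__":
    steps = ["2 + 2 = 4", "ERR: carry the 1, so 4 + 1 = 5"]
    gold = "4"
    print(outcome_label("5", gold))  # -> 0
    labels = mc_step_labels(steps, gold, toy_complete, n_rollouts=4)
    print(labels)  # -> [1.0, 0.0]
    print(process_score(labels))  # -> 0.0
```

Outcome supervision gives the whole chain a single 0, whereas the Monte Carlo step labels localise the error to the second step without human step annotation, at the cost of extra sampling per step.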