Useful LLMs
LM training
- How to train faster with the same results?
- How to get better generalization?
LM architectures
- RWKV
- State-space models (discrete recurrence sketched below)
- RETRO / LongMem
- R2D2
- Multi-scale Transformer Language Models
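For the state-space entry above, the discrete recurrence at the core of S4-style layers, obtained from a continuous system (A, B, C) by bilinear discretization with step Δ:

```latex
x_k = \bar{A}\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k,
\quad\text{with}\quad
\bar{A} = \Bigl(I - \tfrac{\Delta}{2}A\Bigr)^{-1}\Bigl(I + \tfrac{\Delta}{2}A\Bigr),
\qquad
\bar{B} = \Bigl(I - \tfrac{\Delta}{2}A\Bigr)^{-1}\Delta B
```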
Dynamically changing architectures
- FLM-101B: An Open LLM and How to Train It with $100K Budget
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping (sketched below)
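A PyTorch-style sketch of the layer-dropping idea from the second paper: stochastically skip transformer layers during training, more aggressively for deeper layers, with the global keep probability annealed downward over training (a simplified stand-in for the paper's theta(t) schedule; all names are ours):

```python
import torch
import torch.nn as nn

class ProgressiveLayerDrop(nn.Module):
    """Hedged sketch: skip transformer layers stochastically during
    training, dropping deeper layers more often, with a global keep
    probability annealed from 1.0 toward `theta_min`."""

    def __init__(self, layers, theta_min=0.5):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.theta_min = theta_min  # keep probability at end of training

    def forward(self, x, progress):
        # progress in [0, 1]: fraction of training completed
        theta = 1.0 - (1.0 - self.theta_min) * progress
        n = len(self.layers)
        for i, layer in enumerate(self.layers):
            keep = 1.0 - (i + 1) / n * (1.0 - theta)  # deeper => more dropping
            if self.training and torch.rand(()).item() > keep:
                continue  # skip the layer; the residual path carries x through
            x = layer(x)
        return x
```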
Automating architecture search
- Neural Architecture Search with Reinforcement Learning
- Neural Architecture Search with Bayesian Optimisation and Optimal Transport
- and lots of others
High-quality data
- Textbooks Are All You Need
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining (domain-weight update sketched below)
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
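Of these, DoReMi is the algorithmic one: a small proxy model is trained with Group-DRO-style domain reweighting against a fixed reference model, and the resulting weights resample the pre-training mixture. A hedged sketch of the exponentiated-gradient weight update it rests on (variable names are ours):

```python
import numpy as np

def doremi_domain_update(weights, proxy_loss, ref_loss, lr=1.0, smoothing=1e-3):
    """One Group-DRO-style update of pre-training domain weights.
    weights: current domain distribution; proxy_loss / ref_loss: per-domain
    losses of the small proxy model and the fixed reference model."""
    excess = np.maximum(np.asarray(proxy_loss) - np.asarray(ref_loss), 0.0)
    w = np.asarray(weights) * np.exp(lr * excess)  # exponentiated-gradient ascent
    w /= w.sum()
    # mix with uniform for stability, in the spirit of the paper's smoothing
    return (1.0 - smoothing) * w + smoothing / len(w)
```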
Optimization techniques
- Automatic Gradient Descent: Deep Learning without Hyperparameters
- VeLO: Training Versatile Learned Optimizers by Scaling Up
- Fine-Tuning Language Models with Just Forward Passes (MeZO; sketched below)
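A minimal sketch of the MeZO-style step from the last paper: a two-point SPSA gradient estimate using only forward passes, with the perturbation regenerated from a shared seed so no copy of the parameters is stored (the `loss_fn(model, batch)` convention and the plain-SGD update are ours):

```python
import torch

@torch.no_grad()
def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """MeZO-style zeroth-order step (hedged sketch): estimate the
    directional gradient from two forward passes, no backward pass."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        gen = torch.Generator().manual_seed(seed)  # identical z every call
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen).to(p.device)
            p.add_(scale * eps * z)

    perturb(+1.0)                                  # theta + eps*z
    loss_plus = float(loss_fn(model, batch))
    perturb(-2.0)                                  # theta - eps*z
    loss_minus = float(loss_fn(model, batch))
    perturb(+1.0)                                  # restore theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)
    gen = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        z = torch.randn(p.shape, generator=gen).to(p.device)
        p.add_(-lr * grad_est * z)                 # plain SGD along z
```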
Agency of LMs
- Given a smart LM, how to make it useful?
- And not make it dangerous at the same time?
Typical properties of agents
Theoretically optimal agent
- AIXI (expectimax formula below)
- Knowledge-seeking agents
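For reference, AIXI's action rule: expectimax over all computable environments q, weighted by the Solomonoff-style prior 2^{-ℓ(q)}, up to horizon m:

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[ r_t + \cdots + r_m \bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```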
Theoretically optimal induction
- Solomonoff induction (universal prior below)
- MDL principle
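Both nodes are compression principles. Solomonoff's universal prior sums over all programs p whose output on a universal monotone machine U begins with x, while MDL picks the hypothesis minimizing total code length:

```latex
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
\qquad
H_{\mathrm{MDL}} = \arg\min_{H} \bigl[ L(H) + L(D \mid H) \bigr]
```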
Properties of our universe
- Simulation hypothesis
- Tegmark multiverse
What is information?
- Probability & information theory
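The two definitions this node rests on, in bits: self-information of an outcome and entropy of a distribution:

```latex
I(x) = -\log_2 p(x), \qquad H(X) = -\sum_{x} p(x) \log_2 p(x)
```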
Processing information
- Aristotelian logic
- Bayesian inference (update rule below)
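For the Bayesian-inference node, the update rule itself, with the evidence expanded over hypotheses:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{\sum_{H'} P(D \mid H')\, P(H')}
```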
Resource-bounded agents
- AIXI-tl, Monte-Carlo AIXI approximation
- and all other RL methods...
Resource-bounded incremental induction
- Optimal Ordered Problem Solver, Gigamachine
- Transfer learning for deep models (minimal recipe sketched below)
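For the deep-models bullet, the standard minimal recipe, sketched (`backbone.out_dim` is our assumed feature-width attribute, not a fixed API):

```python
import torch.nn as nn

def make_transfer_model(backbone, num_classes):
    """Freeze a pretrained backbone and train only a fresh task head."""
    for p in backbone.parameters():
        p.requires_grad = False  # reuse pretrained features as-is
    return nn.Sequential(backbone, nn.Linear(backbone.out_dim, num_classes))
```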
Theoretically optimal knowledge transfer
- Optimal Ordered Problem Solver
- Gödel machine
Resource-bounded induction
- Speed prior
- Levin search (time allocation below)
- Hutter search
- and all other ML methods
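For the Levin-search node: Levin complexity Kt charges a program for its length plus the log of its runtime, and Levin search runs all programs interleaved, giving each p the time share 2^{-ℓ(p)}, which makes it optimal up to a multiplicative constant:

```latex
Kt(x) = \min_{p \,:\, U(p) = x} \bigl[ \ell(p) + \log_2 \mathrm{time}(p) \bigr]
```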
What goals are meaningful?
- Knowledge-seeking agents
- Free-energy principle
- Artificial curiosity (prediction-error bonus sketched below)
- Inverse RL, RLHF
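A minimal sketch of the simplest member of this family: a prediction-error curiosity bonus (Schmidhuber's original formulation rewards the predictor's improvement rather than its raw error; the raw-error version below is the common ICM-style simplification, and all names are illustrative):

```python
import torch
import torch.nn.functional as F

def curiosity_bonus(world_model, state, action, next_state):
    """Reward the agent where its learned world model predicts the next
    state poorly, so it seeks novel, informative experience.
    `world_model` is an assumed (state, action) -> next-state predictor."""
    with torch.no_grad():
        predicted = world_model(state, action)
    return F.mse_loss(predicted, next_state)  # scalar intrinsic reward
```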
LM sampling
- How to trade off training time for inference time?
- How to get more consistent / creative output from the same model?
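Both questions meet in the decoding loop; a minimal sketch of the standard knobs, temperature and nucleus (top-p) sampling, for trading consistency against creativity with a fixed model:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0):
    """Lower temperature gives more consistent output, higher gives more
    creative output; top_p < 1 truncates to the smallest token set whose
    cumulative probability exceeds p, then renormalizes."""
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                    # the nucleus
    return np.random.choice(keep, p=probs[keep] / probs[keep].sum())
```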
Calibration
- Calibrate Before Use: Improving Few-Shot Performance of Language Models (contextual calibration sketched below)
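The paper's contextual calibration fits in a few lines; a sketch assuming `label_probs` are the model's label probabilities for a real input and `content_free_probs` those for a content-free input such as "N/A":

```python
import numpy as np

def contextual_calibration(label_probs, content_free_probs):
    """Estimate the model's per-label bias from a content-free query,
    then rescale real predictions by the inverse bias (W = diag(p_cf)^-1
    with zero offset, as in the paper) and renormalize."""
    calibrated = np.asarray(label_probs) / np.asarray(content_free_probs)
    return calibrated / calibrated.sum()
```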
Control
- GeDi (Bayes-rule reweighting sketched below)
- Plug-and-play language models
- RankGen
- Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation
- TODO: RankGen, but based on internal representations of the model
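A hedged sketch of the GeDi idea from the list above: reweight the base LM's next-token logits with a Bayes-rule estimate of p(attribute | token), computed from a class-conditional LM run under positive vs. negative control codes, assuming equal class priors and comparable log-likelihoods (`omega` sharpens the attribute term; all names are ours):

```python
import numpy as np

def gedi_logits(base_logits, pos_cc_logits, neg_cc_logits, omega=30.0):
    """Combine base LM logits with class-conditional evidence:
    log p(attribute | token) = pos - logsumexp(pos, neg) under a
    uniform prior over the two control codes."""
    log_p_attr = pos_cc_logits - np.logaddexp(pos_cc_logits, neg_cc_logits)
    return base_logits + omega * log_p_attr  # feed to softmax / sampler
```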
Better usages for LMs
- Cyborgism
- How Can We Know What Language Models Know?
- Discovering Latent Knowledge in Language Models Without Supervision (CCS; objective sketched below)
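The last paper's method (CCS) trains an unsupervised probe on contrast pairs; a minimal sketch of its objective, with `p_pos`/`p_neg` the probe's outputs on a statement and its negation:

```python
import torch

def ccs_loss(p_pos, p_neg):
    """Train the probe so the two probabilities are consistent
    (p_pos should equal 1 - p_neg) and confident (not both near 0.5),
    with no labels involved."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()
```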