RL
Multi-task Learning
Ideas
Meta-learning [C]
black-box approaches [C]
optimization-based [C]
metric learning [C]
meta-overfitting
unsupervised
goal-conditioned
Hierarchical
Lifelong
AutoML
Different agents for different scenes, switching between them, while each agent can live across environments
Objective
various heuristics
use task uncertainty
aim for monotonic improvement towards Pareto optimal solution
optimize for the worst-case task loss
Chen et al. GradNorm. ICML 2018
Kendall et al. CVPR 2018
Sener et al. NeurIPS 2018
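A minimal sketch of the task-uncertainty weighting idea (Kendall et al.): each task loss is scaled by a learned log-variance \(s_i\). Here the \(s_i\) are plain numbers rather than learned parameters, purely for illustration:

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    # L_total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2)
    # is learned per task; high-uncertainty tasks get down-weighted,
    # and the +s_i term keeps sigma_i from growing without bound.
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))
```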
Pareto optimal
From loss perspective
θ∗ is Pareto optimal if there exists no θ that dominates θ∗
\( \theta_a \) dominates \( \theta_b \) if \( \mathcal{L}_i(\theta_a) \leq \mathcal{L}_i(\theta_b)\ \forall i \) and \( \sum_i \mathcal{L}_i(\theta_a) \neq \sum_i \mathcal{L}_i(\theta_b) \)
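The dominance condition above can be checked directly; a small sketch (hypothetical `dominates` helper):

```python
def dominates(losses_a, losses_b):
    """True if theta_a dominates theta_b: no task loss is worse,
    and at least one is strictly better (so the sums differ)."""
    return (all(a <= b for a, b in zip(losses_a, losses_b))
            and sum(losses_a) != sum(losses_b))
```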
Challenges
Negative transfer
limited representation capacity
often need much larger networks
optimization challenges
tasks may learn at different rates
caused by cross-task interference
overfitting
Architecture
MultiHead
soft-parameter sharing
Multi-gate Mixture-of-Experts (MMoE)
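A toy sketch of the multi-head (hard parameter sharing) architecture: one shared trunk, one linear head per task. All names and dimensions here are illustrative, not from any specific paper:

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiHeadNet:
    """Hard parameter sharing: a shared trunk feeds task-specific heads."""
    def __init__(self, in_dim, hidden, n_tasks, out_dim):
        self.W_shared = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.heads = [rng.normal(0.0, 0.1, (hidden, out_dim))
                      for _ in range(n_tasks)]

    def forward(self, x, task_id):
        h = np.tanh(x @ self.W_shared)   # shared representation
        return h @ self.heads[task_id]   # task-specific head
```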
Transfer learning
Fine-tuning
common practices [C]
Challenge: outputting all neural net parameters does not seem scalable
Only output sufficient statistics
External Memory
Mishra et al. SNAIL, 17
Neural Turing Machine
Santoro et al. MANN
Munkhdalai, Yu. ICML 17, Meta Networks
Feedforward+average
Garnelo, Conditional Neural Processes, ICML 18
Finn, ICML 2017 MAML: \(\min_\theta \sum_{{\rm task \ } i} \mathcal{L}\big(\theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}_i^{\rm tr}),\ \mathcal{D}_i^{\rm ts}\big)\)
Ravi ICLR 17, precedes MAML: \(\phi_i = \theta - \alpha f(\theta, \mathcal{D}_i^{\rm tr}, \nabla_\theta \mathcal{L})\)
Finn & Levine ICLR 18, For a sufficiently deep network, MAML function can approximate any function of \(\mathcal{D}_i^{\rm tr}, x^{\rm ts}\)
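The bi-level MAML objective can be illustrated on quadratic toy losses \(\mathcal{L}_i(\theta) = \|\theta - c_i\|^2\), where the meta-gradient through the inner step is available in closed form. A sketch under those toy assumptions, not the original implementation:

```python
import numpy as np

def inner_adapt(theta, c_train, alpha):
    grad = 2.0 * (theta - c_train)   # gradient of the train loss at theta
    return theta - alpha * grad      # phi_i: one inner gradient step

def maml_meta_grad(theta, tasks, alpha):
    """Gradient of sum_i L(phi_i, c_test) w.r.t. theta.
    For this quadratic, d(phi_i)/d(theta) = (1 - 2*alpha) * I,
    which is the second-order term MAML backpropagates through."""
    g = np.zeros_like(theta)
    for c_train, c_test in tasks:
        phi = inner_adapt(theta, c_train, alpha)
        g += (1.0 - 2.0 * alpha) * 2.0 * (phi - c_test)
    return g
```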
challenges
Bi-level optimization can exhibit instabilities
Backpropagating through many inner gradient steps is compute- and memory-intensive
Automatically learn inner vector learning rate, tune outer learning rate
Optimize only a subset of the parameters in the inner loop
Decouple inner learning rate, BN statistics per-step
introduce context variables for increased expressive power
Antoniou et al. MAML++
Li et al. Meta-SGD
Behl et al. AlphaMAML
Finn et al. bias transformation
Zintgraf et al. CAVIA
Zhou et al. DEML
Crudely approximate \(\frac{d\phi_i}{d\theta}\) as the identity [C]
Finn et al. first-order MAML 17
Nichol et al. Reptile 18
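A minimal Reptile-style meta-update: adapt with plain SGD on a sampled task, then move the initialization toward the adapted weights, with no second-order terms. The `task_grad_fn` argument is a hypothetical stand-in for a sampled task's gradient oracle:

```python
import numpy as np

def reptile_step(theta, task_grad_fn, inner_lr=0.01, inner_steps=5,
                 meta_lr=0.1):
    """One Reptile meta-update (first-order): run a few SGD steps on
    one task, then interpolate theta toward the adapted phi."""
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * task_grad_fn(phi)
    return theta + meta_lr * (phi - theta)
```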
Only optimize the last layer of weights [C]
Bertinetto et al. R2-D2 19, ridge regression, logistic regression
Lee et al. MetaOptNet 19
Derive meta-gradient using the implicit function theorem [C]
Rajeswaran, Finn, Implicit MAML 19
How to choose architecture that is effective for inner gradient step?
Kim et al. Auto-Meta, progressive NAS+MAML [C]
Use non-parametric learner
Koch et al. ICML 15, Siamese network
Matching
Vinyals et al. Matching Networks, NIPS 16
Snell et al. Prototypical Networks, NIPS 17
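A sketch of the prototypical-network classification rule (Snell et al.), assuming embeddings have already been produced by some encoder: each class prototype is the mean support embedding, and queries go to the nearest prototype in Euclidean distance:

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb, n_classes):
    """Class prototype = mean of that class's support embeddings;
    each query is assigned to the nearest prototype."""
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```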
challenges
More complex relationships between datapoints
Sung et al. Relation Net, learn non-linear relation module on embeddings
Allen et al. IMP, ICML 19, Learn infinite mixture of prototypes
Garcia, GNN, perform message passing on embeddings
Prabhu et al. NIPS 18 workshop, Prototypical Clustering Networks for Dermatological Image Classification [C]
Triantafillou et al. Proto-MAML 19, initialize last layer as ProtoNet during meta-training
Rusu et al. LEO 19, gradient descent on relation net embedding
Yu, Finn et al. One-shot Imitation from Observing Humans, RSS2018
Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction, 2020
Few-Shot Human Motion Prediction via Meta-learning, ECCV 2018
Transformer
Brown, 2020, GPT3
Hindsight Labeling [C]
challenges
Model-based [C]
Optimize over actions using model \({\rm max}_{a_{t:t+H}} \sum_t r(s_t, a_t)\)
backpropagation
sampling
Plan & replan using model
model-predictive control (MPC) [C]
knowing the reward function
without knowing reward function
Nagabandi, Deep Dynamics Models for Learning Dexterous Manipulation, CoRL 19
Xie, Few-shot Goal Inference for Visuomotor Learning and Planning, CoRL 18
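Random-shooting MPC in a few lines: sample action sequences, roll each out through the model, execute only the first action of the best sequence, then replan. The 1-D dynamics and reward here are toy assumptions; real systems use learned models and better optimizers (e.g. the cross-entropy method):

```python
import numpy as np

rng = np.random.default_rng(0)

def mpc_random_shooting(state, dynamics, reward, horizon=10, n_samples=100):
    """Return the first action of the highest-return sampled sequence.
    The caller executes it, observes the next state, and replans."""
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, ret = state, 0.0
        for a in actions:
            ret += reward(s, a)   # evaluate under the (learned) model
            s = dynamics(s, a)
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0
```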
High-dimension Image Observation
Models in latent space [C]
models directly in image space [C]
model alternative quantities [C]
Watter, Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, NIPS 2015 [C]
Finn, Deep Spatial Autoencoders for Visuomotor Learning, ICRA 2016
also predict reward
Jaderberg 17
Shelhamer 17
MPC
Finn, CoRL 17
Villegas, NIPS 19
Finn, Deep Visual Foresight for Planning Robot Motion, ICRA 17
Pinto 16
Kahn 17
Dosovitskiy 17
RL [C]
Duan, RL2, 17
Wang, Learning to Reinforcement learn, CogSci 17
Rakelly, PEARL: Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, ICML 19
RL [C]
MAML+PG
MAML+MBRL
Nagabandi, Learning to Adapt in Dynamic Environments through Meta-RL, ICLR 19
Exploration
Learning to Explore
End-to-end optimization [C]
Alternative exploration strategies [C]
Decoupled exploration & exploitation
Finn, Learning to learn with gradients, Phd Thesis 2018
Stadie 2018
Zintgraf 2019
Kamienny 2020
Posterior sampling (Thompson sampling)
task dynamics & reward prediction
Zhang, MetaCURE 2020
Decouple by acquiring representation of task relevant information
Liu, Explore then Execute: Adapting without Rewards via Factorized Meta-RL, 2020
Information theoretic
entropy
Mutual information
KL-divergence
\(\mathcal{H}(p(x))=-E_{x \sim p(x)}[\log p(x)]\)
\(D_{\rm KL}(q \| p) = E_q\big[\log\frac{q(x)}{p(x)}\big] = -E_q[\log p(x)] - \mathcal{H}(q(x))\)
How broad \(p(x)\) is
Distance between two distributions
\(\mathcal{I}(x;y)=D_{\rm KL}(p(x,y) || p(x)p(y))=\mathcal{H}(p(x))-\mathcal{H}(p(x|y))\)
if x and y are independent, the mutual information is zero
\(\mathcal{I}(s_{t+1};a_t)=\mathcal{H}(s_{t+1})-\mathcal{H}(s_{t+1}|a_t)\)
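These quantities can be checked numerically on small discrete distributions (sketch with natural-log entropy; the joint is given as a probability table):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, float)
    p = p[p > 0]                            # convention: 0 * log 0 = 0
    return -(p * np.log(p)).sum()           # H(p) = -E[log p]

def mutual_information(p_xy):
    """I(x;y) = H(x) - H(x|y), from a joint distribution table p(x,y)."""
    p_xy = np.asarray(p_xy, float)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    h_x_given_y = -sum(p_xy[i, j] * np.log(p_xy[i, j] / p_y[j])
                       for i in range(p_xy.shape[0])
                       for j in range(p_xy.shape[1]) if p_xy[i, j] > 0)
    return entropy(p_x) - h_x_given_y
```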
skill
Eysenbach, Diversity is All You Need, 2018 [C]
Sharma Gu, DADS, 2019 [C]
design choices
Nachum, Why Does Hierarchy (Sometimes) Work? 2019 [C]
off-policy
Nachum, HIRO, 2018
self-terminating
Bacon, The Option-Critic Architecture, 2016
pretrain
Heess, Learning and Transfer of Modulated Locomotor Controllers, 2016
goal-condition
Gupta, Relay Policy Learning, 2019
T Yu, S Kumar, PCGrad, Gradient Surgery for Multi-Task Learning, NeurIPS 20
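A sketch of the PCGrad projection step: when two task gradients conflict (negative inner product), each is projected onto the normal plane of the conflicting one before summing. Fixed task order here for simplicity; the paper samples the order randomly:

```python
import numpy as np

def pcgrad(grads):
    """Project each task gradient away from the components that
    conflict with other tasks' gradients, then sum."""
    projected = [g.copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0.0:                       # conflicting directions
                g_i -= dot / (g_j @ g_j) * g_j  # remove conflicting part
    return sum(projected)
```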