decision transformer
training
temporal difference learning
discounting future rewards
deadly triad (function approximation + bootstrapping + off-policy learning) can destabilize training
discounting can induce undesirable short-sighted behaviors
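To make the two TD ingredients concrete, here is a minimal sketch of one tabular TD(0) update. It assumes a dict-based value table. The function names `td0_update` and `discounted_weight` are hypothetical illustrations, not part of the Decision Transformer paper. The target bootstraps from the current estimate V(next_state), and every step of lookahead is multiplied by gamma. That multiplication is why rewards far in the future count for little.

```python
def td0_update(values, state, reward, next_state, gamma=0.99, alpha=0.1, terminal=False):
    """One tabular TD(0) step (hypothetical helper): move V(state) toward a bootstrapped target."""
    # Bootstrapping: the target reuses the current estimate V(next_state),
    # scaled by the discount factor gamma (no bootstrap term at a terminal step).
    bootstrap = 0.0 if terminal else gamma * values[next_state]
    target = reward + bootstrap
    # Move the estimate a fraction alpha of the way toward the target.
    values[state] += alpha * (target - values[state])
    return values[state]


def discounted_weight(steps_ahead, gamma=0.99):
    """Weight that a reward `steps_ahead` steps in the future receives under discounting."""
    return gamma ** steps_ahead
```

With gamma = 0.99, a reward 100 steps away is weighted by about 0.37. This illustrates how discounting can bias a TD learner toward short-term rewards.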
sequence modeling objective
bypasses bootstrapping for long-term credit assignment
easy to scale
perform credit assignment via self-attention
illustrative example
achieves policy improvement without dynamic programming (DP)
find shortest paths on a graph, training only on random-walk trajectories
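The shortest-path example can be imitated with a tabular stand-in for the transformer. This is a minimal sketch under several assumptions. A lookup table keyed by (node, return-to-go) replaces the learned model. The reward is -1 per step until the goal is reached. Only random walks that actually reach the goal are kept, which is a simplification. The names `collect_walks`, `build_table` and `plan` are hypothetical. At test time the sketch conditions on the highest return-to-go seen from the start node, and it raises the target by one after each -1 step. Under these assumptions, the chosen actions trace the shortest path found anywhere in the random data.

```python
import random
from collections import defaultdict


def collect_walks(graph, goal, n_walks, max_len, seed=0):
    """Random walks over `graph` (node -> neighbor list); keep only walks reaching `goal`."""
    rng = random.Random(seed)
    starts = [n for n in graph if n != goal]
    walks = []
    for _ in range(n_walks):
        node = rng.choice(starts)
        steps = []
        for _ in range(max_len):
            nxt = rng.choice(graph[node])  # the "action" is the next node
            steps.append((node, nxt))
            node = nxt
            if node == goal:
                walks.append(steps)
                break
    return walks


def build_table(walks):
    """Map (node, return-to-go) -> set of actions observed with that return-to-go."""
    table = defaultdict(set)
    for steps in walks:
        n = len(steps)
        for i, (node, action) in enumerate(steps):
            rtg = -(n - i)  # reward -1 for each remaining step until the goal
            table[(node, rtg)].add(action)
    return table


def plan(table, start, goal, max_steps=50):
    """Condition on the best return-to-go seen from `start`, then act greedily on it."""
    target = max(rtg for (node, rtg) in table if node == start)
    path, node = [start], start
    while node != goal and len(path) <= max_steps:
        node = min(table[(node, target)])  # deterministic tie-break among actions
        target += 1  # a -1 reward was collected, so the remaining return rises by one
        path.append(node)
    return path


graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(plan(build_table(collect_walks(graph, goal=4, n_walks=2000, max_len=20)), 0, 4))
```

Although the walks are random, conditioning on the highest return recovers a 3-step route from node 0 to node 4. The actual model reaches this by learning the return-conditioned policy, not by exact lookup.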
GPT architecture
modifies the transformer architecture with a causal self-attention mask
each token's summation/softmax covers only itself and the earlier tokens in the sequence
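The causal mask can be seen in a minimal single-head sketch written in pure Python. It assumes lists of equal-length float vectors and omits multi-head attention and learned projections. The name `causal_self_attention` is a hypothetical illustration, not GPT's actual code. Position t computes its softmax only over positions 0..t. This is equivalent to setting the scores of future positions to negative infinity before the softmax.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t only scores keys at positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[j])) / math.sqrt(d)
            for j in range(t + 1)
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors at the visible positions.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out
```

One consequence is easy to check: changing the last token leaves the outputs at every earlier position unchanged.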
trajectory representation
generate actions based on future desired returns
feed the model returns-to-go (the sum of rewards from the current step to the end of the episode) instead of per-step rewards
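A minimal sketch of the trajectory representation follows. It assumes undiscounted returns-to-go and uses tagged tuples in place of real token embeddings. The helper names `returns_to_go`, `interleave` and `next_target` are hypothetical. Each timestep contributes the triple (return-to-go, state, action). At test time the desired return is decremented by each reward actually received, so the model keeps conditioning on what remains to be achieved.

```python
from itertools import accumulate


def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted), via a reversed cumulative sum."""
    return list(accumulate(reversed(rewards)))[::-1]


def interleave(rtgs, states, actions):
    """Flatten a trajectory into the token order (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    tokens = []
    for r, s, a in zip(rtgs, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


def next_target(target, reward):
    """At test time, reduce the desired return by the reward just obtained."""
    return target - reward


rtgs = returns_to_go([1.0, 0.0, 2.0])  # [3.0, 2.0, 2.0]
print(interleave(rtgs, ["s0", "s1", "s2"], ["a0", "a1", "a2"]))
```

The sketch uses this ordering so that the model sees the desired return before each state. This lets it choose an action that is consistent with that return.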
transformer