Decision Transformer: Unifying sequence modelling and model-free, offline RL
Tue, 01 Jun 2021
In this article we will explain and discuss the paper Decision Transformer: Reinforcement Learning via Sequence Modeling (Chen et al., 2021), which explores the application of transformers to modelling sequential decision-making problems, formalized as reinforcement learning (RL). As an illustrative example, a GPT model trained on a dataset of random-walk trajectories can generate optimal trajectories at test time simply by being conditioned on a large target return.
Figure 1. Conditioned on a starting state and prompted to generate the largest possible return at each node, the Decision Transformer produces optimal paths. (Source)
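At evaluation time, conditioning on a large return amounts to seeding the context with a target return-to-go and decrementing it as rewards arrive. The sketch below illustrates this loop; it assumes a hypothetical trained `model` exposing a `get_action(returns_to_go, states, actions)` method and a Gymnasium-style `env`, and is not the authors' exact interface.

```python
def evaluate(model, env, target_return, max_steps=1000):
    """Roll out a trained model by conditioning on a desired (high) return."""
    state, _ = env.reset()
    returns_to_go, states, actions = [target_return], [state], []

    for _ in range(max_steps):
        # The model sees the running (return-to-go, state, action) context
        # and predicts the next action autoregressively.
        action = model.get_action(returns_to_go, states, actions)
        state, reward, terminated, truncated, _ = env.step(action)

        actions.append(action)
        states.append(state)
        # Subtract the reward actually received, so the remaining context
        # still asks for the rest of the target return.
        returns_to_go.append(returns_to_go[-1] - reward)
        if terminated or truncated:
            break

    return target_return - returns_to_go[-1]  # return achieved by the rollout
```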
The idea is simple: 1) each modality (return, state, or action) is passed into its own embedding network (a convolutional encoder for image observations, a linear layer for continuous states); 2) the embeddings are processed by an autoregressive transformer model, trained to predict the next action given the previous tokens via a linear output layer.
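A condensed PyTorch sketch of these two steps is shown below. It is a simplification under assumptions of my own: continuous states and actions, linear embeddings for every modality, and `nn.TransformerEncoder` with a causal mask standing in for the GPT backbone; timestep embeddings and other details from the paper are omitted for brevity.

```python
import torch
import torch.nn as nn

class DecisionTransformerSketch(nn.Module):
    def __init__(self, state_dim, act_dim, embed_dim=128, n_layers=3, n_heads=1):
        super().__init__()
        # 1) one embedding network per modality (a convolutional encoder
        #    would replace embed_state for image observations)
        self.embed_return = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)

        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

        # 2) linear output layer that reads out the predicted action
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, returns_to_go, states, actions):
        # shapes: returns_to_go (B, T, 1), states (B, T, state_dim),
        #         actions (B, T, act_dim)
        B, T = states.shape[0], states.shape[1]

        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...)
        tokens = torch.stack(
            (self.embed_return(returns_to_go),
             self.embed_state(states),
             self.embed_action(actions)), dim=2
        ).reshape(B, 3 * T, -1)

        # Causal mask so each token only attends to earlier tokens
        causal_mask = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.transformer(tokens, mask=causal_mask)

        # Predict action a_t from the hidden state at the state token s_t
        return self.predict_action(h[:, 1::3])
```

Training then reduces to standard supervised learning: for continuous actions, minimize the mean-squared error between the predicted actions and the logged actions over trajectory segments sampled from the offline dataset.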