Traffic signal control (TSC) remains one of the most significant and challenging research problems in the transportation field. Reinforcement learning (RL) has achieved great success in TSC but suffers from critically high learning costs in practical applications due to its excessive trial-and-error learning process. Offline RL is a promising way to reduce these learning costs, but the data distribution shift issue remains unresolved. To this end, this paper formulates TSC as a sequence modeling problem over Markov decision process trajectories described by the states, actions, and rewards from the traffic environment. A novel framework, TransformerLight, is introduced; rather than fitting value functions by averaging all possible returns, it directly produces the best possible actions using a gated Transformer. Additionally, the learning process of TransformerLight is much more stable because the residual connections are replaced with gated transformer blocks.
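To make the gated-Transformer idea concrete, the sketch below shows a Transformer block in which the usual residual connections (x + sublayer(x)) are replaced by GRU-style gates, following the gated transformer (GTrXL) construction. The module names, layer sizes, and the use of PyTorch are illustrative assumptions, not the authors' released implementation; in a TransformerLight-style setup, the input tokens would be an interleaved sequence of state, action, and reward (or return) embeddings from the traffic environment.

```python
# Minimal sketch of a gated Transformer block (assumptions: GRU-style gating as in
# GTrXL, pre-norm layout, illustrative hyperparameters; not the paper's released code).
import torch
import torch.nn as nn


class GRUGate(nn.Module):
    """GRU-style gate used in place of the usual residual connection x + y."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_r = nn.Linear(d_model, d_model, bias=False)
        self.u_r = nn.Linear(d_model, d_model, bias=False)
        self.w_z = nn.Linear(d_model, d_model, bias=False)
        self.u_z = nn.Linear(d_model, d_model, bias=False)
        self.w_g = nn.Linear(d_model, d_model, bias=False)
        self.u_g = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: the block input (what a residual connection would carry through)
        # y: the sublayer output (attention or feed-forward)
        r = torch.sigmoid(self.w_r(y) + self.u_r(x))      # reset gate
        z = torch.sigmoid(self.w_z(y) + self.u_z(x))      # update gate
        h = torch.tanh(self.w_g(y) + self.u_g(r * x))     # candidate state
        return (1 - z) * x + z * h                        # gated mix instead of x + y


class GatedTransformerBlock(nn.Module):
    """Pre-norm Transformer block whose residual connections are replaced by gates."""

    def __init__(self, d_model: int = 128, n_heads: int = 4, d_ff: int = 512):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate1 = GRUGate(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.gate2 = GRUGate(d_model)

    def forward(self, x: torch.Tensor, attn_mask: torch.Tensor | None = None) -> torch.Tensor:
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = self.gate1(x, a)            # gated connection replaces x + a
        f = self.ff(self.ln2(x))
        x = self.gate2(x, f)            # gated connection replaces x + f
        return x


if __name__ == "__main__":
    block = GatedTransformerBlock()
    # e.g., a batch of 8 trajectories, each a sequence of 30 interleaved tokens
    tokens = torch.randn(8, 30, 128)
    print(block(tokens).shape)          # torch.Size([8, 30, 128])
```

Because a freshly initialized gate can learn to pass the block input through almost unchanged, this construction tends to train more stably than plain residual blocks in RL settings, which is consistent with the stability claim above.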
A deep dive into the Transformer, the neural network architecture introduced in the famous 2017 paper “Attention Is All You Need”: its applications, impact, challenges, and future directions.
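Since the deep dive centers on the attention mechanism introduced in that paper, here is a minimal, self-contained sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; the tensor shapes and the function name are illustrative assumptions rather than any particular library's API.

```python
# Minimal sketch of scaled dot-product attention, the core operation of the Transformer.
import math
import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k) tensors; returns a (batch, seq_len, d_k) tensor."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # query-key similarities
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # hide disallowed positions
    weights = torch.softmax(scores, dim=-1)                    # attention distribution
    return weights @ v                                         # weighted sum of values
```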