Adversarial Shapley Value Experience Replay for Task-Free Continual Learning (University of Toronto)
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning (IBM, Columbia University, Mila, Université de Montréal)
Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (DeepMind, Vladimir Kramnik)
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets (UC Berkeley; the paper is actually from June, but the write-up was released this month)
COVID-19 Pandemic Cyclic Lockdown Optimization Using Reinforcement Learning (Oracle)
Decoupling Representation Learning from Reinforcement Learning (with Pieter Abbeel)
GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling (MIT)
Grounded Language Learning Fast and Slow (DeepMind, follow-up to SHIFTT)
Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning (Russ Tedrake)
Learning to summarize from human feedback (OpenAI)
Meta-Learning with Sparse Experience Replay for Lifelong Language Learning (Facebook, King's College London, University of Amsterdam)
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning (DeepMind)