  • Adversarial Shapley Value Experience Replay for Task-Free Continual Learning (University of Toronto)
  • An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning (IBM, Columbia University, Mila, Université de Montréal)
  • Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (DeepMind, Vladimir Kramnik)
  • AWAC: Accelerating Online Reinforcement Learning with Offline Datasets (UC Berkeley; the paper is actually from June, but the write-up here was released this month)
  • COVID-19 Pandemic Cyclic Lockdown Optimization Using Reinforcement Learning (Oracle)
  • Decoupling Representation Learning from Reinforcement Learning (with Pieter Abbeel)
  • GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling (MIT)
  • Grounded Language Learning Fast and Slow (DeepMind, follow-up to SHIFTT)
  • Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning (Russ Tedrake)
  • Learning to summarize from human feedback (OpenAI)
  • Meta-Learning with Sparse Experience Replay for Lifelong Language Learning (Facebook, King's College London, University of Amsterdam)
  • Physically Embedded Planning Problems: New Challenges for Reinforcement Learning (DeepMind)