News

Scott’s reflections on the NeurIPS 2022 conference

Scott reflects on the best papers, workshops, and talks from the 2022 Neural Information Processing Systems (NeurIPS) conference.

I was fortunate to attend NeurIPS in New Orleans in November. Here, I publish my takeaways to give you a feel for the zeitgeist. I’ll discuss the papers first, then the workshops, and finally, briefly, the keynotes.

The best papers at NeurIPS 2022

Here’s a ranked list of my top 4 papers. Most are on Offline RL, which is representative of the conference writ large. (You can head to my website for an extended list of my top 8 papers from the conference).

1. Does Zero-Shot Reinforcement Learning Exist? (Touati et al., 2022)

Poster summarising the talk from Touati et al. (2022) at NeurIPS: Does Zero-Shot Reinforcement Learning Exist?

Key idea. To do zero-shot RL, we need to learn a general function from reward-free transitions that implicitly encodes the trajectories of all optimal policies for all tasks. The authors propose to learn two functions, F(s) and B(s), that encode the future and the past of state s respectively. We want to learn functions that can always find a route from s to s′.
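
To make the zero-shot step concrete, here is a minimal sketch of how I understand the forward-backward recipe, assuming discrete actions and pre-trained networks F and B; the shapes and function names below are my own illustration, not the authors’ code.

```python
import torch

# Assumed (illustrative) interfaces:
#   F(state, z) -> (num_actions, d) forward embeddings, one per action
#   B(states)   -> (N, d) backward embeddings
# Both are presumed pre-trained on reward-free transitions so that
# F(s, z)[a] . B(s') tracks how likely s' is to be visited when pursuing task z.

def infer_task_vector(B, reward_fn, states):
    """Zero-shot step: given state samples and a reward function revealed only
    at test time, estimate the task vector z = E[r(s) * B(s)] with no learning."""
    with torch.no_grad():
        rewards = reward_fn(states)                 # (N,)
        b = B(states)                               # (N, d)
        return (rewards.unsqueeze(-1) * b).mean(0)  # (d,)

def act_greedily(F, state, z):
    """Q_z(s, a) is approximated by F(s, z)[a] . z; act greedily w.r.t. it."""
    with torch.no_grad():
        q_values = F(state, z) @ z                  # (num_actions,)
        return q_values.argmax().item()
```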

Implication(s):

  • They beat every previous zero-shot RL algorithm on the standard offline RL tasks, and approach the performance of online, reward-guided RL algorithms in some environments.

Misc thoughts:

  • It seems clear that zero-shot RL is the route to real world deployment for RL. This work represents the best effort I’ve seen in this direction. I’m really excited by it and will be looking to extend it in my own future work.

2. Large Scale Retrieval for Reinforcement Learning (Humphreys et al., 2022)

Diagram from Humphreys et al. 2022 paper at NeurIPS 2022: Large Scale Retrieval for Reinforcement Learning

Key idea. Assuming access to a large offline dataset, we perform a nearest-neighbours search over the dataset w.r.t. the current state, and append the retrieved states, next actions, rewards, and final states (in the case of Go) to the current state. The policy then acts w.r.t. this augmented state.
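
Here is a toy sketch of that mechanism as I read it, with brute-force nearest-neighbour search and random placeholder data standing in for the paper’s learned embeddings and large-scale approximate index.

```python
import numpy as np

class RetrievalAugmenter:
    """Store embeddings of states from an offline dataset alongside what
    happened there (actions, rewards, outcomes); at decision time, retrieve
    the nearest neighbours of the current state and concatenate that context
    onto the policy's input."""

    def __init__(self, keys, values, k=4):
        self.keys = keys      # (N, d) state embeddings from the offline dataset
        self.values = values  # (N, m) stored context, e.g. action/reward/outcome features
        self.k = k

    def augment(self, state_embedding):
        # Brute-force Euclidean nearest-neighbour search (the paper scales
        # this up with an approximate index).
        dists = np.linalg.norm(self.keys - state_embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        retrieved = self.values[nearest].reshape(-1)
        # The policy then conditions on [current state ; retrieved context].
        return np.concatenate([state_embedding, retrieved])

# Example with random placeholder data.
keys = np.random.randn(10_000, 32).astype(np.float32)
values = np.random.randn(10_000, 8).astype(np.float32)
policy_input = RetrievalAugmenter(keys, values).augment(keys[0])
```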

Implication(s):

  • Halves the compute required to achieve the baseline win rate in Go.

Misc thoughts:

  • This represents the most novel approach to offline RL I’ve seen; most techniques separate the offline and online learning phases, but here the authors combine them elegantly.
  • To me this feels like a far more promising approach to offline RL than CQL etc.

3. The Phenomenon of Policy Churn (Schaul et al., 2022)

Key observation from the Schaul et al. (2022) paper at NeurIPS: The Phenomenon of Policy Churn

Key idea. When a value-based agent acts greedily, the policy changes by a surprising amount per gradient step: the greedy action switches in up to 10% of states in some cases.
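
If you want to check the effect in your own agent, a back-of-the-envelope probe (my sketch, not the paper’s measurement protocol) is to snapshot the Q-network before and after a single update and count how often the argmax action flips.

```python
import torch

def greedy_policy_churn(q_net_before, q_net_after, probe_states):
    """Fraction of probe states whose greedy (argmax-Q) action differs between
    two consecutive snapshots of the Q-network, i.e. before and after one
    gradient update."""
    with torch.no_grad():
        a_before = q_net_before(probe_states).argmax(dim=-1)  # (N,)
        a_after = q_net_after(probe_states).argmax(dim=-1)    # (N,)
    return (a_before != a_after).float().mean().item()
```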

Implication(s):

  • Policy churn means that ε-greedy exploration may not be required: a rapidly changing policy injects enough noise into the data distribution that exploration is, in effect, implicit.

Misc thoughts:

  • Their paper is structured in a really engaging way.
  • I liked their ML researcher survey which quantified how surprising their result was to experts.

4. MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge (Fan et al., 2022)

Image from Fan et al.’s paper presented at NeurIPS 2022: MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Key idea. An internet-scale benchmark for generalist RL agents: thousands of tasks, and a limitless, procedurally generated Minecraft world for training.
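
For a feel of the API, spinning up a task looks roughly like the snippet below, based on the project’s published examples; the exact task ID and keyword arguments are illustrative, so check the MineDojo docs before relying on them.

```python
import minedojo

# Gym-style loop; task_id and image_size follow MineDojo's examples but should
# be verified against the current documentation.
env = minedojo.make(task_id="harvest_wool_with_shears_and_sheep", image_size=(160, 256))
obs = env.reset()
for _ in range(50):
    action = env.action_space.no_op()  # multi-discrete action, no-op by default
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```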

Implication(s):

  • Provides a sufficiently diverse and complex sandbox for training more generally capable agents.

Misc thoughts:

  • This is an amazing feat of software engineering from a relatively small team. Jim Fan is so cool!

Workshops at NeurIPS 2022

I attended 5 workshops:

  1. Foundation Models for Decision Making
  2. Safety
  3. Offline RL
  4. Real Life Reinforcement Learning
  5. Tackling Climate Change with Machine Learning

I found the latter three interesting, but less informative and prescient than the first two. I therefore only discuss the Foundation Models for Decision Making and Safety workshops; the extent to which I enjoyed both workshops is, in a sense, oxymoronic.

Foundation Models for Decision Making

Leslie P. Kaelbling: What does an intelligent robot need to know?

My favourite talk was from Leslie Kaelbling of MIT. Kaelbling focussed on our proclivity for building inductive biases into our models (a thesis similar to Sutton’s Bitter Lesson): though good in the short term, the effectiveness of such priors plateaus in the long run. I agree with her.

She advocates for a marketplace of pre-trained models of the following types:

  • Foundation: space, geometry, kinematics
  • Psychology: other agents, beliefs, desires etc.
  • Culture: how you do things in the world, e.g. things you can read in books

Robotics manufacturers will provide:

  • observation / perception
  • actuators
  • controllers e.g. policies

And we’ll use our own expertise to build local state (specific facts about the environment) and encode long-horizon memories, e.g. what did I do two years ago?


Safety (unofficial; in the Marriott across the road)

The safety workshop was wild. It was a small, unofficial congregation of researchers who you’d expect to see lurking on Less Wrong and other EA forums.

Christoph Schuhmann (Founder of LAION)

Chris is a high school teacher from Vienna; he gave an inspiring talk on the open-sourcing of foundation models. He started LAION (Large-scale Artificial Intelligence Open Network), a non-profit organisation that provides datasets, tools, and models to democratise ML research. His key points included:

  • Centralised intelligence means centralised problem solving; we can’t give the keys to problem solving to a (potentially) dictatorial few.
  • The risks of not open-sourcing AI are bigger than the risks of open-sourcing it
  • LAION progress:
    • the initial plan was to replicate the original CLIP / DALL-E 1
    • he collected 3M image-text pairs on his own
    • a Discord server helped him get to 300M image-text pairs, then 5B pairs
    • a hedge fund gave them 8 A100 GPUs
  • We will always want to do things even if AI can, because we need to express ourselves

Thomas Wolf (Hugging Face co-founder)

Tom Wolf gave a talk on the BigScience initiative, a project inspired by large-scale scientific collaborations such as CERN and the LHC, in which open collaboration facilitates the creation of large-scale artefacts useful to the entire research community:

  • 1,000+ researchers coming together to build a massive language model and a massive dataset
  • efficient AGI will probably require modularity (cf. LeCun)
  • working on the energy efficiency of training (and especially inference) is inherently democratising, i.e. it stops models being held only by the rich

Are AI researchers aligned on AGI alignment?

There was an interesting round table at the end of the workshop, featuring Jared Kaplan (Anthropic) and David Krueger (Cambridge), discussing what it means to align AGI. There was little agreement.


NeurIPS 2022 Keynotes

I attended four of the six keynotes:

  1. David Chalmers: Are Large Language Models Sentient?
  2. Emmanuel Candes: Conformal Prediction in 2022
  3. Isabelle Guyon: The Data-Centric Era: How ML is Becoming an Experimental Science
  4. Geoff Hinton: The Forward-Forward Algorithm for Training Deep Neural Networks

I found Emmanuel’s talk on conformal prediction enlightening, as I’d never heard of the topic (here’s a primer), and Isabelle’s talk on benchmark and data transparency agreeable, if a little unoriginal. Hinton’s talk on a more biologically plausible learning algorithm was interesting, but I’m as yet unconvinced that mimicking human intelligence is a good way of building systems that surpass humans: artificial systems can leverage hardware far superior to anything accessible to the brain. Chalmers’ talk was extremely thought-provoking; he structured the problem of consciousness in LLMs better than anyone I’ve seen to date, and it was my favourite of the four.

References

Fan, L.; Wang, G.; Jiang, Y.; Mandlekar, A.; Yang, Y.; Zhu, H.; Tang, A.; Huang, D.-A.; Zhu, Y.; and Anandkumar, A. 2022. MINEDOJO: Building open-ended embodied agents with internet-scale knowledge. Advances in neural information processing systems, 35.

Humphreys, P. C.; Guez, A.; Tieleman, O.; Sifre, L.; Weber, T.; and Lillicrap, T. 2022. Large-Scale Retrieval for Reinforcement Learning. Advances in neural information processing systems, 35.

Schaul, T.; Barreto, A.; Quan, J.; and Ostrovski, G. 2022. The phenomenon of policy churn. Advances in neural information processing systems, 35.

Touati, A.; Rapin, J.; and Ollivier, Y. 2022. Does Zero-Shot Reinforcement Learning Exist?