ARENA

Alignment Research Engineer Accelerator

We aim to provide talented individuals with an environment for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles.

The first trial run of the program was held in London in winter 2022. The second iteration ran from 22nd May to 30th June 2023. The third is running from 8th January to 2nd February 2024 (applications closed).

The curriculum draws heavily from Redwood Research’s Machine Learning for Alignment Bootcamp. It also overlaps with other material (most notably Neel Nanda’s excellent open source material on mechanistic interpretability of transformers). Each chapter will also include Q&As and group discussions on AI safety-specific writing, which will often tie in directly to the curriculum content.

The curriculum is organised into four core chapters, each of which will run for at least a week. There is also a fifth chapter, in which participants can dive deeper into one of several topics covered during the course. See here for a visual summary of the course.

Week 1 - Fundamentals

Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them.
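As a rough illustration of what "training" means here, the sketch below fits a single linear unit to toy data with plain gradient descent on the mean squared error. It is not taken from the course material; the function name `train_linear`, the data, and the hyperparameters are all assumptions chosen for illustration.

```python
def train_linear(xs, ys, lr=0.05, epochs=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this example.
            err = (w * x + b) - y
            # Accumulate the gradient of the mean squared error.
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        # Step the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
```

Real neural networks stack many such units with non-linearities between them and use automatic differentiation rather than hand-written gradients, but the loop of predicting, measuring the error, and updating the parameters is the same.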

We will also cover some subjects we expect to be useful going forwards, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.

Week 2 - Transformers & Mechanistic Interpretability

The transformer is an important neural network architecture used for language modeling, and it has made headlines this year with the introduction of models like ChatGPT.

In this chapter, you will learn all about transformers, and build and train your own. You’ll also learn about Mechanistic Interpretability of transformers, a field which has been advanced by Anthropic’s Transformer Circuits Thread, and work by Neel Nanda.

Week 3 - Reinforcement Learning

Reinforcement learning is an important field of machine learning. It works by teaching agents to take actions in an environment to maximise their accumulated reward.
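The agent-environment loop described above can be sketched in a few lines. This is a minimal, self-contained toy rather than course code: the `Corridor` environment, its reward values, and the `run_episode` helper are hypothetical, and the `reset`/`step` interface is only loosely inspired by the style of common RL libraries.

```python
class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.goal = length - 1
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor.
        self.pos = max(0, min(self.goal, self.pos + action))
        done = self.pos == self.goal
        # Assumed reward scheme: -1 per step, +10 on reaching the goal.
        reward = 10.0 if done else -1.0
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the accumulated (undiscounted) reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total


def always_right(state):
    return 1


def always_left(state):
    return -1


good_return = run_episode(Corridor(), always_right)
bad_return = run_episode(Corridor(), always_left)
```

A policy that always moves right reaches the goal quickly and collects a higher return than one that never gets there. A learning algorithm's job is to find such a policy from experience instead of having it written by hand.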

In this chapter, you will learn some of the fundamentals of RL, and work with OpenAI’s Gym environments to run your own experiments.

You’ll also learn about Reinforcement Learning from Human Feedback, and apply it to the transformers you built earlier.

Week 4 - Capstone Projects

We will conclude the program with capstone projects, where you get to dig into something related to the course. This should draw on many of the skills and much of the knowledge you will have accumulated over the previous 3 weeks, and serves as a great way to round off the program!