ARENA

Alignment Research Engineer Accelerator

Our mission is to provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles.

We are excited to be running the fourth iteration from September 2nd to October 4th (the first week is optional). Applications are now closed.

The curriculum is organised into four core chapters, each of which runs for a week. During each chapter, participants work through ML programming exercises in pairs, under TA guidance, to develop their skills. There is also a fifth chapter, in which participants choose one of the topics covered during the course to explore in more depth and complete a capstone project.

The curriculum draws heavily from Redwood Research’s Machine Learning for Alignment Bootcamp. It also overlaps with other material (most notably Neel Nanda’s excellent open source material on mechanistic interpretability of transformers). Each chapter will also include Q&As and group discussions on AI-safety-specific writing, which will often tie in directly to the curriculum content.

Chapter 0 - Fundamentals

Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them.

We will also cover some subjects we expect to be useful going forwards, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.

Chapter 1 - Transformers & Mechanistic Interpretability

The transformer is an important neural network architecture used for language modeling, and it has made headlines this year with the introduction of models like ChatGPT.

In this chapter, you will learn all about transformers, and build and train your own. You’ll also learn about Mechanistic Interpretability of transformers, a field which has been advanced by Anthropic’s Transformer Circuits Thread and by Neel Nanda’s work.

Chapter 2 - Reinforcement Learning

Reinforcement learning is an important field of machine learning. It works by teaching agents to take actions in an environment to maximise their accumulated reward.
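
To make the idea of an agent maximising accumulated reward concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the two-armed bandit environment, the epsilon-greedy strategy, and every name in it (TwoArmedBandit, run_epsilon_greedy, the payout probabilities) are assumptions chosen purely for illustration.

```python
import random


class TwoArmedBandit:
    """Toy environment (hypothetical): each arm pays reward 1 with a fixed probability."""

    def __init__(self, payout_probs, seed=0):
        self.payout_probs = payout_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Return a reward of 1.0 or 0.0 for pulling the chosen arm.
        return 1.0 if self.rng.random() < self.payout_probs[action] else 0.0


def run_epsilon_greedy(env, n_arms, n_steps=2000, epsilon=0.1, seed=1):
    """Agent loop: act, observe the reward, update the value estimates."""
    rng = random.Random(seed)
    counts = [0] * n_arms      # how often each arm has been pulled
    values = [0.0] * n_arms    # running average reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            action = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = env.step(action)
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen arm.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, total_reward


env = TwoArmedBandit([0.2, 0.8])
values, total_reward = run_epsilon_greedy(env, n_arms=2)
print("estimated arm values:", values)
print("accumulated reward:", total_reward)
```

The agent starts with no knowledge of the arms, occasionally explores at random, and otherwise pulls whichever arm currently looks best, so its estimates should come to favour the higher-paying arm over time.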

In this chapter, you will learn some of the fundamentals of RL, and work with OpenAI’s Gym environments to run your own experiments.

You’ll also learn about Reinforcement Learning from Human Feedback, and apply it to the transformers you built earlier.

Chapter 3 - Model Evaluation

In this chapter, you will learn how to evaluate models. We'll take you through the process of building a multiple choice benchmark of your own and using this to evaluate current models. We'll then move on to study LM agents: how to build them and how to evaluate them.
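
As a rough illustration of what scoring a model on a multiple choice benchmark can look like, here is a minimal sketch in plain Python. It is not the course's own tooling: MCQuestion, format_question, evaluate, and the stand-in always_b "model" are all hypothetical names, and a real evaluation would call an actual language model where the stand-in is used.

```python
from dataclasses import dataclass
from typing import Callable, List

LETTERS = "ABCDEFGH"


@dataclass
class MCQuestion:
    prompt: str
    choices: List[str]
    answer_index: int  # index into `choices` of the correct option


def format_question(q: MCQuestion) -> str:
    """Render a question as a lettered prompt ending in 'Answer:'."""
    lines = [q.prompt]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(q.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def evaluate(model: Callable[[str], str], questions: List[MCQuestion]) -> float:
    """Return the fraction of questions where the model's first letter is correct."""
    correct = 0
    for q in questions:
        reply = model(format_question(q)).strip().upper()
        if reply[:1] == LETTERS[q.answer_index]:
            correct += 1
    return correct / len(questions)


questions = [
    MCQuestion("What is 2 + 2?", ["3", "4", "5"], 1),
    MCQuestion("Which of these is a prime number?", ["8", "9", "11"], 2),
]


def always_b(prompt: str) -> str:
    # Stand-in for a real model call (assumption): always answers "B".
    return "B"


accuracy = evaluate(always_b, questions)
print(f"accuracy: {accuracy:.2f}")
```

Keeping the model behind a plain callable means the same benchmark can be run against different models by swapping in a different function.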

Topics include:

Chapter 4 - Capstone Projects

We will conclude this programme with capstone projects, where you get to dig into something related to the course. This should draw on many of the skills and much of the knowledge you will have accumulated over the last 3 weeks, and serves as a great way to round off the programme!