ARENA
Alignment Research Engineer Accelerator
ARENA’s mission is to provide talented individuals with the ML engineering skills, community, and confidence to contribute directly to technical AI safety.
The ARENA curriculum is organised into four core chapters, each of which runs for a week. During each week, participants work through ML programming exercises in pairs, under TA guidance, to develop their skills. There is also a fifth chapter, in which participants choose one of the topics covered during the course to explore in more depth and complete a capstone project.
The curriculum draws heavily from Redwood Research’s Machine Learning for Alignment Bootcamp. It also overlaps with other material (most notably Neel Nanda’s excellent open-source material on mechanistic interpretability of transformers). We will also include Q&As and group discussions on AI safety-specific writing in each chapter, which will often tie in directly with the curriculum content.
Chapter 0 - Fundamentals
Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them.
We will also cover some subjects we expect to be useful going forward, e.g. using GPT-3 and GPT-4 to streamline your learning, good coding practices, and version control.
Chapter 1 - Transformers & Mechanistic Interpretability
The transformer is an important neural network architecture used for language modelling, and it has made headlines this year with the introduction of models like ChatGPT.
In this chapter, you will learn all about transformers, and build and train your own. You’ll also learn about the mechanistic interpretability of transformers, a field which has been advanced by Anthropic’s Transformer Circuits Thread and by Neel Nanda’s work.
Chapter 2 - Reinforcement Learning
Reinforcement learning is an important field of machine learning. It works by teaching agents to take actions in an environment to maximise their accumulated reward.
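The agent-environment loop described above can be sketched in a few lines of plain Python. This is only an illustrative toy, not course material: the `ToyBandit` environment, its fixed payouts, and the `run_agent` function are all hypothetical names invented for this example, and the agent uses a simple epsilon-greedy rule rather than any specific algorithm taught in the chapter.

```python
import random


class ToyBandit:
    """Hypothetical environment: each action pays a fixed, noise-free reward."""

    def __init__(self, payouts):
        self.payouts = payouts

    def step(self, action):
        # Return the reward for the chosen action.
        return self.payouts[action]


def run_agent(env, n_actions, n_steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # running mean reward per action
    counts = [0] * n_actions       # how often each action was taken
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            # exploit: action with the highest estimated reward
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = env.step(action)
        counts[action] += 1
        # Incremental update of the running mean for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward


env = ToyBandit([0.2, 0.5, 0.8])  # assumed payouts, chosen for illustration
estimates, total_reward = run_agent(env, n_actions=3)
best_action = max(range(3), key=lambda a: estimates[a])
print("best action:", best_action, "accumulated reward:", round(total_reward, 1))
```

The agent starts with no knowledge of the payouts and only discovers the highest-paying action through occasional exploration, which is the basic trade-off that more advanced RL methods also have to manage.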
In this chapter, you will learn some of the fundamentals of RL, and work with environments from OpenAI’s Gym library to run your own experiments.
You’ll also learn about Reinforcement Learning from Human Feedback, and apply it to the transformers you built earlier.
Chapter 3 - Model Evaluation
In this chapter, you will learn how to evaluate models. We'll take you through the process of building a multiple-choice benchmark of your own and using it to evaluate current models. We'll then move on to LM agents: how to build them and how to evaluate them.
Topics include:
Constructing benchmarks for models
Building pipelines to automate model evaluation
Building and evaluating LM agents
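As a rough illustration of what scoring a model on a multiple-choice benchmark can look like, here is a minimal sketch. Everything in it is an assumption made for the example: the `Question` class, the two sample questions, the prompt format, and the `always_a` stand-in, which takes the place of a real model call and simply answers "A" every time.

```python
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    choices: dict  # maps a letter such as "A" to the answer text
    answer: str    # letter of the correct choice


def format_prompt(question):
    """Render a question and its lettered choices as a single text prompt."""
    lines = [question.prompt]
    for letter, text in question.choices.items():
        lines.append(f"{letter}. {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def evaluate(model, questions):
    """Return the fraction of questions where the model's first letter is correct."""
    correct = 0
    for question in questions:
        reply = model(format_prompt(question)).strip().upper()
        predicted = reply[:1]  # keep only the first character of the reply
        if predicted == question.answer:
            correct += 1
    return correct / len(questions)


def always_a(prompt):
    """Hypothetical stand-in for a model: ignores the prompt and answers 'A'."""
    return "A"


questions = [
    Question("What is 2 + 2?", {"A": "3", "B": "4"}, "B"),
    Question("Which number is prime?", {"A": "7", "B": "8"}, "A"),
]
accuracy = evaluate(always_a, questions)
print("accuracy:", accuracy)
```

A trivial baseline like `always_a` is useful as a sanity check: a benchmark on which it scores well probably has an unbalanced answer key.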
Chapter 4 - Capstone Projects
We will conclude the programme with capstone projects, where you get to dig into something related to the course. This should draw on many of the skills and much of the knowledge you will have accumulated over the preceding weeks, and serves as a great way to round off the programme!