The debate recipe
Going back and forth between agents
If you want to challenge yourself, pause and see if you can use the pieces we’ve seen so far to write a recipe in which agents take turns debating a question.
Once you’re ready, or if you just want to see the result, take a look at this recipe:
Once you’ve saved the recipe you can run it as usual:
You should see a debate like this:
The trace looks like this:
In the line agents = [recipe.agent(), recipe.agent()] we’re creating two agents. Having two separate agent objects isn’t strictly necessary: none of the agents we’re using in ICE right now have implicit state (except for humans), so we could just as well have created the agents on the fly in the turn function.
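To make the point about stateless agents concrete, here is a minimal, self-contained sketch of a turn-taking loop that creates a fresh agent for every turn. It does not use ICE at all: StubAgent, render_prompt, and run_debate are hypothetical names invented for this illustration, and the stub returns canned text instead of calling a model. Because all of the debate history lives in the prompt rather than in the agent, swapping a shared agent for a per-turn one should not change the outcome.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for a model-backed agent. It keeps no state."""

    async def complete(self, prompt: str) -> str:
        # The prompt has one line for the question, one per earlier turn,
        # and one for the speaker who is about to talk.
        earlier_turns = prompt.count("\n") - 1
        return f"point {earlier_turns + 1}"


def render_prompt(question: str, debate: list, next_speaker: str) -> str:
    """Serialize the whole debate so far; this is the only 'memory' there is."""
    lines = [f"Question: {question}"]
    lines += [f'{name}: "{text}"' for name, text in debate]
    lines.append(f'{next_speaker}: "')
    return "\n".join(lines)


async def run_debate(question: str, names=("Alice", "Bob"), rounds: int = 2) -> list:
    debate = []  # list of (speaker name, statement) pairs
    for _ in range(rounds):
        for name in names:
            agent = StubAgent()  # created on the fly; nothing is carried over
            prompt = render_prompt(question, debate, name)
            reply = await agent.complete(prompt)
            debate.append((name, reply))
    return debate


if __name__ == "__main__":
    for name, text in asyncio.run(run_debate("Should we legalize all drugs?")):
        print(f"{name}: {text}")
```

With a human participant this shortcut would no longer hold in the same way, since a person remembers earlier turns regardless of what the prompt contains.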
Exercises
Add a judge agent at the end that decides which agent won the debate. In the original debate proposal, these judgments would be used to RL-finetune the parameters of the debate agents.
Generate model judgments directly (only given the question) and after debate. Are there systematic differences between these judgments? You could also use models to generate the questions if you need a larger input set.
References
Irving, Geoffrey, Paul Christiano, and Dario Amodei. AI Safety via Debate. May 2, 2018.